2025 Scale AI. All rights reserved.
Humanity's Last Exam
Challenging LLMs at the frontier of human knowledge
Last updated: April 30, 2025
Performance Comparison
Rank   Score            Calib Err
1      20.32 ± 1.58     34
1      19.20 ± 1.54     39
1      18.16 ± 1.51     71
1      18.08 ± 1.51     57
1      17.80 ± 1.50     70
6      14.28 ± 1.37     59
6      12.08 ± 1.28     80
8       8.12 ± 1.07     82
8       8.04 ± 1.07     80
8       7.96 ± 1.06     83
8       6.56 ± 0.97     82
8       5.68 ± 0.91     83
11      5.44 ± 0.89     85
11      5.40 ± 0.89     89
12      4.60 ± 0.82     88
12      4.52 ± 0.81     77
13      4.40 ± 0.80     80
13      4.08 ± 0.78     84
15      3.64 ± 0.73     82
18      2.72 ± 0.64     89
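
The page does not say how the ± values are computed. As a rough sketch, they appear consistent with a 95% normal-approximation interval for a proportion, assuming the score is a percentage of correct answers over 2,500 questions. Both the formula and the question count are assumptions here, and `margin_95` is a hypothetical helper, not something taken from the leaderboard itself.

```python
import math


def margin_95(score_pct: float, n_questions: int = 2500) -> float:
    """Half-width, in percentage points, of a normal-approximation 95% interval.

    Assumption: score_pct is the percentage of correct answers out of
    n_questions. The default of 2500 is an assumed benchmark size.
    """
    p = score_pct / 100.0
    standard_error = math.sqrt(p * (1.0 - p) / n_questions)
    return 1.96 * standard_error * 100.0


# Compare against a few rows of the listing above.
for score in (20.32, 14.28, 2.72):
    print(f"{score:.2f} ± {margin_95(score):.2f}")
```

Under these assumptions the computed margins match the listed ones for the rows checked (1.58, 1.37, and 0.64), which suggests that the overlapping intervals are the reason several entries share a rank.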