LLM Leaderboard
Find the best LLMs for your use case
Compare LLMs against standard benchmarks.
Choose the best AI model for your GenAI-powered apps.
Best LLMs Per Task
Discover the best LLMs for common tasks such as reasoning, multilingual Q&A, and math problem-solving.
Reasoning (GPQA Diamond³)
Claude 3.7 Sonnet (64K extended thinking): 84.8%
Grok 3 Beta (extended thinking): 84.6%
OpenAI o3-mini¹: 79.7%
OpenAI o1¹: 78%
DeepSeek R1 (32K extended thinking): 71.5%
Multilingual Q&A (MMMLU)
OpenAI o1¹: 87.7%
Claude 3.7 Sonnet (64K extended thinking): 86.1%
Claude 3.7 Sonnet (no extended thinking): 83.2%
Claude 3.5 Sonnet: 82.1%
OpenAI o3-mini¹: 79.5%
Math Problem-Solving (MATH 500)
OpenAI o3-mini¹: 97.9%
DeepSeek R1 (32K extended thinking): 97.3%
OpenAI o1¹: 96.4%
Claude 3.7 Sonnet (64K extended thinking): 96.2%
Claude 3.7 Sonnet (no extended thinking): 82.2%
Fastest & Cheapest LLMs
Fastest Models (tokens/second)
Llama 70b: 2100
Llama 8b: 723
OpenAI o1-mini: 237
Nova Lite: 150
GPT-4o mini: 11
Lowest Latency (time to first token, TTFT)
Nova Micro: 0.3s
Gemini 1.5 Flash: 0.3s
Llama 8b: 0.3s
Nova Pro: 0.4s
Nova Lite: 0.4s
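Throughput and time to first token can be combined into a rough estimate of end-to-end response time: TTFT plus the number of output tokens divided by output speed. The Python sketch below assumes a constant generation speed and ignores network overhead and queueing; the Nova Lite (150 tokens/second, 0.4s TTFT) and Llama 8b (723 tokens/second, 0.3s TTFT) figures come from the lists above, while the 500-token response length and the `estimate_response_time` helper are hypothetical.

```python
def estimate_response_time(ttft_s: float, tokens_per_s: float, output_tokens: int) -> float:
    """Rough end-to-end latency: time to first token plus streaming time.

    Assumes a constant generation speed and ignores network and queueing delays.
    Counting every output token as streamed slightly overestimates by one token.
    """
    if tokens_per_s <= 0:
        raise ValueError("tokens_per_s must be positive")
    return ttft_s + output_tokens / tokens_per_s


# Speed and TTFT figures taken from the leaderboard lists above.
models = {
    "Nova Lite": {"ttft_s": 0.4, "tokens_per_s": 150},
    "Llama 8b": {"ttft_s": 0.3, "tokens_per_s": 723},
}

OUTPUT_TOKENS = 500  # hypothetical response length

for name, m in models.items():
    seconds = estimate_response_time(m["ttft_s"], m["tokens_per_s"], OUTPUT_TOKENS)
    print(f"{name}: ~{seconds:.2f}s for {OUTPUT_TOKENS} output tokens")
```

Under these assumptions, Llama 8b's higher throughput matters far more than its 0.1s TTFT advantage once responses run to hundreds of tokens.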
Cheapest Models (input and output cost per 1M tokens)
Nova Micro, Gemini 1.5 Flash, Llama 70b, GPT-4o mini
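Because input and output tokens are priced separately, the cost of a single request is input_tokens / 1,000,000 × input price plus output_tokens / 1,000,000 × output price. The sketch below applies that formula to the per-1M-token prices listed in the Model Comparison table for three of the models named above; the `request_cost` helper and the 10,000-input / 1,000-output workload are hypothetical, and actual bills can differ with caching, batch discounts, or provider-specific pricing.

```python
from decimal import Decimal

PER_MILLION = Decimal(1_000_000)

# (input USD, output USD) per 1M tokens, copied from the comparison table.
PRICES = {
    "GPT-4o mini": (Decimal("0.15"), Decimal("0.60")),
    "Gemini 1.5 Flash": (Decimal("0.35"), Decimal("0.70")),
    "Llama 3.1 70b": (Decimal("0.23"), Decimal("0.40")),
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> Decimal:
    """Cost in USD of one request, given per-1M-token input and output prices."""
    input_price, output_price = PRICES[model]
    return (input_tokens * input_price + output_tokens * output_price) / PER_MILLION


# Hypothetical workload: 10,000 prompt tokens and 1,000 completion tokens.
for name in PRICES:
    print(f"{name}: ${request_cost(name, 10_000, 1_000):.4f}")
```

Using `Decimal` rather than floats keeps fractions of a cent exact when costs are summed over many requests.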
Model Comparison
Model | Release Date (DD/MM/YYYY) | Context Window (tokens) | Input Cost / 1M tokens | Output Cost / 1M tokens | Average | MMLU (General) | GPQA (Reasoning) | HumanEval (Coding) | Math | BFCL (Tool Use)
AWS Nova Lite | 03/12/2024 | 300000 | $0 | $0 | N/A | 80.50% | 42% | 85.40% | 73.30% | 66.60%
AWS Nova Micro | 03/12/2024 | 300000 | $0 | $0 | N/A | 77.60% | 40% | 81.10% | 69.30% | 56.20%
AWS Nova Pro | 03/12/2024 | 300000 | $0 | $0 | N/A | 85.90% | 46.90% | 89% | 76.60% | 68.40%
Claude 3 Haiku | 13/03/2024 | 200000 | $0.25 | $1.25 | 62.90% | 75.20% | 35.70% | 75.90% | 38.90% | 74.65%
Claude 3 Opus | 14/03/2024 | 200000 | $15 | $75 | 76.70% | 85.70% | 50.40% | 84.90% | 60.10% | 88.40%
Claude 3.5 Haiku | 22/10/2024 | 200000 | $0.80 | $4 | 68.30% | 65% | 41.60% | 88.10% | 69.40% | 60%
Claude 3.7 Sonnet | 24/02/2025 | 200000 | $3 | $15 | N/A | 83.20% | 68% | N/A | 82.20% | N/A
Claude 3 Sonnet (Reasoner) | 20/06/2024 | 200000 | $3 | $15 | N/A | N/A | N/A | N/A | N/A | N/A
DeepSeek R1 | 20/01/2025 | 128000 | $0.55 | $2.19 | N/A | 90.8% | 71.5% | N/A | 97.3% | N/A
DeepSeek V3 | 26/12/2024 | 128000 | $0.27 | $1.10 | 76.24% | 88.50% | 59.10% | 82.60% | 90.20% | 57.23%
GPT-4.5 | 27/02/2025 | 128000 | $25 | $150 | N/A | 89.60% | 71.4% | 76% | 36.7% | N/A
GPT-3.5 Turbo | 30/11/2022 | 16000 | $0.50 | $1.50 | 59.20% | 69.80% | 30.80% | 68% | 34.10% | 64.41%
GPT-4 | 14/03/2023 | 8000 | $30 | $60 | 75.50% | 86.40% | 41.40% | 86.60% | 64.50% | 88.30%
GPT-4o | 13/05/2024 | 128000 | $5 | $15 | 80.50% | 88.70% | 53.60% | 90.20% | 76.60% | 83.59%
GPT-4o mini | 18/07/2024 | 128000 | $0.15 | $0.60 | N/A | 82% | 40.20% | 87.20% | 70.20% | N/A
Gemini 1.5 Flash | 14/05/2024 | 1000000 | $0.35 | $0.70 | 66.70% | 78.90% | 39.50% | 71.50% | 54.90% | 79.88%
Gemini 1.5 Pro | 24/09/2024 | 128000 | $7 | $21 | 74.10% | 85.90% | 46.20% | 71.90% | 67.70% | 84.35%
Gemini 2.0 Flash | 30/01/2025 | 1000000 | $0.15 | $0.60 | N/A | 76.40% | 62.10% | N/A | 89.70% | N/A
Gemini Ultra | 24/09/2024 | 32000 | N/A | N/A | N/A | 83.70% | 35.70% | N/A | 53.20% | N/A
Grok-2 | 13/08/2024 | 128000 | $5 | $15 | N/A | 87.50% | 56% | 88.40% | 76.10% | N/A
Grok-2 mini | 14/08/2024 | 128000 | $2 | $10 | N/A | 86.20% | 51% | 85.70% | 73% | N/A
Llama 3.1 405b | 23/07/2024 | 128000 | $1.79 | $1.79 | 80.40% | 88.60% | 51.10% | 89% | 73.80% | 88.50%
Llama 3.1 70b | 23/07/2024 | 128000 | $0.23 | $0.40 | 75.50% | 86% | 46.70% | 80.50% | 68% | 84.80%
Llama 3.1 8b | 23/07/2024 | 128000 | $0.09 | $0.09 | 62.50% | 73% | 32.80% | 72.60% | 51.90% | 76.10%
Llama 3.3 70b | 23/07/2024 | 128000 | $0.23 | $0.40 | 74.50% | 86% | 48% | 88.40% | 77% | 77.50%
Mistral Large | 26/02/2024 | 32000 | $8 | $24 | N/A | 81.20% | N/A | N/A | N/A | N/A
Mistral Medium | 09/12/2023 | 32000 | $2.70 | $8.10 | N/A | 75.30% | N/A | N/A | N/A | N/A
Mistral Small | 17/09/2024 | 16000 | $2 | $6 | N/A | 70.6% | N/A | N/A | N/A | N/A
OpenAI o1 | 05/12/2024 | 128000 | $15 | $60 | 85.39% | 91.80% | 75.70% | 92.40% | 96.40% | 66.73%
OpenAI o1-mini | 12/09/2024 | 64000 | $1.10 | $4.40 | 80.07% | 85.20% | 60% | 92.40% | 90% | 62.89%
OpenAI o3-mini | 31/01/2025 | 128000 | $1.10 | $4.40 | N/A | 86.90% | 79.70% | N/A | 97.90% | N/A
Qwen2.5-70b | 19/09/2024 | 128000 | $0.90 | $1.20 | N/A | N/A | N/A | 88% | N/A | N/A
Qwen2.5-72b | 19/09/2024 | 131000 | $0.40 | $0.75 | N/A | 86.1% | 45.9% | 59.1% | 62.1% | 61.31%
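One way to use the comparison table is to treat it as data: drop models that miss a hard requirement (here a minimum context window and a minimum benchmark score), then rank the rest by price. The sketch below hard-codes a handful of rows from the table; the `ModelRow`, `blended_price`, and `shortlist` names, the thresholds, the choice of HumanEval as the deciding benchmark, and the 3:1 input-to-output token ratio used to blend prices are all illustrative assumptions, not the leaderboard's own ranking method.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelRow:
    name: str
    context_window: int  # tokens
    input_cost: float    # USD per 1M input tokens
    output_cost: float   # USD per 1M output tokens
    humaneval: float     # HumanEval (Coding) score, in percent


# A subset of rows copied from the comparison table.
ROWS = [
    ModelRow("GPT-4o mini", 128_000, 0.15, 0.60, 87.2),
    ModelRow("Llama 3.1 70b", 128_000, 0.23, 0.40, 80.5),
    ModelRow("Llama 3.3 70b", 128_000, 0.23, 0.40, 88.4),
    ModelRow("Claude 3.5 Haiku", 200_000, 0.80, 4.00, 88.1),
    ModelRow("Gemini 1.5 Flash", 1_000_000, 0.35, 0.70, 71.5),
    ModelRow("DeepSeek V3", 128_000, 0.27, 1.10, 82.6),
    ModelRow("GPT-4o", 128_000, 5.00, 15.00, 90.2),
]


def blended_price(row: ModelRow, input_ratio: float = 3.0, output_ratio: float = 1.0) -> float:
    """Weighted average price per 1M tokens for an assumed input:output token mix."""
    total = input_ratio + output_ratio
    return (row.input_cost * input_ratio + row.output_cost * output_ratio) / total


def shortlist(rows: list[ModelRow], min_context: int, min_humaneval: float) -> list[ModelRow]:
    """Keep models that meet both hard requirements, cheapest blended price first."""
    eligible = [
        r for r in rows
        if r.context_window >= min_context and r.humaneval >= min_humaneval
    ]
    return sorted(eligible, key=blended_price)


# Hypothetical requirements: at least 128K context and 85% on HumanEval.
for row in shortlist(ROWS, min_context=128_000, min_humaneval=85.0):
    print(f"{row.name}: ${blended_price(row):.4f} per 1M tokens (HumanEval {row.humaneval}%)")
```

Swapping the filtered benchmark (for example GPQA for reasoning-heavy workloads) or the input:output ratio changes the shortlist, so the thresholds should come from the actual application rather than these placeholder values.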
Sources
Frequently Asked Questions
What is the LLM Leaderboard, and how does it work?
What benchmarks are used to evaluate models on the LLM Leaderboard?
How does the LLM Leaderboard rank models based on speed, cost, and latency?
What details are included in the model comparison table?
How can I use the LLM Leaderboard to choose the best model for my use case?
Start building AI apps with Orq.ai
Take a 7-day free trial. Start building AI products with Orq.ai today.