LLM Leaderboard

Find the best LLMs for your use case

Compare LLMs against standard benchmarks.
Choose the best AI model for your GenAI-powered apps.

Best LLMs Per Task

Discover the best LLMs for common tasks like multilingual Q&A, multi-task reasoning, and math problem-solving.

Reasoning (GPQA Diamond³)

Claude 3.7 Sonnet (64K extended thinking): 84.8%
Grok 3 Beta (Extended thinking): 84.6%
OpenAI o3-mini¹: 79.7%
OpenAI o1¹: 78%
DeepSeek R1 (32K extended thinking): 71.5%

Multilingual Q&A (MMMLU)

OpenAI o1¹: 87.7%
Claude 3.7 Sonnet (64K extended thinking): 86.1%
Claude 3.5 Sonnet: 82.1%
Claude 3.7 Sonnet (No extended thinking): 83.2%
OpenAI o3-mini¹: 79.5%

Math Problem-Solving (MATH 500)

OpenAI o3-mini¹: 97.9%
DeepSeek R1 (32K extended thinking): 97.3%
OpenAI o1¹: 96.4%
Claude 3.7 Sonnet (64K extended thinking): 96.2%
Claude 3.7 Sonnet (No extended thinking): 82.2%

Fastest Models (tokens/second)

Llama 70b: 2100
Llama 8b: 723
o1-mini: 237
Nova Lite: 150
GPT-4o mini: 11

Lowest Latency (TTFT)

Nova Micro: 0.3s
Gemini 1.5 Flash: 0.3s
Llama 8b: 0.3s
Nova Pro: 0.4s
Nova Lite: 0.4s
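
The throughput and TTFT figures above can be combined into a rough response-time estimate: time to first token plus output length divided by tokens per second. The sketch below applies that to the two models that appear in both lists, Llama 8b and Nova Lite. The function name and the 500-token output length are illustrative assumptions, not part of the leaderboard, and the estimate ignores network and queueing overhead.

```python
def estimate_response_seconds(ttft_s: float, tokens_per_second: float, output_tokens: int) -> float:
    """Approximate total generation time: TTFT plus streaming time for the output."""
    if tokens_per_second <= 0:
        raise ValueError("tokens_per_second must be positive")
    return ttft_s + output_tokens / tokens_per_second


# Figures read from the lists above; 500 output tokens is an assumed workload.
MODELS = {
    "Llama 8b": {"ttft_s": 0.3, "tokens_per_second": 723},
    "Nova Lite": {"ttft_s": 0.4, "tokens_per_second": 150},
}

for name, m in MODELS.items():
    total = estimate_response_seconds(m["ttft_s"], m["tokens_per_second"], 500)
    print(f"{name}: ~{total:.2f}s for 500 output tokens")
```

For short responses TTFT dominates the total, while for long responses throughput matters more, so it helps to plug in an output length that matches your own use case.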

Cheapest Models

[Bar chart: input cost and output cost for Nova Micro, Gemini 1.5 Flash, Llama 70b, and GPT-4o mini; vertical axis from $0 to $0.6]

Model Comparison

| Model | Release Date (DD/MM/YYYY) | Context Window (tokens) | Input Cost / 1M tokens | Output Cost / 1M tokens | Average | MMLU (General) | GPQA (Reasoning) | HumanEval (Coding) | Math | BFCL (Tool Use) |
|---|---|---|---|---|---|---|---|---|---|---|
| AWS Nova Lite | 03/12/2024 | 300000 | $0 | $0 | N/A | 80.50% | 42% | 85.40% | 73.30% | 66.60% |
| AWS Nova Micro | 03/12/2024 | 300000 | $0 | $0 | N/A | 77.60% | 40% | 81.10% | 69.30% | 56.20% |
| AWS Nova Pro | 03/12/2024 | 300000 | $0 | $0 | N/A | 85.90% | 46.90% | 89% | 76.60% | 68.40% |
| Claude 3 Haiku | 13/03/2024 | 200000 | $0.25 | $1.25 | 62.90% | 75.20% | 35.70% | 75.90% | 38.90% | 74.65% |
| Claude 3 Opus | 14/03/2024 | 200000 | $15 | $75 | 76.70% | 85.70% | 50.40% | 84.90% | 60.10% | 88.40% |
| Claude 3.5 Haiku | 22/10/2024 | 200000 | $0.80 | $4 | 68.30% | 65% | 41.60% | 88.10% | 69.40% | 60% |
| Claude 3.7 Sonnet | 24/02/2025 | 200000 | $3 | $15 | N/A | 83.20% | 68% | N/A | 82.20% | N/A |
| Claude 3 Sonnet (Reasoner) | 20/06/2024 | 200000 | $3 | $15 | N/A | N/A | N/A | N/A | N/A | N/A |
| DeepSeek R1 | 20/01/2025 | 128000 | $0.55 | $2.19 | N/A | 90.8% | 71.5% | N/A | 97.3% | N/A |
| DeepSeek V3 | 26/12/2024 | 128000 | $0.27 | $1.10 | 76.24% | 88.50% | 59.10% | 82.60% | 90.20% | 57.23% |
| GPT-4.5 | 27/02/2025 | 128000 | $25 | $150 | N/A | 89.60% | 71.4% | 76% | 36.7% | N/A |
| GPT-3.5 Turbo | 30/11/2022 | 16000 | $0.50 | $1.50 | 59.20% | 69.80% | 30.80% | 68% | 34.10% | 64.41% |
| GPT-4 | 14/03/2023 | 8000 | $30 | $60 | 75.50% | 86.40% | 41.40% | 86.60% | 64.50% | 88.30% |
| GPT-4o | 13/05/2024 | 128000 | $5 | $15 | 80.50% | 88.70% | 53.60% | 90.20% | 76.60% | 83.59% |
| GPT-4o mini | 18/07/2024 | 128000 | $0.15 | $0.60 | N/A | 82% | 40.20% | 87.20% | 70.20% | N/A |
| Gemini 1.5 Flash | 14/05/2024 | 1000000 | $0.35 | $0.70 | 66.70% | 78.90% | 39.50% | 71.50% | 54.90% | 79.88% |
| Gemini 1.5 Pro | 24/09/2024 | 128000 | $7 | $21 | 74.10% | 85.90% | 46.20% | 71.90% | 67.70% | 84.35% |
| Gemini 2.0 Flash | 30/01/2025 | 1000000 | $0.15 | $0.60 | N/A | 76.40% | 62.10% | N/A | 89.70% | N/A |
| Gemini Ultra | 24/09/2024 | 32000 | N/A | N/A | N/A | 83.70% | 35.70% | N/A | 53.20% | N/A |
| Grok-2 | 13/08/2024 | 128000 | $5 | $15 | N/A | 87.50% | 56% | 88.40% | 76.10% | N/A |
| Grok-2 mini | 14/08/2024 | 128000 | $2 | $10 | N/A | 86.20% | 51% | 85.70% | 73% | N/A |
| Llama 3.1 405b | 23/07/2024 | 128000 | $1.79 | $1.79 | 80.40% | 88.60% | 51.10% | 89% | 73.80% | 88.50% |
| Llama 3.1 70b | 23/07/2024 | 128000 | $0.23 | $0.40 | 75.50% | 86% | 46.70% | 80.50% | 68% | 84.80% |
| Llama 3.1 8b | 23/07/2024 | 128000 | $0.09 | $0.09 | 62.50% | 73% | 32.80% | 72.60% | 51.90% | 76.10% |
| Llama 3.3 70b | 23/07/2024 | 128000 | $0.23 | $0.40 | 74.50% | 86% | 48% | 88.40% | 77% | 77.50% |
| Mistral Large | 26/02/2024 | 32000 | $8 | $24 | N/A | 81.20% | N/A | N/A | N/A | N/A |
| Mistral Medium | 09/12/2023 | 32000 | $2.70 | $8.10 | N/A | 75.30% | N/A | N/A | N/A | N/A |
| Mistral Small | 17/09/2024 | 16000 | $2 | $6 | N/A | 70.6% | N/A | N/A | N/A | N/A |
| OpenAI o1 | 05/12/2024 | 128000 | $15 | $60 | 85.39% | 91.80% | 75.70% | 92.40% | 96.40% | 66.73% |
| OpenAI o1-mini | 12/09/2024 | 64000 | $1.10 | $4.40 | 80.07% | 85.20% | 60% | 92.40% | 90% | 62.89% |
| OpenAI o3-mini | 31/01/2025 | 128000 | $1.10 | $4.40 | N/A | 86.90% | 79.70% | N/A | 97.90% | N/A |
| Qwen2.5-70b | 19/09/2024 | 128000 | $0.90 | $1.20 | N/A | N/A | N/A | 88% | N/A | N/A |
| Qwen2.5-72b | 19/09/2024 | 131000 | $0.40 | $0.75 | N/A | 86.1% | 45.9% | 59.1% | 62.1% | 61.31% |
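
Because the table above quotes prices per 1M tokens, comparing models on cost means scaling those prices to an actual workload. The sketch below does that for three rows of the table (GPT-4o mini, DeepSeek V3, OpenAI o1). The function name and the workload (2,000 input tokens and 500 output tokens per request, 10,000 requests) are hypothetical and only illustrate the arithmetic.

```python
def workload_cost_usd(input_price_per_m: float, output_price_per_m: float,
                      input_tokens: int, output_tokens: int, requests: int = 1) -> float:
    """Cost in USD for `requests` calls, given prices quoted per 1M tokens."""
    per_request = (input_tokens * input_price_per_m
                   + output_tokens * output_price_per_m) / 1_000_000
    return per_request * requests


# (input, output) prices per 1M tokens, copied from the comparison table.
PRICES = {
    "GPT-4o mini": (0.15, 0.60),
    "DeepSeek V3": (0.27, 1.10),
    "OpenAI o1": (15.0, 60.0),
}

# Assumed workload: 2,000 input and 500 output tokens per request, 10,000 requests.
for model, (input_price, output_price) in PRICES.items():
    cost = workload_cost_usd(input_price, output_price,
                             input_tokens=2_000, output_tokens=500, requests=10_000)
    print(f"{model}: ${cost:,.2f}")
```

Output tokens are priced higher than input tokens for most models in the table, so workloads with long responses shift the comparison more than the input price alone suggests.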

Frequently Asked Questions

What is the LLM Leaderboard, and how does it work?
What benchmarks are used to evaluate models on the LLM Leaderboard?
How does the LLM Leaderboard rank models based on speed, cost, and latency?
What details are included in the model comparison table?
How can I use the LLM Leaderboard to choose the best model for my use case?

Start building AI apps with Orq.ai

Take a 7-day free trial. Start building AI products with Orq.ai today.
