Science & technology | On your marks

GPT, Claude, Llama? How to tell which AI model is best

Beware model-makers marking their own homework 

An illustration showing a lineup of robots of different shapes and sizes.
Illustration: George Wylesol

When Meta, the parent company of Facebook, announced its latest open-source large language model (LLM) on July 23rd, it claimed that the most powerful version of Llama 3.1 had “state-of-the-art capabilities that rival the best closed-source models” such as GPT-4o and Claude 3.5 Sonnet. Meta’s announcement included a table showing the scores that these and other models achieved on a series of popular benchmarks with names such as MMLU, GSM8K and GPQA.

