GPT, Claude, Llama? How to tell which AI model is best
Beware model-makers marking their own homework
When Meta, the parent company of Facebook, announced its latest open-source large language model (LLM) on July 23rd, it claimed that the most powerful version of Llama 3.1 had “state-of-the-art capabilities that rival the best closed-source models” such as GPT-4o and Claude 3.5 Sonnet. Meta’s announcement included a table showing the scores achieved by these and other models on a series of popular benchmarks with names such as MMLU, GSM8K and GPQA.