⬤ A chart gaining traction online shows AI model performance and efficiency advancing rapidly in tandem. AI systems are now exceeding human PhD-level scores on complex scientific benchmarks while becoming dramatically cheaper to run per million tokens. The log-scale visualization tracks several models reaching and then surpassing PhD-level performance on the GPQA Diamond benchmark as running costs fell steadily from early 2023 through late 2025.
⬤ The chart identifies three distinct progress frontiers: capability, balanced performance-to-cost, and low-cost performance. All three show rising capability paired with declining costs, with no plateauing in sight. Models are increasingly handling advanced reasoning tasks while operational costs per million tokens continue dropping significantly.
⬤ The GPQA Diamond benchmark, previously considered a strong test of advanced reasoning, may be approaching its ceiling as more models reach top scores. If so, new testing methods will be needed to measure continued progress. A follow-up observation from Ethan Mollick reinforces that capability gains show no signs of slowing even as operational costs plummet.
⬤ This simultaneous improvement in performance and affordability means advanced AI capabilities are becoming accessible across more use cases. If current development continues at this pace, powerful AI tools will likely become ubiquitous, fundamentally changing how technology is applied in research, business, and daily life.
Victoria Bazir