OpenAI is reportedly struggling to improve its next big AI model. It's a warning for the entire AI industry.
- OpenAI's next model is showing a slower rate of improvement, according to The Information.
- It's prompted a Silicon Valley debate about whether AI models are hitting a performance plateau.
- The AI boom has moved at pace because new releases have wowed users with huge leaps in performance.
OpenAI's next flagship artificial intelligence model is showing smaller improvements compared to previous iterations, according to The Information, in a sign that the booming generative AI industry may be approaching a plateau.
The ChatGPT maker's next model, Orion, showed only a moderate improvement over GPT-4, according to some employees who have used or tested it, The Information reported. The leap from GPT-4 to Orion is smaller than the one from GPT-3 to GPT-4, especially in coding tasks, the report added.
The report reignites a debate about the feasibility of building ever-more-advanced models and about AI scaling laws, the theoretical rules describing how models improve as they grow.
OpenAI CEO Sam Altman posted on X in February that "scaling laws are decided by god; the constants are determined by members of the technical staff."
The "laws" Altman cited suggest AI models become smarter as they size up and get access to more data and computing power.
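In the research literature, such a scaling law is usually written as an empirical power law relating a model's loss to its size and training data. One widely cited form, from DeepMind's "Chinchilla" work, is illustrative here rather than a statement of OpenAI's internal formulas:

\[
L(N, D) = E + \frac{A}{N^{\alpha}} + \frac{B}{D^{\beta}}
\]

Here $L$ is the model's loss (lower is better), $N$ is the parameter count, $D$ is the number of training tokens, $E$ is an irreducible loss floor, and $A$, $B$, $\alpha$, $\beta$ are fitted constants (the constants "determined by members of the technical staff," in Altman's phrasing). The formula builds in diminishing returns: each doubling of $N$ or $D$ shaves off a smaller slice of loss, and no amount of scaling pushes performance past the floor $E$.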
Altman may still subscribe to the view that a preordained formula decides how much smarter AI can get. But The Information's report suggests technical staff are questioning those laws, amid a fierce Silicon Valley debate over growing evidence that leading models are hitting a performance wall.
OpenAI did not immediately respond to a Business Insider request for comment.
Have scaling laws hit a dead-end?
While Orion's training is not yet complete, OpenAI has nonetheless resorted to additional measures to boost performance, such as baking in post-training improvements based on human feedback, The Information said.
The model, first unveiled a year ago, could still see dramatic improvements ahead of its release. But it's a sign that the AI models that have helped companies raise billions of dollars and command lofty valuations may look less impressive with each new generation.
There are two main reasons this could happen.
Data, one vital element of the scaling law equation, has been harder to come by as companies have quickly exhausted available data online.
They have scraped vast amounts of human-created data — including text, videos, research papers, and novels — to train the models behind their AI tools and features, but the supply is limited. Research firm Epoch AI predicted in June that firms could exhaust usable textual data by 2028. Companies are trying to overcome constraints by turning to synthetic data generated by AI itself, but that, too, comes with problems.
"For general-knowledge questions, you could argue that for now we are seeing a plateau in the performance of LLMs," Ion Stoica, a co-founder and chair of enterprise software firm Databricks, told The Information, adding that "factual data" is more useful than synthetic data.
Computing power, the other factor that has historically boosted AI performance, is also not limitless. In a Reddit AMA last month, Altman acknowledged that his company faces "a lot of limitations and hard decisions" about allocating its computing resources.
It's no wonder that some industry experts have begun noting that new AI models released this year, as well as future ones, show signs of smaller performance leaps than their predecessors.
'Diminishing returns'
Gary Marcus, a New York University professor emeritus and outspoken critic of the current AI hype, argues AI development is destined to hit a wall. He has been vocal about it showing signs of "diminishing returns" and reacted to The Information's reporting with a Substack post headlined, "CONFIRMED: LLMs have indeed reached a point of diminishing returns."
When OpenAI rival Anthropic released its Claude 3.5 model in June, Marcus dismissed an X post touting Claude 3.5's marginal improvements over competitors in areas like graduate-level reasoning, code, and multilingual math, saying it was in the "same ballpark as many others."
The AI market has spent billions of dollars trying to upend the competition, only to produce evidence of "convergence, rather than continued exponential growth," Marcus said.
Ilya Sutskever, a cofounder of OpenAI and now of Safe Superintelligence, has voiced a similar view. On Monday, following The Information's report, he told Reuters that results from scaling up pre-training had plateaued, adding, "Scaling the right thing matters more now than ever."
The AI industry will keep looking for ways to spark huge jumps in performance. Anthropic CEO Dario Amodei has predicted that AI model training runs will enter a new era next year, in which they could cost $100 billion. Altman has previously said it cost more than $100 million to train GPT-4. It remains to be seen how much smarter an AI model can get with that much capital behind it.
Scaling optimism
Other Silicon Valley leaders, including Altman, are still publicly optimistic about AI's current scaling potential. In July, Microsoft chief technology officer Kevin Scott dismissed concerns that AI progress had plateaued. "Despite what other people think, we're not at diminishing marginal returns on scale-up," Scott said during an interview with Sequoia Capital's Training Data podcast.
There could also be strategies to make AI models smarter by enhancing the inference portion of development. Inference is the work a trained model does to produce outputs from inputs it hasn't seen before.
The model OpenAI released in September, called OpenAI o1, focused more on inference improvements. It outperformed its predecessors on complex tasks, reaching performance similar to that of Ph.D. students on benchmark tasks in physics, chemistry, and biology, according to OpenAI.
Still, it's clear that, like Altman, much of the industry remains firm in its conviction that scaling laws are the driver of AI performance. If future models underwhelm, expect a reassessment of the current boom.