China's DeepSeek faces questions over claims after rattling US tech market
Tech leaders cast doubt on the Chinese startup’s claimed budget and chip use
After shaking up the AI scene with a model that rivals the creations of Google and OpenAI, China's DeepSeek is now facing scrutiny over the validity of its bold claims.
The Hangzhou-based startup recently announced that it developed its R1 model at a fraction of the cost of Silicon Valley's latest AI advancements, prompting questions about US dominance in the field and whether the valuations of America's top tech firms are inflated, reports Al Jazeera.
Some critics, however, are skeptical about DeepSeek's story of operating on a shoestring budget, suggesting that the company may have had access to more advanced chips and greater funding than it has publicly acknowledged.
"It's still very much an open question whether DeepSeek's claims can be fully trusted. The AI community will be digging into them, and we'll find out soon enough," said Pedro Domingos, professor emeritus of computer science at the University of Washington, told Al Jazeera.
"I think it's plausible they could train a model with $6 million," Domingos added. "But it's also possible that this figure only covers fine-tuning and post-processing, with the actual model's development relying on more expensive models built by others."
In a research paper released last week, the DeepSeek team revealed that it used a cluster of 2,048 Nvidia H800 GPUs, a less advanced chip designed to comply with US export controls, and spent about $5.6 million to train R1's foundational model, V3.
By contrast, OpenAI's CEO Sam Altman has stated that it cost over $100 million to train GPT-4, with analysts estimating that it used as many as 25,000 H100 GPUs, which are more powerful than the H800.
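The arithmetic behind DeepSeek's headline figure is simple: the V3 technical report prices training in rented GPU-hours. A minimal sanity check in Python, using the GPU-hour total and the $2-per-GPU-hour rental rate that the paper itself assumes:

```python
# Back-of-the-envelope check of DeepSeek's stated training cost.
# Both inputs come from the V3 technical report: the total rented
# H800 GPU-hours and the paper's assumed rental rate of $2/GPU-hour.
gpu_hours = 2_788_000   # total H800 GPU-hours reported for training V3
rate_usd = 2.00         # assumed rental price per GPU-hour (USD)

cost = gpu_hours * rate_usd
print(f"Implied training cost: ${cost / 1e6:.2f} million")
# -> Implied training cost: $5.58 million
```

Notably, the paper states that this figure covers only the final training run, not salaries, infrastructure, or the cost of earlier research and ablation experiments, which bears directly on Domingos's caveat above.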
The announcement from DeepSeek, which was founded in 2023 by hedge fund manager Liang Wenfeng, has shaken up the belief that leading AI companies must invest billions in data centers and expensive high-end chips. The startup's claims have also raised doubts about the effectiveness of Washington's attempts to constrain China's AI sector by restricting exports of the most advanced chips.
In response, shares of California-based Nvidia, the dominant supplier of GPUs for generative AI, plunged 17% on Monday, wiping about $593 billion from its market value, a sum roughly equivalent to Sweden's GDP.
While DeepSeek's release of the R1 model is generally seen as a significant milestone, some prominent figures are urging caution, questioning the veracity of its claims.
Palmer Luckey, founder of Oculus VR, dismissed DeepSeek's claimed budget as "bogus" and accused "useful idiots" of falling for "Chinese propaganda". He further argued that DeepSeek was being pushed by a Chinese hedge fund to undermine US AI investment and to conceal potential sanctions evasion.
"America is a fertile bed for psyops like this because our media hates our tech companies and wants to see President Trump fail," Luckey wrote on X.
Alexandr Wang, CEO of Scale AI, also expressed skepticism in a CNBC interview, suggesting DeepSeek likely had access to 50,000 advanced H100 chips that it could not openly discuss due to US export controls. However, Wang did not provide evidence to support his claim.
Elon Musk, a close ally of US President Donald Trump, sided with DeepSeek's critics, replying "Obviously" to a post on X discussing Wang's assertions.
DeepSeek did not respond to requests for comment, but Zihan Wang, a PhD candidate who worked on an earlier DeepSeek model, came to the company's defense. "Talk is cheap," Wang wrote on X, suggesting that critics should try to reproduce DeepSeek's work rather than simply attack it. He did not directly address whether he believed the company's claims about the $6 million budget or its use of less advanced chips to train R1.
In a 2023 interview with the Chinese media outlet Waves, Liang Wenfeng, the founder of DeepSeek, revealed that his company had stockpiled 10,000 of Nvidia's A100 chips—older than the H800—before the Biden administration imposed an export ban on them.
Users of the R1 model have also highlighted limitations stemming from its Chinese origins, such as its censorship of topics Beijing deems sensitive, including the 1989 Tiananmen Square massacre and the status of Taiwan.
However, in a sign that early fears about DeepSeek's potential to disrupt the US tech sector are starting to subside, Nvidia's stock price rebounded nearly 9% on Tuesday. Meanwhile, the tech-heavy Nasdaq 100 rose 1.59%, recovering from a 3% drop the previous day.
Tim Miller, an AI professor at the University of Queensland, said it's difficult to gauge how much weight should be placed on DeepSeek's claims. "The model itself reveals a few details, but the costs associated with the changes they claim don't really show up in the model itself," Miller told Al Jazeera.
He noted that while he didn't see any "alarm bells," there are valid arguments both for and against trusting DeepSeek's research paper. "The breakthrough is incredible—almost 'too good to be true.' The breakdown of costs is unclear," Miller said.
Still, he acknowledged that breakthroughs do occasionally happen in computer science. "Given how new these large-scale models are, it makes sense that efficiencies are bound to be found," he said. "They would have known that if they were misleading everyone, it would quickly become obvious, and there's already a team trying to reproduce their work."
Lucas Hansen, co-founder of the nonprofit CivAI, suggested that while it is hard to determine whether DeepSeek circumvented US export controls, the startup's claimed training budget pertains to V3, which is comparable to OpenAI's GPT-4, rather than to the R1 model itself. "GPT-4 finished training in late 2022. Since then, both hardware and algorithm improvements have reduced the cost of training a GPT-4-level model," Hansen explained. "It's a bit like GPT-2, which was once a serious investment to train, but now you can do it for just $20 in 90 minutes."
Hansen added that DeepSeek likely created R1 by taking a base model like V3 and using innovative techniques to refine it. "This process of teaching the base model is much cheaper than training the base model itself," he said. "Now that DeepSeek has shared how to enhance a base model into a more intelligent one, we can expect a surge in new thinking models."
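To make Hansen's point concrete, here is a minimal sketch of the general recipe he describes: post-training an existing base model on reasoning-style data. This is not DeepSeek's actual pipeline (R1's training reportedly leaned heavily on reinforcement learning); the tiny model, toy dataset, and hyperparameters below are illustrative assumptions only.

```python
# Minimal post-training sketch: refine an existing base model on
# reasoning-style traces. Illustrative only -- gpt2 stands in for a
# far larger base model, and two toy examples stand in for millions.
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base = "gpt2"  # placeholder for a large pretrained base model like V3
tokenizer = AutoTokenizer.from_pretrained(base)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token
model = AutoModelForCausalLM.from_pretrained(base)

# Toy "chain of thought" samples; a real run would use a large corpus.
examples = [
    {"text": "Q: 12 * 4? Think step by step: 12 * 4 = 48. A: 48"},
    {"text": "Q: 9 + 16? Think step by step: 9 + 16 = 25. A: 25"},
]

def tokenize(batch):
    out = tokenizer(batch["text"], truncation=True,
                    padding="max_length", max_length=64)
    out["labels"] = out["input_ids"].copy()  # standard causal-LM labels
    return out

dataset = Dataset.from_list(examples).map(tokenize, batched=True)

# Post-training passes over far fewer tokens than pre-training,
# which is why this stage costs a small fraction of building the base.
trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2,
                           report_to="none"),
    train_dataset=dataset,
)
trainer.train()
```

The proportions, not the specifics, are the point: the expensive step is pre-training the base model over trillions of tokens, while teaching it to reason afterwards touches orders of magnitude less data and compute.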