01/20 2025
China will not trail forever!
Text | Hua Shang Tao Lue, Zhang Jingbo
Winter 2024, Hangzhou.
As the West reveled in Christmas festivities, a Chinese coder stood by a colossal floor-to-ceiling window, gazing out at the Grand Canal.
Moments later, he accomplished something that sent shockwaves through Silicon Valley.
[Mysterious Eastern Force]
'A novel model that made the entire valley buzz!'
This was CNBC's assessment of the event's impact on Silicon Valley.
On December 26, a Hangzhou-based Chinese startup named 'DeepQuest' (known in English as DeepSeek) unveiled a groundbreaking large model: DeepSeek-V3.
In various benchmarks, DeepSeek-V3 outperformed other open-source models and rivaled the top closed-source large model, GPT-4o.
Particularly in mathematical reasoning, DeepSeek-V3 excelled.
Remarkably, while matching GPT-4o's performance, DeepSeek-V3 reportedly cost only $5.58 million to train, less than one-twentieth of GPT-4o's training cost.
This left Americans profoundly unsettled.
Previously, Google and OpenAI had invested billions, even tens of billions of dollars, and utilized tens of thousands of state-of-the-art GPUs to achieve similar results.
This stark contrast prompted Americans to reflect: Is it still worthwhile investing in large models and computational power?
Many Silicon Valley tycoons lauded the achievement, yet they also felt the sting of China's technological prowess: while Americans rested, China was catching up!
Interestingly, this event coincided with the unveiling of China's sixth-generation fighter jet.
Yet to many Americans, it was DeepSeek-V3 rather than the fighter jet that recalled the Sputnik moment of the 1950s, when the Soviet Union launched the first artificial satellite.
However, what truly shocked Silicon Valley about DeepSeek-V3 wasn't its high performance or low cost, but the pioneering spirit demonstrated by the Chinese.
Historically, almost all Chinese AI companies followed Silicon Valley, leading to a prevalent perception that the US excels in technological breakthroughs from 0 to 1, while China excels in application implementation from 1 to 100.
DeepSeek-V3 shattered this stereotype, significantly enhancing model performance and training efficiency with groundbreaking technologies such as MLA and DeepSeekMoE.
Americans were astonished to discover that Chinese companies could also contribute original innovation, setting new rules of their own instead of playing the established game. This had been exceedingly rare in the past.
Due to its superior performance, DeepSeek was hailed in Silicon Valley as a mysterious force from the East.
Remarkably, the backer behind this mysterious Eastern force wasn't an internet giant like Tencent or Alibaba, but a low-key quantitative hedge fund, Huantang Quant.
Currently, there are no more than five enterprises in China with 10,000 GPUs, and Huantang is one of them.
In 2023, it established a subsidiary, 'DeepQuest', to commence R&D on the DeepSeek large model, with a team of only 139 members, significantly fewer than OpenAI's 1,200.
Leading this team is Liang Wenfeng, born in the 1980s, the founder of Huantang Quant.
[Curiosity-Driven Madness]
Liang Wenfeng and Huantang's journey began in 2008.
That year, after graduating from Zhejiang University with a major in software engineering, instead of joining a large company like his peers, he ventured to Chengdu and lived in a tiny rented room.
There, Liang Wenfeng began exploring various ways to make money with computers.
After several attempts, he decided to delve into quantitative investment. This decision wasn't easy, as quantitative investment was still novel in China at the time.
Many doubted its profitability.
Whenever he faced challenges, Liang Wenfeng would recall a quote from the father of quantitative investment, Jim Simons: 'There must be a way to model prices.'
Driven by this belief, Liang Wenfeng persevered for two years and finally saw a breakthrough. In 2010, the launch of CSI 300 Index Futures ushered in a golden age for quantitative investment.
Riding this wave, Liang Wenfeng and his team amassed a fortune, with proprietary capital exceeding 500 million yuan.
It was also during this period that, with the breakthrough of deep learning algorithms, artificial intelligence experienced an explosion. Liang Wenfeng, who studied artificial intelligence at Zhejiang University, was fired up with ambition.
In 2015, he co-founded Huantang Quant with Zhejiang University alumni.
These spirited young individuals aimed to use mathematics and artificial intelligence to create a world-class quantitative hedge fund in China, akin to Renaissance Technologies.
Just one year later, they put their first AI-driven strategy into live trading, and subsequently converted all of their trading strategies to AI.
With the support of new technologies, the return rate of Huantang Quant funds far surpassed the CSI 300 Index over the same period.
This fueled the continuous growth of Huantang Quant's fund size, which exceeded 100 billion yuan in 2021, ranking it among the top four domestic quantitative funds.
However, as the fund size swelled, Liang Wenfeng faced a tricky problem.
AI trading strategies required substantial computational power, and with the surge in model parameters, the demand for GPU computational power continued to grow.
How to address this issue? Liang Wenfeng's solution was to amass computational power!
Starting in 2019, Huantang Quant began large-scale deployment of AI computational power.
That year, it invested 200 million yuan to build the 'Firefly I' AI computing cluster, equipped with 1,100 GPU cards. At that time, Tesla had just proposed the concept of the Dojo supercomputer.
A few months later, when NVIDIA released its latest A100 chip, Liang Wenfeng once again raced ahead and became one of the first in the Asia-Pacific region to obtain this card.
Then, in 2021, he invested 1 billion yuan to build 'Firefly II', equipped with 10,000 A100 cards, with a computational power equivalent to 760,000 personal computers.
Its footprint was larger than 10 basketball courts.
In an era when AI large models hadn't yet exploded, Liang Wenfeng's actions puzzled many.
What was an investment fund doing stockpiling so much computational power? Some media even complained that Huantang Quant scared away retail investors in the A-share market.
The outside world's perception of Huantang Quant was still confined to the capital market.
But Liang Wenfeng's vision had already reached the stars and the sea.
In 2017, a Google research team first proposed the Transformer architecture in a groundbreaking paper. It is a neural network architecture built entirely on the attention mechanism, and it displaced the recurrent and convolutional designs that had dominated until then.
An American startup called OpenAI continuously trained its large models based on this new architecture. Ultimately, in 2022, it ignited the era of AI large models with ChatGPT.
Since then, global internet giants have followed OpenAI's path, with few questioning it.
But a group of young and daring individuals, led by Liang Wenfeng, did something incredibly bold: they attempted to improve the Transformer architecture.
In fact, from the day DeepQuest was founded in 2023 to venture into large models, Liang Wenfeng and his team began rethinking the algorithm framework.
While others fell into the inertia of simply imitating OpenAI, this group of young individuals took an unconventional path.
They daringly tried various groundbreaking technologies such as MLA (Multi-head Latent Attention) and DeepSeekMoE (Mixture of Experts) at the risk of failure.
The massive stockpile of computing chips amassed years earlier gave wings to their dreams.
Ultimately, this group of young individuals made history: DeepSeek-V3 emerged overnight, shocking Silicon Valley.
['China Will Not Trail Forever!']
Comparing the Chinese and American tech industries, we often lament:
Why can't China produce great entrepreneurs like Steve Jobs, Elon Musk, or Jen-Hsun Huang?
Steve Jobs had one goal in life: to change the world.
Jen-Hsun Huang set himself an ambitious goal in his youth: to do something different and revolutionize computing.
Elon Musk famously proclaimed an even bolder one: to colonize Mars and find a second home for humanity.
In contrast, it seems that Chinese entrepreneurs place more focus on making money and survival, rarely looking up at the stars and paying insufficient attention to innovation.
In fact, over the past 30 years, we have become accustomed to Moore's Law showering us with better hardware and software every 18 months while we sit at home.
As a result, we have barely participated in true technological innovation through successive waves of IT.
But this situation has quietly changed in recent years, as a new generation of Chinese entrepreneurs is making breakthrough innovations and starting new games outside the Western framework.
'China must gradually become an innovation contributor rather than always a free rider,' said Liang Wenfeng.
As early as his university days, Liang Wenfeng was convinced that AI would change the world. After graduation, he made enough money in quantitative investment.
This gave him the capital to follow his heart and do what he loved, without first weighing the pros and cons.
From its inception, DeepSeek established its core mission: to explore the essence of artificial general intelligence (AGI)!
In the Chinese AI community, few enterprises dare to propose such an ambitious goal.
As a result, over the past few years, while many large model vendors were busy acquiring users and commercializing, Liang Wenfeng was grinding away at seemingly unprofitable basic research.
'Innovation is not entirely commercially driven; it also requires curiosity and creativity,' he said.
In Liang Wenfeng's view, Chinese enterprises have been constrained by the inertia of being commercially driven in the past. He hopes that DeepSeek can break free from this constraint.
This business philosophy seems somewhat unorthodox in the current Chinese business community.
Numerous industry insiders have stated:
Liang Wenfeng is a very rare individual in the Chinese AI community. He possesses a formidable learning ability, combines strong infrastructure engineering with model research capabilities, and can mobilize resources.
To internal employees, Liang Wenfeng doesn't seem like a boss at all, but more like a geek.
To this day, he maintains his low-key style, reading papers, writing code, and participating in group discussions like any other researcher in the company.
This low-key tycoon even hires in a way that goes against the mainstream.
While many large model companies are eager to recruit overseas talent, Liang Wenfeng goes against the grain, insisting on recruiting locally and boldly stating:
'The top 50 talents in the world may not be in China, but perhaps we can cultivate such talents ourselves.'
The team has no overseas hires or established industry leaders. Liang Wenfeng prefers inexperienced young individuals because they are not constrained by convention.
At DeepSeek, the criteria for selecting people have always been passion and curiosity.
In fact, this startup isn't, as rumored, composed of a group of inscrutable geniuses, but rather of young individuals who graduated only a few years ago.
Many are even Ph.D. candidates or fifth-year Ph.D. students from top universities like Peking University and Tsinghua University.
Due to the cutting-edge nature of their work, these young individuals have almost no reference materials when carrying out their tasks. But it is precisely this blank slate that allows them to dare to break through traditions.
For example, one of the most significant innovations of DeepSeek-V3, the MLA architecture, came from a young individual's sudden inspiration.
There is also no hierarchical division of labor within DeepSeek.
During the research process, anyone with an idea can pull others in for discussion and use the GPUs in the company's training cluster at any time, without approval or limits.
This seemingly loose management style greatly mobilizes everyone's curiosity and creativity, enabling the birth of DeepSeek-V3.
In Liang Wenfeng, we faintly see the shadows of Steve Jobs, Elon Musk, and Jen-Hsun Huang.
'Chinese AI cannot always be in a following position!'
'The real gap is not one or two years, but the difference between originality and imitation.'
These two declarations by Liang Wenfeng speak not only to the AI industry but also to the direction in which Chinese enterprises must break through after decades of following and imitating the West.
All the low-hanging fruit has been picked, and only by daring to break through can we find a new path forward.
Liang Wenfeng is not alone.
Today, from Wang Tao of DJI drones to Wang Xingxing of Unitree Robotics, a plethora of new-generation entrepreneurs are leading China's tech industry into uncharted territories.