AI Large Model War Yearly Review: Basic Facts and Trends in the Most Important Technological Competition of the 21st Century

12/19 2024

This article, based on publicly available information, aims solely for information exchange and does not constitute investment advice.

This grand large model war is undoubtedly the most significant technological competition of the 21st century so far.

The competition is deeply rooted in the characteristics of the 21st century:

1. Timely Information Disclosure: Any competitor's move is instantly known globally, with technological advantages lasting mere days.

2. Highly Interconnected Ecosystem: Developing a unique skill alone is insufficient; connecting it with upstream and downstream partners to reach end-users and infrastructure is crucial. Lack of advantage in any aspect can lead to being outpaced.

Visible in the mainstream, this competition has been ongoing for two years. A review is necessary to understand its current stage, identify key players, and speculate on its outcome and ultimate winner.

01 Three Stages

Dividing the competition into stages helps trace its progression. Traditionally, stages are determined by landmark products, with OpenAI's ChatGPT serving as the best reference.

As both the initiator and the current leader, ChatGPT sets the benchmark that others try to catch up with. Judging by OpenAI's technology and product releases, the competition can be broadly divided into three stages, or even four once large models' entry into end-user applications is taken into account.

Stage 1: Parameter Competition, Entering the Game

A model without billions of parameters is hardly viable.

In 2023, large models were often introduced with their parameter counts and their scores on the Massive Multitask Language Understanding (MMLU) benchmark. GPUs became a hot commodity, and reselling them became a hotter business than crypto mining.

While comparing model parameters and scores seems straightforward, it's also a competition of computing power, ultimately a GPU race. This created winners and losers.

Graduates of this stage generally develop their model's "tone" and specialty.

Reviewing OpenAI's GPT models shows a process of eliminating issues, improving accuracy, enhancing intelligence, and adding capabilities.

In February 2019, GPT-2, an unsupervised Transformer language model with 1.5 billion parameters, was released. Then, in June 2020, GPT-3 exploded with 175 billion parameters, marking a leap in NLP technology and setting a threshold for subsequent large models.

In November 2022, OpenAI released ChatGPT, a dialogue product built on GPT-3.5, stunning the world with near-human language generation.

In March 2023, GPT-4 was released. OpenAI has not disclosed its size, but it reportedly has about 1.8 trillion parameters and a training cost of about $63 million per run. This version introduced image processing capabilities, but user complaints about verbosity and nonsensical storytelling persisted.

In May 2024, GPT-4o, capable of processing and generating text, images, and audio, joined the large model arena. Scoring 88.7% on the Massive Multitask Language Understanding (MMLU) benchmark (higher than GPT-4's 86.5%), it significantly reduced complaints about nonsensical storytelling.

In July 2024, GPT-4o mini, a smaller version of GPT-4o, reduced application costs at the expense of some model effectiveness. Its API costs $0.15 per million input tokens and $0.60 per million output tokens, compared with $5 and $15, respectively, for GPT-4o. OpenAI regards large models as a foundation for enterprise and developer product features.
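To make the price gap concrete, here is a minimal sketch that applies the per-million-token prices quoted above to one workload. The workload size (2 million input tokens, 0.5 million output tokens) and the names `PRICES` and `workload_cost` are illustrative assumptions, not figures or identifiers from OpenAI.

```python
# Per-million-token API prices in USD, as quoted in the text above.
PRICES = {
    "gpt-4o": {"input": 5.00, "output": 15.00},
    "gpt-4o-mini": {"input": 0.15, "output": 0.60},
}


def workload_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of a workload at the quoted per-million-token prices."""
    price = PRICES[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


# Hypothetical workload (an assumption for illustration only).
INPUT_TOKENS, OUTPUT_TOKENS = 2_000_000, 500_000

cost_4o = workload_cost("gpt-4o", INPUT_TOKENS, OUTPUT_TOKENS)
cost_mini = workload_cost("gpt-4o-mini", INPUT_TOKENS, OUTPUT_TOKENS)
print(f"GPT-4o: ${cost_4o:.2f}")
print(f"GPT-4o mini: ${cost_mini:.2f}")
print(f"Ratio: {cost_4o / cost_mini:.1f}x")
```

Under these assumptions the same workload costs roughly 29 times less on GPT-4o mini; the exact ratio depends on the mix of input and output tokens.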

In September 2024, OpenAI released the o1-preview and o1-mini models, further improving accuracy.

In December 2024, the full version of o1 was released. Depending on the prompt, o1 offers varying response speeds and more intelligent answers to complex questions. Testing shows o1's answers are more logical, partially resolving large model hallucinations. Additionally, o1's image processing accuracy improved: it can recognize numbers in blurry bar charts and generate analyses from them.

OpenAI's journey became a reference for other vendors, with its improvement directions setting goals for competitors.

Traditional internet giants like Meta, Microsoft, Google, and Amazon didn't lag behind OpenAI in AI model research but achieved breakthroughs more slowly.

Meta launched and open-sourced its large model. In December 2024, Meta released Llama 3.3 70B, optimizing multilingual support and extending context length to 128k. Meta also introduced Andromeda, an ad retrieval engine that enhances ad placement personalization and ROI.

Google, a leader in deep learning and AI research, seemed to stumble with large language models. In December 2024, Google released Gemini 2.0 Flash, its first model with native multimodal input and output, significantly improving accuracy over 1.5 Pro and generating images directly.

Microsoft and Amazon lagged in large model development but built out their positions through indirect methods. As OpenAI's largest investor, Microsoft obtained exclusive licensing, and OpenAI uses Azure as its cloud provider. Microsoft 365 and Copilot launched AI products based on GPT models. In its Q3 2024 earnings, Microsoft projected Azure Cloud revenue of $24.5 to $25 billion, a 35-36% YoY increase, with AI applications contributing ~13% to revenue.

Amazon invested $8 billion in Anthropic, making Claude to Amazon what ChatGPT is to Microsoft. As an information infrastructure giant, Amazon promotes large models across its entire chain. At the "re:Invent" conference on December 3, 2024, Amazon unveiled six large models (with two more planned for 2025) and introduced the AI training chip Trainium3 and the AI server Trn2 UltraServer, completing its layout from model training to application.

Despite its late entry, Amazon thoroughly applied large models. Its Q3 earnings report revealed Rufus, a generative AI shopping assistant, and Project Amelia, an AI assistant for B2B merchants.

After its Q2 2024 iterations, Anthropic's Claude rivaled GPT-4 in conversation and transformation abilities. Perplexity AI, founded in August 2022, overturned the traditional search engine's list of hyperlinks by presenting AI-summarized search results, eliminating the need to click through links. Perplexity AI even experimented with ad monetization on AI results pages in Q4.

Musk's xAI launched the open-source large model Grok and the image generation model Aurora...

In 2023-2024, not only did foreign large models make significant progress, but domestic competition in China was also intense. Almost all Chinese internet companies developed large models.

Six vertical startups focused on large models: Zhipu AI, MiniMax, Moonshot AI, Baichuan Intelligence, 01.AI, and StepFun. The established BAT companies also developed models: Alibaba's Tongyi Qianwen, Baidu's ERNIE Bot, and Tencent's Hunyuan.

Thanks to their rich business and data accumulation, BAT's large model products quickly gained users. Baidu, an early entrant, proposed the concept of "Model as a Service" (MaaS), defining the domestic large model research paradigm.

Internet newcomers ByteDance and Kuaishou launched their large models, Doubao and Keling, respectively. Doubao, in particular, gained traction, with a reported 7.6 million daily active users and more than 40 million monthly active users in September 2024.

Rather than building general-purpose large models, some internet companies developed targeted models based on their business characteristics, such as Bilibili's Index, NetEase's Ziyue, and 360's Qiyuan. Among major companies, only Meituan and Pinduoduo lack clear large model products or strategies.

Model capability descriptions resemble internet jargon, summarized as powerful, powerful, and powerful.

Wang Xiaochuan once predicted that only five companies would survive in the domestic large model market, with large companies dominating and few startups surviving. Today, this seems increasingly likely, as an effective commercial monetization mechanism is lacking and investment enthusiasm is declining. Few of the six startups can cover their costs directly from large model capabilities.

Overall, after two years, remaining vendors have solid technology and products. Their biggest strategic challenge is monetization, while the technical challenge is expanding model boundaries and developing multimodal capabilities. Fortunately, this stage shows a positive signal: the large model market is no longer winner-takes-all, and no single company can monopolize technology and the market.

Stage 2: Parallel Multimodal Expansion and Monetization

Besides large language models (LLMs), text-to-image, text-to-video, voice dialogue, and even 3D generation have greatly expanded large model applications.

In multimodal competition, video generation holds the most potential. OpenAI launched the video generation model Sora and the image generation model DALL-E. Meta released the text-to-video tool Movie Gen, and Google's Gemini 2.0 can directly generate videos from text.

Domestically, Kuaishou officially launched the video generation model Keling AI, and ByteDance introduced PixelDance, Seaweed, and Jimeng AI. Among startups, MiniMax released its first AI HD video generation model, abab-video-1.

Baidu is the outlier in the multimodal competition. Robin Li reportedly disapproved of developing a Sora-like video generation model, and although the company emphasizes multimodal capabilities, its actions have been slow.

In parallel with multimodal development, monetizing large model capabilities is crucial. For consumer (C-end) users, the mainstream approach is "daily limited use + subscription", with $20 per month becoming the entry price for most large models.

OpenAI offers a Team version, a Plus version for $20 per month, and a Pro version for $200 per month. Domestically, Kimi innovatively adopted a "tipping" model, where different tipping amounts grant different peak-hour priority usage times.

B-end monetization models are more diversified, representing large models' true strength. Meta and Google apply large model capabilities to online advertising, driving revenue growth. Domestically, besides Tencent, which hasn't disclosed revenue efficiency improvements from large models, Alibaba and Baidu's cloud businesses have applied AI large models, generating benefits.

On October 31, 2024, Google released its Q3 earnings report, showing Google Cloud revenue increased from $8.411 billion in the same period last year to $11.353 billion, a nearly 35% YoY increase. Google attributed its strong cloud business performance to revenue growth driven by AI products like subscription services for enterprise customers.
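The "nearly 35%" figure can be reproduced from the two revenue numbers quoted above. This is a minimal sketch; the helper name `yoy_growth` is an illustrative assumption.

```python
def yoy_growth(previous: float, current: float) -> float:
    """Year-over-year growth, expressed as a fraction of the earlier period's value."""
    return (current - previous) / previous


# Google Cloud quarterly revenue in billions of USD, as quoted in the text.
google_cloud_growth = yoy_growth(previous=8.411, current=11.353)
print(f"Google Cloud YoY growth: {google_cloud_growth:.1%}")
```

The result rounds to 35.0%, consistent with the figure in the earnings summary.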

Meta, in its Q3 earnings report for the same period, revealed that its core advertising business benefited from revenue growth driven by large model improvements, with over one million advertisers using Meta's generative AI advertising tools.

Large model startups' revenues more directly demonstrate their earning capabilities. While OpenAI and Perplexity AI's revenues aren't impressive yet, their large user bases still give investors patience for monetization.

OpenAI currently boasts 250 million weekly active users, with consumer-end paying users contributing approximately 75% of its revenue. In 2024, the company generated approximately $3.4 billion in total revenue but incurred a loss of $5 billion after accounting for operating, labor, and management costs. In June, OpenAI welcomed its first CFO, who revealed the company's aspiration to boost consumer subscriptions, aiming to convert 5%-6% of weekly active users into paying customers.
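As a rough sense of scale for that conversion target, the sketch below multiplies out the figures quoted above. It assumes, purely for illustration, that every converted user pays the $20-per-month Plus price for a full year; the real mix of plans and churn is not given in the source, and `subscription_estimate` is a hypothetical helper.

```python
WEEKLY_ACTIVE_USERS = 250_000_000   # figure quoted in the text
TARGET_CONVERSION_RATES = (0.05, 0.06)  # 5%-6% target quoted in the text
ASSUMED_MONTHLY_PRICE = 20.0        # assumption: all subscribers pay the Plus price


def subscription_estimate(users: int, conversion_rate: float, monthly_price: float):
    """Return (subscribers, annual subscription revenue in USD) under the stated assumptions."""
    subscribers = users * conversion_rate
    annual_revenue = subscribers * monthly_price * 12
    return subscribers, annual_revenue


for rate in TARGET_CONVERSION_RATES:
    subscribers, revenue = subscription_estimate(
        WEEKLY_ACTIVE_USERS, rate, ASSUMED_MONTHLY_PRICE
    )
    print(
        f"{rate:.0%} conversion: {subscribers / 1e6:.1f}M subscribers, "
        f"about ${revenue / 1e9:.1f}B per year"
    )
```

Under these assumptions, the target corresponds to roughly 12.5 to 15 million paying users and about $3.0 to $3.6 billion in annual subscription revenue.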

The AI search firm Perplexity is actively seeking a new round of funding. According to The Information, the company disclosed in its funding materials that it anticipates its annualized revenue to reach $127 million by 2025, doubling from current levels.

In China, Baidu revealed in its third-quarter financial report that its Wenxin (ERNIE) large model was called an average of 1.5 billion times daily, a 30-fold increase within a year from the 50 million daily calls disclosed for the fourth quarter of 2023. Baidu's smart cloud revenue reached 4.9 billion yuan, up 11% year-on-year, with the proportion of AI-related revenue continuing to climb, exceeding 11%. Alibaba Cloud's quarterly revenue grew to 26.549 billion yuan, a 6% year-on-year increase, with AI-related product revenue experiencing triple-digit growth.

Two years is a relatively short timeframe for technological application. Crucially, the model still requires refinement and integration into existing businesses to reach users. This endeavor necessitates both short-term skills and long-term patience.

Stage 3: Transitioning to the Application Layer

While the large model arena may ultimately belong to powerful giants, after 3-5 years of technological advancement, companies will inevitably face the challenge of enabling more downstream enterprises to operate large models, achieve cost recovery, and, most importantly, subject large models to rigorous end-user scrutiny.

Several practical applications of models have garnered significant interest from companies, such as AI Coding, which emphasizes the model's logical thinking and coding capabilities, significantly lowering the barrier to entry for internet product development.

AI Agents transcend the Chatbot framework, leveraging the capabilities of large models more broadly in practical applications. Google's Project Mariner is an AI agent that assists users in finding flights and hotels, purchasing household items, and discovering recipes.

The concept of AI agents is widely acknowledged within the industry, although there is no consensus on a specific definition. A prevalent view is that, in addition to answering questions, AI agents must also be capable of performing complex tasks across multiple systems. AI robots assist the disabled and replace labor-intensive positions through human-machine interaction interfaces.

Among these applications, AI Agents are particularly intriguing, with Microsoft and Google already deploying related technologies. With agents, users can be significantly liberated from Prompt Engineering, enabling the utilization of model capabilities without being constrained by input methods.

Just like the Olympics, some venues attract widespread attention, while others captivate niche groups. However, gold medals can be won in any venue.

Beyond the competition among giants in the large model arena, there is also a notable "application arena" worth observing.

AI education, exemplified by star companies like Duolingo and Speak, replaces foreign teachers with AI voice and large models, providing users with strong oral training and vocabulary memorization features.

AI companionship has emerged as the sector with the greatest benefits in terms of revenue and user volume. While it may not garner as much publicity, companies in this sector are thriving. AI Dating (Rizz, Blush), Talkie, and Character AI have achieved both fame and fortune.

AI Marketing: Even standalone LLMs are sufficient to greatly liberate marketers from content creation. Meta has long utilized AI creative generation in its marketing products, and Pinterest has also launched its large model product, Pinterest Canvas, to aid advertisers with creative and content generation.

Beyond content generation, large models can also free advertisers from the complexities of setting up marketing campaigns. Automated ad placement products from AppLovin and Meta let advertisers set only basic marketing parameters such as the promoted products, budgets, target regions, and demographics. The large model then automatically orchestrates the campaigns, ad placements, and final ad data analysis. Even sophisticated A/B testing can be performed by the model, significantly freeing up advertisers' human resources.

The most promising direction lies in SaaS. If one were to choose the biggest beneficiaries of this second arena, small and medium-sized startups would undoubtedly be on the list. On forums like Reddit and Hacker News, individual developers and small teams constantly showcase products built on large model technology. These applications are straightforward and narrow in scope, typically relying on mature large models to address specific efficiency issues such as ad copy editing, script polishing, and story idea expansion.

There may also be a fourth stage in the future, in which large models reach end-user terminals, sparking a top-down efficiency revolution across various application levels. However, this may not be achievable within just three to five years.

02 Constraints to Takeoff: Computing Power and Cost

While we have classified the development stages of large models, we have yet to address the resurfacing issue of computing power.

In 2023, OpenAI's Sam Altman noted that global AI compute volume doubles every 18 months. In 2024, NVIDIA's Jensen Huang (Huang Renxun) announced that Moore's Law had faltered and that GPU performance would more than double every two years.
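The two doubling claims compound quite differently over a few years. The sketch below compares them; it treats the two-year figure as exactly doubling, which is a lower bound on "more than double", and the function name `growth_factor` is an illustrative assumption.

```python
def growth_factor(years: float, doubling_period_years: float) -> float:
    """Multiplicative growth after `years` if a quantity doubles every `doubling_period_years`."""
    return 2 ** (years / doubling_period_years)


for years in (3, 6):
    demand = growth_factor(years, doubling_period_years=1.5)   # doubling every 18 months
    hardware = growth_factor(years, doubling_period_years=2.0)  # doubling every 24 months (lower bound)
    print(
        f"After {years} years: 18-month doubling gives {demand:.1f}x, "
        f"24-month doubling gives {hardware:.1f}x"
    )
```

Over six years, an 18-month doubling period compounds to 16x while a 24-month period compounds to 8x, which illustrates why small differences in doubling time matter for the computing power constraint discussed here.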

Apart from computing power, there is also the matter of model training costs.

How high are the costs of training large models? According to reports, Anthropic's spending on model training and expansion surpassed $2.7 billion in 2024. Funding rounds for large model companies are common, and the amounts keep reaching new highs. However, as the outlook becomes clearer and major model vendors run into computing power and application problems almost simultaneously, many enterprises can no longer raise money freely, leading to financial strain and operational difficulties.

The introduction of the text-to-image model Stable Diffusion brought widespread recognition to Stability AI, but the company also faced financial hardships in 2024, struggling to survive.

03 Concerns About Domestic Large Models

One concern is the prolonged investment period – should we persist or not?

The domestic large model arena can be succinctly described using a few words: late start, hurried pace, rushed progress, and rapid decline.

Today, it can be said that the large model competition has entered its third stage, with the race for multimodal capabilities gradually drawing to a close. It can be argued that domestic manufacturers are not lagging behind at this juncture.

However, we can also observe that even basic large models abroad are still undergoing iterations, including parameter enhancements and computing power optimizations. Even after Gemini received various criticisms, Google still launched Gemini 2.0, achieving native multimodal input and output, much to the admiration of users.

Based on our extensive comparisons and usage, Gemini 2.0 represents a qualitative leap from 1.5 and is even more satisfying than OpenAI's o1 in some applications, truly showcasing the allure of a "reasoning model" by providing next-level references alongside its responses.

Returning to China, whether it's the six startups (the "Six Dragons") or the established and emerging internet giants, they all appear to have hit a bottleneck simultaneously: should pre-training continue, and for how long should investment in reasoning models persist? This lengthy and seemingly endless investment period has caused major companies to hesitate, after several years of prioritizing cost reduction and efficiency enhancement.

Will the domestic business environment and shareholders of listed companies tolerate nearly unrewarded investments in large models?

As early as the second half of 2023 and the first half of 2024, analysts on the earnings calls of companies such as Meta, Microsoft, and Google repeatedly asked about the return on investment (ROI) of large models and whether investments were sufficient. The management of each company withstood investor pressure and did not curtail budgets.

But can domestic companies withstand the pressure from investors? It's noteworthy that no domestic giant has explicitly stated the revenue generated by large models in their financial reports.

The second concern revolves around cost recovery.

In the domestic market, there is a lack of effective application scenarios for large model training and application to recoup investment costs. Although this is also prevalent abroad, the issue of cost recovery is particularly worrisome in China. The recent departure of Hong Tao, Chief Marketing Officer of Baichuan Intelligence, may be an indirect reflection of this concern.

Taking the internet industry as an example, China lacks a mature online advertising industry in which large models can be applied.

Meta and AppLovin have already demonstrated the immense potential of large models in advertising and marketing, gradually reigniting this mature and vast market from the ground up. Firstly, there is a lack of an advertising platform with substantial coverage in China, with most platforms serving as both players and referees, resulting in poor transparency of marketing effectiveness.

Secondly, the SaaS industry, where large models have proven particularly effective, has also shown lackluster development in China.

Foreign companies like Salesforce, Snowflake, and the newly listed ServiceTitan provide internet cloud computing, cloud storage, and information data services. As B2B companies, they can integrate and serve the cloud service and computing needs of many small and medium-sized enterprises, offering a vast platform for large model applications.

There are several directions for the commercialization of domestic large model manufacturers:

Firstly, membership subscriptions, where after exhausting the daily free quota, additional uses necessitate monthly payments.

Secondly, model API services, charged on a per-token basis. Other enterprises call the model interfaces of large model vendors to enhance their own products and pay the vendors based on usage volume. Examples include deploying chatbots in social products, such as Weibo's comment robot, or providing users with text-to-image or text-to-video UGC scenarios. These primarily rely on interface call volume, which is the most fiercely contested battleground among major model vendors.

Price wars are not unfamiliar; they are perhaps the simplest and most effective tactic in domestic business competitions and equally effective in large model applications. However, the question arises: can the improvement in model effectiveness be guaranteed amidst price wars? We even believe that ByteDance's late start but rapid pursuit in the large model battle was due to timing – it capitalized on the period when other companies temporarily set aside model quality amidst domestic large model price wars.

Based on historical experience from various "trend battles," enterprises will not persist in investing without an effective business model to recover model costs. Even in the best case, the domestic large model battle may end up as another "Chinese Android market" scenario.

04 Basic Conclusions

In summary, here are a few fundamental facts about current AI large models:

1. In the two years since large model technology took off, its applications have permeated core internet industries, with online advertising and online education benefiting the most.

2. Traditional industries are also experiencing efficiency enhancements through technological innovations such as terminal access to models.

3. The bottleneck for further model development lies in overcoming the constraints of computing power. However, computing power is currently concentrated almost entirely in a single company, NVIDIA, which is an abnormal phenomenon.

4. AI training chips may offer a more direct and efficient path to circumventing computing power bottlenecks.

5. As large models increasingly concentrate in the hands of giants and lack effective third-party business platforms, domestic applications may not be as prevalent as in the United States. It is plausible that traditional industries will witness greater application efficacy than the internet.

6. The ultimate progression of domestic large model applications hinges on whether investors possess the patience to tolerate long-term and sustained investments by enterprises.

The large model competition has transcended a simple algorithm showdown and is poised to spark a new industrial revolution. Unlike recent trends like the metaverse and Web3, this revolution represents a tangible top-down and bottom-up application competition. The competition for talent, technology, and computing power aligns more closely with the Olympic ideals of "Faster, Higher, Stronger." However, the one irrational aspect of this competition is the speed bottleneck: computing power is still primarily controlled by a single company, NVIDIA. This status quo will undoubtedly be challenged by technology giants, as Amazon and Intel have already prioritized AI training chips, aiming to disrupt NVIDIA's monopoly at the chip level.

Fortunately, the large model competition is no longer a winner-takes-all scenario. Even small and medium-sized startups with certain local advantages may share in the spoils. People tend to overestimate the impact of large models in the short term while underestimating their long-term influence. This is a fierce yet enduring competition.
