After the "price war", what else do big models need to compete on?

09/27 2024

Source | Bohu Finance (bohuFN)

At the recent 2024 Yunqi Conference, Alibaba once again became the focus of attention.

In May of this year, Alibaba Cloud announced significant price reductions for multiple commercial and open-source models in its Tongyi Qianwen family, with discounts of up to 97%. At the Yunqi Conference, prices for three main Tongyi Qianwen models were cut further, with the largest reduction reaching 85%.

Following Alibaba's lead in May, cloud service providers such as ByteDance's Volcano Engine, Baidu Intelligent Cloud, Tencent Cloud, and iFlytek all announced significant price reductions for their large models, with cuts across the industry reaching around 90%.

Not only have domestic large model vendors joined the price war, but industry bellwether OpenAI also launched GPT-4o mini in July this year, with a commercial price that is over 60% cheaper than GPT-3.5 Turbo.

It is foreseeable that after Alibaba reignited the "price war," prices for large models will continue to decline and may even move into "negative gross margin" territory. In the history of the internet industry, "losing money to buy scale" is nothing new: changing the business model of an entire industry inevitably comes at a higher cost.

However, in this process, balancing price, quality, and service has become a question that large model enterprises must consider. To "survive," enterprises cannot rely solely on "low-hanging fruit."

01 Scale is more important than profit

Domestic large models have moved from pricing in fen (hundredths of a yuan) per thousand tokens into an era of pricing in li (thousandths of a yuan) per thousand tokens. In May of this year, the API output price for Alibaba's Tongyi Qianwen large model dropped from 0.02 yuan per thousand tokens to 0.0005 yuan per thousand tokens.

After another price reduction in September, the minimum call prices per thousand tokens for Alibaba Cloud's Qwen-Turbo(128k), Qwen-Plus(128k), and Qwen-Max models set new lows, dropping to 0.0003 yuan, 0.0008 yuan, and 0.02 yuan, respectively.
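The scale of these cuts can be checked directly from the per-thousand-token prices quoted above. The sketch below uses the article's own figures for Tongyi Qianwen's output price; the monthly token volume is an invented number for illustration.

```python
# Cost comparison using the per-thousand-token prices quoted in the
# article (yuan per 1,000 output tokens). The token volume is hypothetical.
OLD_PRICE = 0.02     # Tongyi Qianwen output price before the May cut
NEW_PRICE = 0.0005   # output price after the May cut
TOKENS = 1_000_000   # an illustrative monthly output volume

def call_cost(tokens: int, price_per_thousand: float) -> float:
    """Cost in yuan for `tokens` output tokens at a given unit price."""
    return tokens / 1000 * price_per_thousand

before = call_cost(TOKENS, OLD_PRICE)   # 20.0 yuan
after = call_cost(TOKENS, NEW_PRICE)    # 0.5 yuan
reduction = 1 - after / before          # 0.975, i.e. a 97.5% cut
print(f"before: {before} yuan, after: {after} yuan, cut: {reduction:.1%}")
```

A million output tokens that once cost 20 yuan now costs half a yuan, which is where the "up to 97%" discount figure comes from.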

Regarding the latest price reduction, Alibaba Cloud CTO Zhou Jingren stated that every price reduction is a very serious process, involving considerations from various aspects such as the overall industry development, feedback from developers, and enterprise users. The price reduction is not simply a "price war," but rather a recognition that large model prices are still too high.

As an industry matures, price reductions are inevitable. Moore's Law in the semiconductor industry, for example, observes that the number of transistors on a chip roughly doubles every two years, so performance rises while unit costs fall as processes improve.

However, the current pace of price reductions in the large model industry has far surpassed Moore's Law, with reductions approaching 100%. Can large model enterprises still make a profit in this context? Perhaps for the large model industry, scale is currently more important than profit.

On the one hand, temporarily ceding profits has become a consensus in the large model industry, with insiders believing that the industry may have already entered the "era of negative gross margins."

According to Caijing magazine, several responsible individuals from Alibaba Cloud and Baidu Intelligent Cloud revealed that before May of this year, the gross margin on large model inference computing power in China was over 60%, in line with international peers. However, after successive price reductions in May, the gross margin fell into negative territory.

After large model prices are reduced, the number of users will continue to increase. The more calls made in the short term, the greater the losses for large model enterprises, as each model call consumes costly computing power. This means that large model enterprises must not only reduce selling prices but also face higher cost inputs.

On the other hand, the effects of large model price reductions are significant. Taking Alibaba Cloud as an example, after price reductions, the number of paid customers on Alibaba Cloud's Bailian platform increased by over 200% compared to the previous quarter. More enterprises abandoned private deployments and chose to call various AI large models on Bailian, which now serves over 300,000 customers.

Over the past year, Baidu's Wenxin large model prices have also been cut by over 90%. Meanwhile, Baidu disclosed in its Q2 2024 earnings call that Wenxin now averages over 600 million API calls a day, a more than tenfold increase within six months.

It appears that large model enterprises are willing to sacrifice profits for price reductions in pursuit of "expectations" – sacrificing short-term interests for long-term returns.

Industry insiders estimate that current revenues from model calls by large model enterprises will not exceed 1 billion yuan, which is just a "drop in the ocean" compared to total revenues in the tens of billions of yuan range.

However, over the next one to two years, the number of large model calls is expected to grow at least tenfold. In the short term, a larger user base means higher computing costs for large models. In the long run, however, computing costs in the cloud services sector are expected to fall gradually as customer demand grows, ushering in a "return period" for enterprises.

As the industry continues to develop, AI's impact on computing power will become increasingly evident. Alibaba CEO Wu Yongming once stated that over 50% of new demand in the computing power market is driven by AI, with large models accelerating commercialization.

On the one hand, price reductions significantly lower the barriers to entry and trial costs for enterprise customers, especially in traditional industries such as government, manufacturing, and energy, where business scales are larger and there is greater potential for growth.

When large models become accessible to everyone like other infrastructure, their market potential will likely see significant growth. Before that happens, large model enterprises will inevitably need to offer concessions to enterprises and developers.

On the other hand, while existing revenues may decline after large model price reductions, incremental revenues will increase. Taking Baidu as an example, its large models not only generate direct revenues, such as calls to products like the Wenxin large model, but also drive indirect business revenues, such as Baidu Intelligent Cloud services.

Over the past few years, there have been questions about Baidu Intelligent Cloud's strategy, as it does not dominate the public cloud market. However, in the AI public cloud niche market, Baidu has begun to overtake its competitors. Currently, the revenue share of Baidu Intelligent Cloud's large models has increased from 4.8% in Q4 2023 to 9% in Q2 2024.

Therefore, the current consensus in the large model industry is that scale is more important than profit, a viewpoint that is also common in the internet era, as seen in "Groupon Wars," "Ride-hailing Wars," and "E-commerce Wars." Large model enterprises cannot avoid the "price war" and must aim to survive it, hoping to emerge as the ultimate beneficiaries.

02 Alibaba focuses on "AI Infrastructure"

Alibaba is well aware of this, and after recently announcing further price reductions for large models, it also introduced the concept of "AI Infrastructure." Alibaba Cloud Vice President Zhang Qi stated that AI today is akin to the internet around 1996, when access fees were expensive and held back the internet's development; only once fees come down can a future explosion of applications be discussed.

Therefore, in addition to announcing further price reductions for large models at the 2024 Yunqi Conference, Alibaba Cloud also released a new generation of open-source large models, making available over 100 models covering various sizes of large language models, multi-modal models, mathematical models, and code models, setting a record for the largest number of open-source large models.

Alibaba Cloud CTO Zhou Jingren stated that Alibaba Cloud is firmly committed to its open-source strategy, hoping to leave choices to developers who can make trade-offs and selections based on their business scenarios to enhance model capabilities and inference efficiency, while also serving enterprises more effectively.

According to Alibaba's statistics, as of mid-September 2024, Tongyi Qianwen open-source model downloads exceeded 40 million, with a total of over 50,000 derivative Qwen series models, making it a world-class model group second only to Llama, which holds the top spot in open-source large models with nearly 350 million global downloads.

After the "Hundred Models War" ends, many industry leaders agree that "competing on models is less important than competing on applications," and major enterprises are beginning to focus on "competing on ecosystems." Baidu Chairman Robin Li once stated that "without a rich AI-native application ecosystem built on top of basic models, large models are worthless."

Currently, over 190 large models have been filed with the Cyberspace Administration of China, with over 600 million registered users. Yet the "last mile" problem for large models remains hard to solve. The challenge lies not only in the scarcity of large model applications but also in their lack of "groundedness." In specialized fields like healthcare and finance, for example, models trained solely by "data-feeding" are difficult to apply directly.

Major enterprises cannot feasibly enter every niche industry to complete the "last mile." However, by building a comprehensive application ecosystem, downstream enterprises or other developers can create customized model products, optimizing resource allocation and accumulating high-quality data in the process, ultimately feeding back into the development of basic large models.

Alibaba's decision to reduce prices and open source is essentially aimed at lowering the barriers to entry for large models, validating their application value through lower prices, and encouraging more enterprises and creators to participate. Only when large models can truly meet the complex business scenario needs of enterprises can ecosystems develop, and the industry can enter a new phase.

However, the "Hundred Models War" may ultimately leave only 3-5 large model enterprises standing. Currently, the first tier of the industry is emerging, and these enterprises are likely to form the foundation of the future large model industry.

Therefore, leading large model enterprises are unlikely to voluntarily abandon the price war and cede market share. Additionally, many unicorns hope to carve out a "path to survival" through the price war, with some enterprises believing that smaller models may offer better value for money.

In fact, this May's large model price war did not originate with Alibaba but with a "catfish," DeepSeek V2, which priced its API (supporting 32k context) at 1 yuan per million input tokens and 2 yuan per million output tokens, against an industry norm of tens to hundreds of yuan per million tokens at the time.

Currently, the large model elimination race may continue for another 2-3 years. Although few large model enterprises will ultimately remain, to survive, enterprises must pull out all the stops. However, the question remains: when the "low-hanging fruit" has been picked, the solution for the large model industry is no longer simply about being cheap.

03 Model capabilities remain crucial

However, there are differing opinions within the industry regarding the "price war" in large models. Kai-Fu Lee, founder of Sinovation Ventures, once stated that there is no need for a frenzied price war, because large models compete not just on price but on technology. If the technology is lacking and a business relies on losses to stay afloat, such pricing strategies are unsustainable.

Tan Dai, President of Volcano Engine, also expressed that the current focus is on application coverage rather than revenue, emphasizing that stronger model capabilities are needed to unlock new scenarios, which are of greater value.

Currently, the essence of the "price war" lies in insufficient product capabilities. With model capabilities tending towards homogenization and temporarily unable to create significant gaps, price reductions are hoped to increase large model adoption and help vendors gain market share.

However, as the market picks the "low-hanging fruit," new challenges emerge: how to weather the next stage of the price war, how to differentiate one's large model from competitors', and ultimately whether one survives at all. These questions still need answers.

Therefore, while engaging in the price war, large model enterprises are well aware of the importance of products, technology, and cash flow. They must withstand pricing pressures, create technological gaps with competitors, and continuously improve model performance and product implementations to form a virtuous business cycle.

On the one hand, large model enterprises cannot rely solely on the "price war." Large model inference typically involves three variables: time, price, and the number of generated tokens. Looking only at token prices, without accounting for how many concurrent calls can be served per unit of time, is not a fair comparison.

As inference tasks become more complex, demand for concurrency will likely rise. However, most of the currently discounted large models are pre-trained versions that do not support higher concurrency, while the truly large-scale, high-performance models that do support high concurrency have not seen significant price reductions.
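The three-variable point can be made concrete with a toy comparison of two API tiers. All numbers below are invented for illustration; they are not real vendor prices or quotas.

```python
# Hypothetical sketch: comparing two API tiers on more than token price.
# Every figure here is invented for illustration, not a real vendor quota.
from dataclasses import dataclass

@dataclass
class Tier:
    name: str
    price_per_million: float   # yuan per million generated tokens
    max_concurrency: int       # simultaneous requests allowed
    tokens_per_second: float   # generation speed per request

    def time_to_serve(self, requests: int, tokens_each: int) -> float:
        """Seconds to finish `requests` requests of `tokens_each` tokens,
        processing them in waves of at most `max_concurrency`."""
        waves = -(-requests // self.max_concurrency)  # ceiling division
        return waves * tokens_each / self.tokens_per_second

cheap = Tier("discounted", price_per_million=0.5, max_concurrency=5,   tokens_per_second=20)
fast  = Tier("premium",    price_per_million=20,  max_concurrency=100, tokens_per_second=40)

for tier in (cheap, fast):
    cost = 1000 * 500 / 1e6 * tier.price_per_million  # 1,000 requests x 500 tokens
    secs = tier.time_to_serve(1000, 500)
    print(f"{tier.name}: {cost:.2f} yuan, {secs:.0f} s wall-clock")
```

In this toy setup the discounted tier is 40 times cheaper but takes 40 times longer to clear the same workload, which is exactly why token price alone is an incomplete yardstick.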

On the other hand, technology can further optimize large model inference costs. Taking Baidu as an example, its Baige heterogeneous computing platform has been specially optimized for the design, scheduling, and fault tolerance of intelligent computing clusters, achieving an effective training duration ratio of over 98.8% on ten-thousand-card clusters, with both linear speedup ratio and bandwidth utilization reaching 95%, helping customers address computing power shortages and high costs.

Microsoft CEO Satya Nadella once cited an example, stating that GPT-4's performance had improved sixfold over the past year, while costs had dropped to one-twelfth of their previous level, resulting in a 70-fold increase in performance-to-cost ratio. It is evident that advancements in large model technology are the foundation for sustained price reductions in the industry.
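The quoted figures combine multiplicatively, which is worth spelling out: a 6x performance gain at one-twelfth the cost yields roughly a 72-fold improvement in performance per unit cost, consistent with the "70-fold" figure cited.

```python
# Arithmetic behind the quoted GPT-4 figures: performance up 6x
# while cost falls to 1/12 of its former level.
perf_gain = 6
cost_ratio = 1 / 12
perf_per_cost = perf_gain / cost_ratio  # 6 * 12, roughly a 72-fold gain
print(f"performance-to-cost improvement: ~{perf_per_cost:.0f}x")
```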

Lastly, it is crucial to create more differentiated products. Low-price strategies can help large model enterprises build ecosystems. However, as the AI field continues to evolve and innovation accelerates, shortening technology replacement cycles, the core competitiveness of large model enterprises lies in their ability to consistently provide competitive products and address user pain points in practical applications.

Currently, the business logic of the large model industry has transitioned from focusing on models and costs to emphasizing ecosystems and technology. While low prices remain a vital tool for rapidly establishing ecosystem barriers, reducing costs through technology is the key to advancing large models into the "value creation stage."

Going forward, the new battleground for large model enterprises will be "cost-effectiveness," where they must improve the quality and performance of large models based on current prices, making models more capable and diverse. While this may not necessarily lead to the development of "super apps," attracting more small and medium-sized enterprises and startups can create opportunities for explosive growth for large model enterprises.

The copyright of the cover image and accompanying images belongs to their respective owners. If the copyright holders believe that their works are not suitable for public viewing or should not be used free of charge, please contact us promptly, and we will promptly make corrections.

Solemnly declare: the copyright of this article belongs to the original author. The reprinted article is only for the purpose of spreading more information. If the author's information is marked incorrectly, please contact us immediately to modify or delete it. Thank you.