Who should be the 'nourishment' for the growth of large models?

08/06 2024

Talk of 'AI replacing humans' is everywhere, and it has moved out of science fiction and into the real world.

First, the launch of Luobo Express (Baidu's Apollo Go robotaxi service) angered large numbers of ride-hailing drivers. Then Tomato Novel introduced 'AI Authors' capable of updating three novels a day at the same time, a serious threat to online fiction writers.

Next, Doubao, which like Tomato Novel belongs to ByteDance, was reported to be scouring the internet for novel material to 'feed' Tomato's AI authors. The controversy soon spilled over into online documents: some netizens claimed on social media that Doubao's search reached beyond online fiction, extending even to unpublished content stored in online documents.

WPS bore the brunt of the criticism.

WPS hurriedly issued an official denial. But the scrutiny did not stop at WPS: nearly every online document app, including Shimo, Tencent Docs, Evernote, Moke, and Chenggua, came under suspicion. Many people concluded that, in the face of the AI boom, no corner of the online world is safe.

These concerns are not unfounded. Last year, WPS was accused of adding AI training clauses to its user agreement. Although the company has consistently denied the allegations, many users remain skeptical, and the dispute has turned into a Rashomon-like clash of competing accounts.

Users fear that their hard work will become someone else's gain, while the capital behind AI casts itself as a 'Prometheus' gathering 'sparks' for large models wherever it can.

Online Documents Rush into AI

In recent years, online document products have faced mounting pressure: stagnating research and development, user churn, fierce competition, and sluggish growth in advertising revenue.

Online documents have clearly struggled to innovate, and WPS, as the industry's flagship, has not escaped this dilemma. Before its move into AI, WPS's most significant innovation came in 2018: Tencent Docs launched that year, and WPS announced an upgrade just three months later.

That upgrade emphasized multi-user collaboration, which WPS initially offered as a standalone service. Today, however, cloud collaboration is ubiquitous: Feishu, DingTalk, and Qiye Weixin (WeCom), as well as online document players such as Shimo and assorted cloud document mini-programs, all offer similar capabilities.

The surge in remote work in recent years indirectly boosted many online document apps, with Evernote and Shimo both seeing sharp increases in user engagement. In 2020, when remote work exploded, Evernote's consumer user base grew four- to fivefold, while Shimo's new user and enterprise registrations rose roughly sixfold.

That same year, Tencent Docs announced more than 160 million monthly active users. From then on, cutthroat competition became the norm, and the industry entered a prolonged bottleneck.

However, the emergence of large models began to shift the landscape.

According to Kingsoft Office's 2023 financial report, subscription revenue from WPS Office's domestic personal office services reached 2.65 billion yuan, up nearly 30% year on year, and monthly active devices exceeded 598 million.

The move toward AI has become inevitable for online documents. iiMedia Research projected that, as large language models and AIGC are applied at scale in collaborative office scenarios, the collaborative office market would grow significantly, reaching 33.01 billion yuan in 2023.

AI has become a crucial tool for online document platforms to secure their market position and retain users, and WPS is especially eager to join the fray. After ChatGPT's success, online office software emerged as one of the easiest arenas in which to deploy AI. Microsoft, for instance, integrated OpenAI's GPT-4 into Microsoft 365 almost immediately after the model's release.

WPS Office has long competed head-on with Microsoft Office. According to public data, Microsoft Office and WPS Office have average penetration rates of 81.5% and 68.7%, respectively, on Windows PCs in China. Microsoft Office leads on the PC, while WPS Office has the advantage on mobile devices.

Microsoft's AI strategy has clearly spurred WPS on. But WPS is not alone in betting on AI: domestic players such as Baidu with its intelligent office platform Ruliu, DingTalk with Tongyi Qianwen, Evernote with 'Impression AI,' and Feishu with 'My AI' are all counting on AI to get ahead.

Major players want AI to drive further growth, while smaller players without the funds to build their own AI are partnering with larger companies instead. This explains why suspicion spread across nearly all online document platforms after the 'Doubao plagiarism' incident.

In short, online documents are racing to embrace AI. Whoever wins, the users who end up 'feeding' these systems lose out. However many platforms they try in hopes of avoiding that role, they sadly find there seems to be no escape.

The 'Original Sin' Behind Large Model Training

Baidu's Wenxin Yiyan has reportedly served 85,000 enterprise customers, and Alibaba's Tongyi Qianwen 90,000. As of May 15, 2024, ByteDance's Doubao had passed 100 million total downloads, with more than 26 million monthly active users across both platforms.

As large models gain popularity, AI training naturally attracts attention. According to public information, large model training typically involves five steps: data collection and processing, model design and testing, model training, evaluation and optimization, and model deployment and maintenance.

The first step is the crucial one, and it is the source of the controversies surrounding Doubao and WPS AI.

Data is the foundation on which large models are trained and improved, so the legitimacy of data sources is a prerequisite for the field's sustainability; without it, copyright and privacy problems follow. Even before online fiction writers were angered, the artist community had run into the same problem.

Late last year, several Chinese illustrators jointly sued Xiaohongshu over its AI image-generation tool Trik, alleging that it used their original works as training data without authorization and generated highly similar images, infringing their legal rights. Zhou Hongyi, the founder of 360, was also ridiculed online for his involvement in 'AI image theft.'

The situation is similar overseas. Reports indicate that 16,000 British artists have jointly filed a class-action lawsuit against OpenAI and other AI companies. Even The New York Times has sued OpenAI and Microsoft for copyright infringement.

Throughout the history of technological progress, it seems we have always had to pay some 'invisible' costs before enjoying the fruits of technology. Who should bear those costs, however, deserves discussion.

At bottom, large model training reaches into the general public's data because developing large models remains expensive for companies, while practical applications have yet to generate significant revenue. OpenAI, Midjourney, Baidu's Wenxin Yiyan, and iFLYTEK's Xinghuo have all introduced paid plans, but they are still some way from profitability.

Take OpenAI: according to reports, its revenue in the first half of this year was impressive, with annualized revenue reaching $3.4 billion, yet the company kept losing money because building and running its models is so expensive. Paid ChatGPT subscriptions accounted for more than 50% of its revenue, while the API, used mainly by enterprises and developers, contributed only about 15%.

In China, WPS AI's average price per customer currently exceeds 12 yuan, and it charges 2.64 yuan to summarize a 10,000-word document into a 1,000-word summary. Meanwhile, a price war has suddenly broken out across the industry: OpenAI's GPT-4o mini, for example, launched at 15 cents per 1 million input tokens and 60 cents per 1 million output tokens.
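To make the scale of that price cut concrete, here is a rough, hypothetical Python sketch that applies the GPT-4o mini list prices quoted above to a summarization job like the WPS AI example. It assumes the document and the summary come to roughly 10,000 and 1,000 tokens; that words-to-tokens mapping is an assumption, since real token counts depend on the tokenizer and the language. The function name `call_cost_usd` is illustrative, not part of any real API.

```python
# Hypothetical sketch: estimate the raw API cost of one summarization call
# at the GPT-4o mini list prices quoted in the text.

INPUT_PRICE_PER_M = 0.15   # USD per 1 million input tokens
OUTPUT_PRICE_PER_M = 0.60  # USD per 1 million output tokens


def call_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the USD cost of one call, given its input and output token counts."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000


# Assumption: roughly 10,000 tokens in and 1,000 tokens out.
# Actual counts depend on the tokenizer and will differ.
cost = call_cost_usd(10_000, 1_000)
print(f"{cost:.4f}")  # 0.0021
```

Under these assumptions the call costs about $0.0021, a fraction of a cent. Comparing that figure with the 2.64 yuan quoted for WPS AI would also need an exchange rate, and the two prices may not cover the same thing, so the sketch is only indicative.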

Against this backdrop, the large model sector is a study in contrasts: aggrieved users are outraged, companies are struggling with costs that outstrip revenue, and investors are starting to pull back. According to LaiMi PEVC data, there were 198 financing deals in the AI sector in the first quarter of 2024, down 20.80% year on year.

Historical precedents suggest that when technological advancements clash with ethical considerations, technological development is often viewed as bearing an 'original sin.' But should technology truly bear this burden?

Innovation Coexists with Restraint

The question of whether AI can replace humans began in science fiction. Now that large models have set off a global technology frenzy, it looks less like mere speculation. Drivers' protests against Luobo Express and online fiction writers' joint petition against Tomato Novel mark the opening of a new chapter in modern technological civilization.

On July 6, the 2024 World Artificial Intelligence Conference concluded in Shanghai, with finance, education, and healthcare emerging as key application areas. In fact, soon after ChatGPT launched, some people overseas set out to analyze which jobs were most likely to be replaced by AI.

One blogger analyzed freelance job data from Upwork (the world's largest freelance platform) covering the period from the month before ChatGPT's launch to February 14, 2024. The analysis found that writing, translation, and customer service were the categories on Upwork hit hardest by ChatGPT. Translation fared worst, with job volume down 19% and hourly rates down 20%.

Beneath the data pointing to job displacement, however, lies a countertrend: looking at the broader picture, demand for many kinds of work has actually grown since ChatGPT arrived.

The figures show that creative work building on large models reaped the first wave of AI 'dividends.' On Upwork, after ChatGPT's launch, video editing/production postings rose 39%, graphic design 8%, and web design 10%. Software development postings also grew, with back-end development up 6% and front-end/web development up 4%.

This shows technology's double-edged nature: AI is not universally reviled. As long as users' basic interests are protected, its benefits in daily life far outweigh its drawbacks. The same holds in China: in 2023, for example, AI translated 20% of the online novels that Qidian International exported.

Nevertheless, controversies arising from AI applications persist.

Beyond the copyright concerns of online fiction writers and illustrators, a surge of AI-generated papers is undermining the value of academic research. For instance, since July last year, the proportion of AI-generated papers among submissions to the Chinese Medical Journal has risen month by month, at one point exceeding 50%.

The Chinese Medical Journal has since issued rules on the use of AIGC, with penalties ranging from rejection or retraction of papers for minor violations to placement on an academic dishonesty list for serious ones.

These cases show that some fields already recognize that AI applications must follow clear rules. Our heightened wariness of AI-powered online documents stems from how inadequate current AI regulation is.

In response, governments are introducing more policies. In China, the Provisional Measures for the Administration of Generative Artificial Intelligence Services have taken effect; internationally, the European Parliament passed the Artificial Intelligence Act in March this year. The challenge now is to develop and use large models responsibly while still preserving innovation.

Everything suggests that large models mark an indelible milestone in human technological civilization, and with it comes a heavy responsibility that more people need to take up.

As for who should serve as the 'nourishment' for the growth of large models, the burden clearly should not fall on ordinary users alone.

Daotmt (formerly known as Waidaodao) is an internet and tech media outlet. Follow us on WeChat under the same name: Daotmt (Dao is always reasonable). This is an original article. Reproduction without proper attribution to the author is strictly prohibited.
