06/27/2024
Pencil Dao and Lenovo Capital jointly launched the "AI Fusion" column, focusing on new insights, trends, and opportunities in the AI era.
Dialogue丨Li Xiang, Zou Wei
Last year, Google released a concerning internal document: "Although Google and OpenAI are racing neck and neck (on large models), neither party has a real moat, because a third force is rising - the open-source community is the biggest enemy of both Google and OpenAI."
Google's concerns are gradually becoming a reality.
Musk's open-source large model company, xAI, recently completed a $6 billion Series B round of financing, with a valuation of $18 billion; the French AI startup Mistral AI, which insists on open source, received 600 million euros in investment, and its latest valuation has approached 6 billion euros; the well-known large model open-source community Hugging Face has also seen its valuation soar to $4.5 billion. In the field of code open source and hosting, well-known communities such as GitLab and GitHub have long been established.
The open-source ecosystem is gradually becoming a crucial force shaping the development of large models. However, while overseas progress in open-source large models is in full swing, China's open-source communities and ecosystems seem to be lagging behind. There is a popular question on Zhihu: "Why can't China produce a Hugging Face?"
But this situation is changing. In 2023, Chen Ran, a serial entrepreneur in the fields of cloud computing and AI, founded the large model open-source community OpenCSG (Open Collaborative Software Group) to provide customers with open-source large model products and services. Less than half a year after its establishment, OpenCSG received investments from Lenovo Capital and Beijing Guoxin Zhongshu, and the company's valuation has reached hundreds of millions of yuan.
As a veteran with 20 years of experience in open source and AI, Chen Ran has found that large models are bringing unprecedented industry changes, but most enterprise-level users cannot develop applications based on large models, let alone train a model that meets their own needs. In response, an AI-era version of "GitLab + Hugging Face" has emerged.
"Open source is very important in the field of large models. It is related to the business model and the industrial chain," Chen Ran told Pencil Dao. "Building an open-source community and ecosystem is likely the only way to break OpenAI's market monopoly."
Data shows that the global large model market size will exceed $28 billion in 2024 and exceed $100 billion by 2028. Going abroad to participate in global competition and serve global enterprises is also Chen Ran's dream. "I want to prove that China can also give birth to excellent startups in the open-source field, allowing investors to see the commercial value of open source."
Recently, Pencil Dao had a conversation with Chen Ran on topics such as the business model of open-source communities and the prospects of China's open-source market. Here are the highlights of the dialogue.
- 01 -
Pencil Dao: As an AI veteran, what opportunity prompted you to start another company in the open-source field?
Chen Ran: I saw that open source in the AI era will usher in disruptive opportunities, worthy of another startup.
I have worked for 20 years and have been involved in open source the whole time. My previous open-source company mainly provided localized code services and data support for B-end customers; it accumulated 16 million users and became the largest open-source platform in China.
Hugging Face is an excellent platform for hosting large models and datasets, similar to GitHub, but it mainly targets scientists and algorithm engineers and does not have a particularly strong willingness to do B-to-B business.
In the AI era, my past experience happens to fill this market gap. I have experience in building ultra-large online open-source communities, I am good at B-to-B business, providing localization services and private deployments for enterprises, and I have 16 million users I already know well. Combining these elements, I can create a Chinese version of "GitLab + Hugging Face" for the AI era, which is a historic opportunity for open source.
Pencil Dao: After founding OpenCSG, is the market demand consistent with your original expectations?
Chen Ran: Basically consistent. After truly delving into the market, I quickly realized that "cost reduction and efficiency enhancement" have become the top priority for large models.
Currently, everyone faces the same dilemma: large models seem more important than anything else, yet they cannot truly be applied to real scenarios. An important reason is that the cost of large models is too high. Computing power, data processing, and algorithmic talent are all expensive. The original intention of OpenCSG is to lower the cost and the barrier to entry of using large models.
The name OpenCSG represents the company's philosophy of cost reduction and efficiency enhancement. "C" stands for Converge, representing the convergence of computing power. Computing power is the highest priority in developing large models, but China's computing power is relatively fragmented. It is therefore necessary to pool high-, mid-, and low-end computing power, both domestic and foreign, and allocate it on demand.
"S" stands for Software Refine, meaning software reshaping. Software reshaping means using large models to generate code, because BAT and other major companies have inflated R&D salaries while R&D efficiency remains low. Our philosophy is to use large models for code development and delivery, reducing costs and enhancing efficiency at the software-defined level. StarShip, which has recently garnered much attention, is our main product.
"G" stands for Generative, as in generative AI. Large models are the future, but they must be delivered to customers in an open-source manner, which is what we call Open (open source and open access). While there are excellent open-source platforms such as GitLab and Hugging Face abroad, China has lacked such platforms, and the era of large models presents an opportunity for OpenCSG to fill that gap.
Pencil Dao: GitLab is already a globally renowned programmer community, and Hugging Face's valuation has reached $4.5 billion. What is OpenCSG's plan?
Chen Ran: The ultimate goal is certainly to go global and compete with GitLab and Hugging Face because China has the largest and highest-quality R&D talent pool in the world, providing a solid foundation for software talent.
However, from a more realistic perspective, we need to move steadily for now, consolidating our user base and revenue before expanding our scale. This year, we have essentially broken even.
Pencil Dao: Mainstream large models on the market have not yet achieved profitability. How did OpenCSG manage to do so?
Chen Ran: Large models themselves are not valuable. We deliver the value of large models to customers through open-source methods, allowing customers to truly pay for value.
Pencil Dao: What are the points of customer payment?
Chen Ran: OpenCSG's business model is similar to helping customers build a dedicated cloud: through a subscription-based payment model, we give customers the ability to build software architectures and help them set up their own "private cloud" for large models.
We have several main products - CSGHub open-source model platform, Wukong pre-trained model, CSGCoder fine-tuned code model, and StarShip, which has set a new record in large model programming.
At the same time, compared to MaaS (Model as a Service) companies that only provide interfaces to open-source large models, we also provide additional code for open-source large models to help companies with fine-tuning and development. After using OpenCSG's architecture services, even customers without development capabilities can quickly generate software based on large models by simply describing their needs in natural language.
Pencil Dao: It sounds like OpenCSG is providing customized services to a large number of users, earning hard-earned money.
Chen Ran: It's not hard work because we use an open-source approach, and essentially, customers assemble their own models.
Our model is similar to that of CATL (Contemporary Amperex Technology Co., Ltd.) in making batteries. Both NIO and Li Auto can build cars based on batteries, but core technologies like batteries and central control are independently developed by CATL.
OpenCSG has already established industry standards, built an open-source platform, and has mature products. B-end customers will pay for enterprise-level services. As for customized needs, we have a large number of partners and use an open-source collaborative approach. As long as customers provide core data, we can train the models they need.
- 02 -
Pencil Dao: People like Zhou Hongyi, Yann LeCun, and Zhu Xiaohu all believe that open-source models are the future. What do you think about the debate between open-source and closed-source large models?
Chen Ran: I believe that the debate between open source and closed source is a competition of business models, similar to the coexistence of iOS and Android systems. There is no good or bad, and both have always coexisted.
However, closed source is generally done by a few large companies, while open source emphasizes global collaboration. Because more people participate and more scenarios are involved, the product can adapt to more people, following a collaborative and win-win approach.
But for China, the priority of doing open source must be higher than closed source.
Pencil Dao: Why do you say that?
Chen Ran: Because most Chinese companies do not have the strength to do closed source. How many people can afford to develop a closed-source model? Another key point is that the critical node of large models in the future is not computing power or even the model itself, but the core dataset.
Since the core dataset is generally in the hands of Party A (the enterprise customer), if Party A feeds its data into a commercial closed-source model, who owns the trained model - the customer or the model vendor? The future ownership of data and large models is unclear. Customers definitely want the model to belong to them because most enterprise data involves core secrets.
Therefore, we insist on building the CSGHub open-source model platform to provide enterprises with integrated online and offline services. What enterprises truly need is an online and offline integrated platform because enterprise data, as the primary element, must be managed offline. For this reason, we firmly adhere to the open-source model, allowing customers to manage their own data assets.
Pencil Dao: There is a lack of high-quality Chinese datasets. Will this constrain the future development of large models?
Chen Ran: Let me tell you why. China's digital transformation in the previous era was not thorough. China's Internet era was indeed glorious, but that very glory "delayed" the construction of a generation of foundational technologies. Application companies were focused on monopolizing data and accumulating users, which led to the uneven development of open source.
Although there are also many excellent open-source projects and communities in China, there is still a gap compared to the international level. China's open-source culture is not as mature as it is internationally, and it lacks sufficient understanding and support. In terms of commercialization, China has not yet blazed a successful path from open-source technology incubation to commercialization (IPO), and an open-source business model suitable for China's national conditions has not yet taken shape.
These deficiencies in basic technologies and software accumulation will be increasingly amplified in later developments, especially in the era of large models. Why have we been catching up? Because the development of any technology follows regular patterns, continuous iteration, and interdependence, ultimately leading to new innovations. Innovation cannot come from nowhere, and China has missed some open-source links.
Pencil Dao: Is the path OpenCSG is taking particularly difficult?
Chen Ran: Three years ago, I wouldn't have wanted to do this because this is something that hasn't been done in China before. You have to patch the holes, so you ask if it's difficult? It's very difficult. But this is also a necessary part of innovation, and you have to do it.
I firmly believe that China needs an open-source ecosystem, and the government is also introducing favorable policies to support the development of the open-source ecosystem, such as the 14th Five-Year Plan advocating support for open source.
However, open source cannot be achieved by a single large model company; it requires building an entire open-source ecosystem and community. So all I can do is persist and believe that one day the open-source ecosystem will take off. When developers can profit from the community, just as merchants earn money on Taobao, they are willing to stay, and that makes you the Taobao of this era.
- 03 -
Pencil Dao: How did you initially attract many developers to OpenCSG and keep the community active?
Chen Ran: There is a professional term called "traction" for active communities, and we mainly have three traction forces.
First, we provide users with real-time and usable computing power. OpenCSG's target audience online is R&D personnel, so we have a computing power trading platform that allows all R&D personnel to use large models at the lowest cost and threshold. And through online computing power commission sharing, we achieve cost reduction and efficiency enhancement for R&D personnel, generating more agents.
Second, we provide various reliable and affordable open-source model options. We have pre-trained many models, such as the Wukong model, and many other open-source models, so customers can always find tailored models.
Third, we are affordable and easy to use.
For example, many of our enterprise customers actually do not have additional funds and do not know how to choose a suitable model, but their pain points are clear - cost reduction and efficiency enhancement. At OpenCSG, due to open-source code and pre-trained models, customers only need to describe their application needs, and OpenCSG can complete the code generation for the corresponding software. For example, if a customer wants to develop a website, by simply describing the various functions of the website, the corresponding software application can be built, which is naturally less costly compared to building a team on their own.
These saved labor costs are then converted into subscription fees for OpenCSG.
Pencil Dao: Many open-source communities hope to create a community atmosphere of "all for one, one for all." Is that the case with OpenCSG?
Chen Ran: "All for one, one for all" is a sentiment. But open source is not a sentiment; it is a business model.
Open-source communities either help others make money or save money. The prosperity of the community must be driven by interests. Many people talk about becoming leaders in open source and contributing to open-source culture. I can only say that this is an academic mindset. More companies do open source for profit.
Pencil Dao: In terms of market competition, are you worried about Hugging Face?
Chen Ran: I'm not worried about Hugging Face at all.
While Hugging Face was still operating only online, I was already promoting private hosting of open-source large models (licensing the technology to a fully independent local company to help enterprises develop software faster and better). I was essentially creating user demand.
And I have already achieved profitability because I clearly understand that cost reduction and efficiency enhancement is the business model. Just because large models have not yet been commercialized does not mean there cannot be a business model around them - they can give rise to models such as software subscriptions, dataset delivery, services, and commercial distribution.
As for valuation: this is my third startup, and I have always believed that a company's valuation must match its revenue. There is no need to inflate a company's valuation for the sake of financing. As I said before, customers pay for the product, and they always pay for value. So I bring value to enterprise customers, earn revenue, and take a steadier approach.
Pencil Dao: What if local internet giants also want to develop products similar to OpenCSG?
Chen Ran: There are still significant differences between the approaches of large companies and startups. OpenCSG has been a native large model company since its inception, offering large model open-source products based on Git. Many large companies, while emphasizing large models, cannot cut off all their other mature businesses, which makes their resource-allocation decisions slow.
Pencil Dao: What is the biggest difficulty in building an open-source ecosystem in China in the era of large models?
Chen Ran: Too few people understand the industry, and there is a lack of open-source success stories. In the previous era, people did not pay much attention to the open-source ecosystem, so it got a late start.
Open source has been done in the US for over 30 years, giving birth to many billion-dollar unicorns, and a large number of investors have earned