Decoding the Open vs. Closed Source Debate in Large Models: Who Will Reign Supreme?

02/17/2025

On February 14, Baidu announced a bold open source move! The company said it will gradually roll out the ERNIE Bot 4.5 series over the coming months and officially open source it starting June 30.

Since DeepSeek gave fresh momentum to the global trend of open sourcing large AI models, the choice between open and closed source has become a heated topic in the AI field. Even Baidu, previously a staunch advocate of closed source, has now joined the open source camp, a sign that the tide is turning toward openness. This article examines the pros and cons of open and closed source, the differences between open sourcing large models and open sourcing software, and how to choose between the two for commercial applications.

01 The Pros and Cons of Open Source and Closed Source

Originating in the software world, 'open source' refers to making source code publicly available so that anyone can view, modify, and distribute it. Open source projects typically rely on reciprocal collaboration and peer production, which drives improvements in software modules, communication channels, and interactive communities. Notable examples include Linux, Mozilla Firefox, and Android.

In contrast, closed source (proprietary) software does not disclose its source code, for business-model and other reasons, and ships only machine-readable programs (such as compiled binaries). The source code is controlled exclusively by the developer. Examples include Windows and iOS.

Open source encourages collaboration, technological equity, and continuous technological progress. It advocates that technology should not be monopolized by a few. Closed source, however, often results in a stable and focused product but usually requires a fee and leaves users dependent on developers for issue resolution.

Both open and closed source are technical and business strategies. The debate may look like a dispute over development paths, but at its core it is a contest of interests.

Open source promotes technological equity and accessibility but may weaken the incentive to innovate. Closed source, by keeping the technology exclusive, can generate greater profits and reinvest more in fundamental innovation, resulting in stable, secure, and focused products.

Consider the rivalry between Apple's iOS and Google's Android. While iOS is often perceived as smoother, Android's open source nature has enabled customizations such as Xiaomi's MIUI, which significantly improved the user experience for a vast audience. Competitive pressure can also come from outside: without US sanctions cutting it off from Google's services, Huawei might not have had the motivation to invest heavily in HarmonyOS.

Open and closed source coexist, each with its own issues of transparency, compliance, and security. Open source fosters exploratory work, while closed source drives productization and commercialization.

02 Key Differences Between Open Sourcing Large Models and Open Source Software

Open sourcing large models differs significantly from traditional software open sourcing.

Open sourcing software means making its source code public, allowing users to understand how and why it works and to modify it or add features. Large models, however, are still largely black boxes, with many behaviors that remain unexplained. The industry has therefore proposed several dimensions along which a large model can be opened up: weights, datasets, code, training procedures, and frameworks.

Interestingly, only a few companies or institutions have open sourced every component of their large models at once; IBM's recently released Granite large language models are one example. Others have released only some components: the Beijing Academy of Artificial Intelligence (BAAI) has open sourced weights and datasets, while Musk's xAI has released the weights of its Grok-1 model. According to Lin Yonghua of BAAI, the academy's latest open-sourced datasets fall into two categories: general-purpose instruction fine-tuning datasets and industry-specific datasets covering 18 industries.

Lin Lvqiang of 01.AI (Zero-One AI) notes that the industry consensus is to open source at least the weights along with some inference code. This lets others use the released model, which makes the current definition closer to 'freeware' (free to use, but not open in the traditional sense) than to open source software. This is why companies such as Google describe what they release as open weights rather than open source large models.

Weights are the core of an open-sourced large model: they encode what the model has learned and consist of billions of numerical parameters that determine how it processes input, makes predictions, and generates text.
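To put "billions of numbers" in perspective: released weights are essentially very large arrays of numbers, so their size on disk follows from the parameter count and the precision used to store each number. The Python sketch below is a rough back-of-the-envelope estimate under stated assumptions: the parameter counts are hypothetical examples rather than figures for any model named in this article, and it ignores tokenizer files, metadata, and the extra memory needed at inference time.

```python
# Bytes used to store one parameter in common numeric formats.
# int4 packs two parameters into one byte, hence 0.5.
BYTES_PER_PARAM = {"fp32": 4, "fp16": 2, "bf16": 2, "int8": 1, "int4": 0.5}


def weight_file_size_gb(num_params: int, dtype: str = "bf16") -> float:
    """Approximate size of a model's raw weights in gigabytes (10**9 bytes).

    Only counts the parameters themselves; tokenizer files, metadata,
    activations and KV cache are deliberately left out.
    """
    return num_params * BYTES_PER_PARAM[dtype] / 1e9


if __name__ == "__main__":
    # Hypothetical parameter counts, chosen for illustration only.
    for params in (7_000_000_000, 70_000_000_000):
        for dtype in ("bf16", "int4"):
            size = weight_file_size_gb(params, dtype)
            print(f"{params / 1e9:.0f}B params @ {dtype}: ~{size:.1f} GB")
```

Running it prints about 14 GB (bf16) and 3.5 GB (int4) for the 7-billion-parameter case, and about 140 GB and 35 GB for the 70-billion-parameter case, one reason why even hosting open weights locally calls for serious hardware.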

Because large models cost far more to build than traditional software, corporate attitudes toward open sourcing them have subtly shifted. Many companies have become more conservative, reflecting the reality that each company's open source strategy serves its business goals. Even when core components are released, most engineers and companies lack the computing resources to reproduce the models.

Industry insiders highlight three core differences:

1. Transparency levels differ significantly. The source code of open source software fully explains its behavior and supports a governance system built around it, whereas large models remain black boxes even when their weights are public.

2. The nature of large model communities has changed. Because of limited computing power, most engineers cannot contribute directly to large models, so many communities have become one-way channels.

3. Open source strategies for large model companies have evolved. Significant investments in training large models have led to varied choices in open source strategies, particularly regarding licenses or information disclosure.

The goal of open source is not to surpass closed source but to promote technological equity and transparency, preventing technology from becoming a tool for a few to profit. As AI moves toward AGI, open source is more aligned with the interests of humanity.

There is also debate over which is safer: open or closed source large models. The open source camp argues that in a closed environment no one knows whether the model is being monitored, whereas open source allows the community to inspect it for security issues. Others believe open source can create unexpected problems, such as misuse by "terrorists," which highlights the need for balance.

03 Choosing Between Open and Closed Source for Commercialization

Zhang Peng of Zhipu AI believes that open source models are well suited to experimentation, but for commercial applications most customers prefer the commercial version because it comes with guarantees and better service. This view is widely shared in the industry, especially in the business-to-business (ToB) sector.

Whether choosing open or closed source, the first question is whether customers need on-premises deployment. Many customers, in China and abroad, require systems they can independently control. Using a public cloud model such as OpenAI's means weighing the risk of exposing data to a third party.

Note that deploying a closed source large model locally requires the vendor's authorization, while open-sourced models must still be used in accordance with their licenses and compliance requirements. From a technical perspective, many open source models can be deployed locally, and they have the advantage of being easy to fine-tune into industry-specific models.

Demand for large models is inherently varied and often calls for a mix of approaches tailored to specific industries and scenarios. In the government sector, for instance, where data security is paramount, private deployment is required, which makes open source models the more flexible and convenient option.

Conclusion

DeepSeek's open source releases pushed Baidu toward its own open source strategy, marking a fundamental shift in the large model industry. As Baidu founder Robin Li said at the World Governments Summit 2025 in Dubai, "Innovation cannot be planned. You create an environment conducive to innovation."

Whether they choose open or closed source, and whether they are large corporations or startups, companies can best respond to technological change by continuously adapting and innovating.
