10/28 2024
Author | RingBell Talk
The wave of enthusiasm for large models has been surging for more than a year, pushing development in the field to become deeper and more substantive.
Nowadays, more and more industry insiders are starting to look back on the rapid development process of large models and reflect on how China's large model development can achieve further breakthroughs in the future, building upon existing industrial progress.
On the one hand, AIGC is driving a continuous increase in overall computing power demand. It is necessary to effectively address the uncertainty of resource support and further enhance the stability of computing power supply.
On the other hand, beyond computing power, large model innovation also faces a critical shift from 'quantitative expansion' to 'qualitative improvement.' As one viewpoint within the industry puts it, more resources should be directed towards exploring the integration of large models with various industries. However, how to achieve this integration and realize 'qualitative improvement' currently necessitates further exploration of a systematic practical methodology to reduce uncertainties in actual implementation.
These compounding uncertainties are the reality facing domestic large model innovation, and a breakthrough is urgently needed.
Large Model Innovation Relies on 'the Weather': How to Achieve Stable, Sustainable Development in an Era of Uncertainty
What is the essence of uncertainty? The answer lies in the loss of initiative over 'resources' and 'capabilities,' rendering one unable to control their own destiny.
In terms of 'resources,' only powerful computing power can meet the training demands of models with ever more parameters and continually enhance their autonomous learning and generalization capabilities. However, as previously mentioned, while our current sources of computing power are relatively diverse, supply remains unstable. Stability and certainty need to be improved further, building on what the existing industrial chain ecosystem has already achieved.
Yet, this is only the commonly observed aspect of the industry. In reality, the loss of initiative in 'capabilities' is even more deeply concealed.
From a static perspective, when the AI innovation technology system, from hardware to software, is designed by others, being merely a 'user' rather than a 'system builder' makes it difficult to surpass others in terms of system understanding and application. There are significant doubts about whether the rules and systems designed by others can meet the specific development needs of developers and enterprises.
From a dynamic perspective, large models are evolving rapidly, and each technical link requires adaptation. However, there is no controllable continuity regarding when and how the rules and systems designed by others will evolve.
Ultimately, large model innovation relies on 'the weather,' fraught with uncertainties.
To understand its ultimate consequences, we must consider the perspective of industrial competition.
The advantage of foreign AI computing lies not only in underlying hardware but, as recently stated by industry leaders, also in the integrated software and hardware ecosystem they have built over a decade. A notable feature of this system is its ability to rapidly establish the 'flywheel effect' of machine learning, where continuous improvement occurs throughout the entire process of data, hardware, algorithms, training, and inference, enabling rapid feedback from final applications to continuously nourish and strengthen data and model parameters, forming a closed loop of self-reinforcement.
Recently, Elon Musk's team reportedly brought the Memphis supercluster, with 100,000 GPUs, online in just 19 days, aiming to rapidly enhance AI inference and training capabilities and, in essence, to accelerate this 'flywheel effect.'
Domestic large model innovation has long had unique advantages in its vast market and data support. However, from the perspective of the machine learning 'flywheel effect,' we still lack an integrated software and hardware ecosystem that accelerates this effect, and rely instead on passive, fragmented, and unsystematic 'borrowing.' Consequently, it has been difficult to gain a competitive voice.
Additionally, Zheng Yongnian, Professor at The Chinese University of Hong Kong and Dean of the Institute of Frontier International Studies, has stated that the direction of China's large model innovation differs from that of OpenAI, and that this difference should be viewed as differentiation rather than a gap. Differentiation here means building our own system and path and rapidly implementing large model applications, rather than focusing on algorithmic capabilities.
Therefore, it becomes particularly important to establish and strengthen our own software and hardware ecosystem.
Encouragingly, the development of some domestic computing industry ecosystems is moving in this direction – not merely aiming for parity but striving for further openness in ecosystems. For instance, Ascend, introduced in 2018, initially focused on AI computing power infrastructure. Since March of this year, Ascend AI's basic software and hardware have undergone comprehensive upgrades centered on the development of operators, models, and applications, marking a new acceleration in Ascend native innovation.
During the Huawei Connect conference, the Ascend Industry Summit was successfully held. In addition to announcing numerous industry solutions jointly launched with industry leaders, testimonials from partners and sharing of Ascend native development practices by guest speakers showcased Ascend AI's native innovation technologies and industrial achievements over the past period.
This native development system, with a more open stance, is comprehensively addressing the 'uncertainties' faced by domestic large model innovation. From a technical perspective the process may seem complex, with many layers of logic and many systems to dissect. From the perspective of how developers and partners gradually resolve their pain points and advance large model innovation, however, it is clear and straightforward.
Ascend Native: Embracing Certainty in the Era of Large Models with Dual Ecosystems of Technology and Business
Large model innovation is a highly complex and demanding field, fraught with numerous challenges throughout the process. For a developer or innovative enterprise, engaging in large model innovation is akin to a quest filled with obstacles, requiring perseverance at every turn. Any failure at any stage prevents reaching the destination.
Currently, domestic large model innovation faces various pain points and challenges at each stage. Native innovation has emerged, but whether it can build a software and hardware ecosystem that earns a competitive voice within the industry depends on how effectively it addresses developers' problems. These stages essentially serve as tests of Ascend native innovation along several dimensions.
First Stage: Preparation of Technical Resources
Developers face issues beyond just the development phase. The first step in large model innovation is preparing a range of technical resources: securing computing power, operating systems, firmware, complete machines, hardware platforms, and more.
If the unstable supply of computing power resources (especially high-performance ones) can be considered a 'congenital deficiency' of the industrial environment, then the compatibility issues within the current software and hardware system are an 'acquired disorder.' Together they present developers with a daunting challenge from the very start of resource preparation.
Specifically, with the development of diverse heterogeneous computing power, compatibility issues among different OSs, firmware, complete machines, and hardware platforms are prominent. Some computing power service providers' resource procurement is influenced by the hardware ecosystems of various vendors, leading to tight coupling between applications and hardware, making migration difficult. For instance, some hardware vendors construct relatively closed ecosystems to safeguard their interests, restricting access by other vendors or third-party developers.
This closed nature forces application developers to optimize and customize for specific vendor hardware, exacerbating the 'tight coupling' between applications and hardware. ISVs can only play by established rules, limiting development flexibility and efficiency and ultimately impacting the effectiveness of AI application deployment in real-world scenarios.
Thus, the first criterion for assessing our native innovation emerges: Can it address this pain point?
With heterogeneous computing architecture CANN, the all-scenario AI framework MindSpore, distributed acceleration suite MindSpeed, inference engine MindIE, full-process development toolchain MindStudio, and CCAE cluster self-intelligent engine, Ascend AI continuously promotes hierarchical openness and ecosystem compatibility, providing compatibility development support at the tool level to truly break the rules and enable efficient and flexible development.
'Native innovation' addresses developers' concerns from their perspective. Ascend first shields developers from uncertainties in the procurement process (such as unavailability, insufficient quantities, and incompatibility) by leveraging its unique advantage of a comprehensive computing power supply chain. It then decouples software from hardware through an open ecosystem, ensuring flexibility and efficiency in the application development process.
Ascend has met this criterion.
Second Stage: Acquiring Technical Depth
Even if an enterprise has prepared technical resources, it is still insufficient for large model innovation.
Industry insiders have compared a well-known domestic large model product with GPT-4 and found numerous technical similarities, such as the use of multi-stage training strategies. It is widely recognized that 'large models possess no pure technical barriers.' Especially with the emergence of MoE models, the focus of competition has shifted to engineering implementation, like large-scale distributed training. For developers, this means acquiring sufficiently deep technical resources across 'operators, training, and inference' is crucial for determining ultimate business success.
Thus, the second criterion for assessing native innovation emerges: Can it 'lower barriers to accessing technical resources'?
Ascend has been rapidly iterating and deepening its capabilities at various levels.
For instance, CANN was further opened up in May this year, with over a dozen general-purpose fusion operators like NB 2.0 added during HC 2024, essentially covering developers' needs. MindSpore 2.4 has evolved to natively support super-node architectures, further enhancing model training efficiency.
In terms of training, Ascend introduced the MindSpeed distributed acceleration suite this year, targeting large model training acceleration workflows. It provides over 100 pre-trained models, over 60 acceleration algorithms and operators, and over a dozen fine-tuning algorithms, reducing distributed development costs from pre-training to incremental training and improving training performance by over 30%.
Moreover, in the increasingly crucial inference stage, the Ascend inference engine MindIE, launched in March this year, supports adaptive PD (prefill-decode) separation deployment, significantly enhancing inference efficiency and experience. According to the roadmap, it will evolve to cater to scenarios like trillion-parameter MoE inference and million-length sequences.
Native innovation involves not only moving beyond the role of a 'follower' but also actively breaking through various technical challenges to unlock the ceiling of industrial development.
Ascend has passed this stage, but to achieve higher scores in the future, continuous in-depth efforts are required.
Third Stage: Enterprise Growth and Business Success
Mastering technology alone does not guarantee success. In reality, countless enterprises struggle for survival, development, and market penetration. Sustained financial support, catching up on capabilities, and turning applications into commercial results: these are the final hurdles for developers. Large model innovation is not merely a technical ideal; it confronts numerous operational and developmental challenges.
For native innovation, the third assessment criterion is also clear: Can it establish a broad and multi-faceted business ecosystem beyond technical support to aid developers and partners in achieving success?
In this regard, Ascend has consistently aided industrial partners through a series of initiatives over the years.
For talent development, Ascend collaborates with universities and research institutions to foster a large number of outstanding native talents through industry-education integration and industry-research integration.
For skill enhancement, Ascend continually enriches its community ecosystem, offering mentorship and training programs to reach a wider audience. Multi-level training, ranging from general education to targeted empowerment, enables developers to comprehensively master native skills. Huawei also invests RMB 1 billion annually in nurturing an ecosystem that supports Kunpeng and Ascend native development, covering over 80% of computing scenarios.
For business success, Ascend continuously innovates its business models and actively connects partners with resources from across the industrial ecosystem. It also provides incentives such as computing power, NRE (non-recurring engineering) funding, MDF (market development funds), and rewards for community contributions to promote mutually beneficial business outcomes.
With talent secured, skills no longer a barrier, and full support throughout industrial transformation, developers can clear the final hurdle with ease. Ascend has submitted an excellent answer, though the complexity of the business ecosystem suggests that this test will persist.
Native Innovation Penetrates Diverse Industries
The computing ecosystem exhibits a typical 'threshold' phenomenon: success comes only after an ecosystem crosses the participation threshold for industrial applications and enters a stage of self-reinforcement that no longer requires significant investment.
Developers benefiting from Ascend native innovation must also expect to cross this 'threshold.'
The good news is that a growing body of practice and data indicates that the path of native innovation is becoming viable.
To date, Ascend AI has cultivated over 30,000 native development contributors, with over 50 ecosystem partners releasing native development outcomes based on Ascend.
Moreover, practical cases span diverse sectors.
In fundamental large model innovation, iFLYTEK's Feixing 1 platform, built on the Ascend AI platform, efficiently supports the training of iFLYTEK's Spark large model, surpassing most competitors in the market in terms of training and inference performance.
In enterprise digital and intelligent integration services, DingTalk and Ascend have collaborated to introduce an AI all-in-one machine that brings intelligence to R&D, production, product sales, and other business scenarios. It has already incubated various AI applications such as smart Q&A and smart business travel.
In frontline production and operation scenarios, Elite Intelligence has leveraged Ascend native capabilities to create a large model application for mines – the 'Coal Mine Brain' safety management and control platform solution, safeguarding over 300,000 miners in over 2,000 coal mines nationwide.
These layers of progress reinforce one another. As native innovation climbs the slope, its industrial implementation is becoming richer and more feasible, marking a promising start on the journey into diverse industries.