Big Model Hybrid Cloud: A Critical Moment for Systematic Innovation Breakthroughs

08/16 2024

Written by Intelligent Relativity

Signals from the Amazon-Anthropic and Microsoft-OpenAI pairings, along with Huawei Cloud's concept of the big model hybrid cloud, all point to an industry consensus: cloud computing and big models are becoming deeply integrated.

Currently, driven by generative AI, more and more companies are accelerating the deployment and application of big model technology on cloud computing platforms, which in turn promotes the iteration and upgrade of the cloud computing industry.

From the underlying infrastructure to the middle-tier platform services and finally to the top-level scenario applications, cloud computing is undergoing significant changes. Take Amazon Web Services (AWS) as an example: this global cloud computing giant is laying the technical groundwork for deploying and applying big models across all three layers of its product line.

I. At the bottom layer, build infrastructure represented by GPUs and self-developed chips for training base models and running inference in production environments.

II. At the middle layer, introduce Amazon Bedrock, a fully managed service that gives users easy access to a curated set of big models from third-party providers such as AI21 Labs, Anthropic, Cohere, Meta, Mistral AI, and Stability AI, as well as Amazon's own big model family, Amazon Titan.

III. At the top layer, build out-of-the-box generative AI applications such as Amazon Q on top of base models, enabling users to get started with generative AI quickly and without specialized expertise.

From this perspective, the development of generative AI is changing the industrial landscape of cloud computing, and competition across the entire market has entered an unprecedented period of transformation.

Generative AI, Reconstructing the Cloud

Chinese cloud vendors share AWS's view and are taking similar steps.

Hou Zhenyu, Vice President of Baidu Group, has previously argued that big models will drive innovation in cloud computing and reshape its industrial landscape, forcing a reconstruction of the underlying IT infrastructure and changing how upper-layer applications are developed.

Huawei Cloud Stack 8.3, billed as the first big model hybrid cloud in China's domestic industry, combines hybrid cloud with big models and reflects the new direction of cloud service development. It provides a complete AI production chain, including computing platforms, cloud services, development kits, and professional services, that connects the technology path from infrastructure to application development. This helps government and enterprise customers establish dedicated big model capabilities in one stop, and it represents an advanced capability of cloud services.

This advanced capability, built on the technical development of big model hybrid clouds, is essentially a matter of tailoring cloud services to specific application scenarios.

For example, the continued application and innovation of cloud-edge collaboration technology on big model hybrid clouds aims to address the real-time edge inference needs exposed when AI big models are gradually applied to industrial scenarios such as coal mine production, power inspection, and industrial quality inspection.

On the one hand, industrial scenarios generate more demand for AI applications than other scenarios and impose stricter efficiency requirements, so they need effective edge management that delivers low latency, high agility, and widespread deployment. On the other hand, as AI big models move into industrial scenarios, their version iterations and functional upgrades inevitably have to happen while the models are in use.

As such, cloud-edge collaboration becomes crucial, balancing computing resources, optimizing data flow, improving processing efficiency, and enhancing service quality to support diverse and demanding AI application scenarios.

Huawei Cloud's cloud-edge collaboration solution, based on the hybrid cloud's ModelArts AI development platform and Pangu big model, forms a one-stop scenario-based model training workflow. By collecting original production sample data and suspicious sample data generated during model operation, the workflow efficiently trains models and uniformly manages model versions, effectively enabling AI models to learn, iterate quickly, and continuously upgrade while in use, adapting to new operating conditions and data changes.
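The "learn while in use" loop described above can be illustrated with a minimal Python sketch. It is not Huawei Cloud's implementation or the ModelArts API: the class names, the confidence threshold used to flag "suspicious" samples, and the retraining trigger are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple

Sample = Tuple[float, ...]            # toy feature vector
Model = Callable[[Sample], float]     # returns a confidence score in [0, 1]


@dataclass
class ModelRegistry:
    """Cloud side (hypothetical): keeps model versions and training data."""
    versions: List[Model] = field(default_factory=list)
    training_set: List[Sample] = field(default_factory=list)
    pending: List[Sample] = field(default_factory=list)  # uploaded, not yet trained on

    def publish(self, model: Model) -> int:
        self.versions.append(model)
        return len(self.versions)     # version numbers start at 1


@dataclass
class EdgeNode:
    """Edge side (hypothetical): runs inference, sets aside doubtful samples."""
    registry: ModelRegistry
    threshold: float = 0.6            # assumed cut-off for a "suspicious" sample
    version: int = 0
    model: Optional[Model] = None
    suspicious: List[Sample] = field(default_factory=list)

    def sync(self) -> None:
        """Pull the newest published model from the cloud registry."""
        self.version = len(self.registry.versions)
        self.model = self.registry.versions[-1]

    def infer(self, sample: Sample) -> float:
        confidence = self.model(sample)
        if confidence < self.threshold:
            self.suspicious.append(sample)   # keep for later retraining
        return confidence

    def upload(self) -> int:
        """Send suspicious samples to the cloud; return how many were sent."""
        sent = len(self.suspicious)
        self.registry.pending.extend(self.suspicious)
        self.suspicious.clear()
        return sent


def retrain_if_ready(registry: ModelRegistry,
                     train_fn: Callable[[List[Sample]], Model],
                     min_samples: int) -> Optional[int]:
    """Retrain once enough new samples have arrived; return the new version."""
    if len(registry.pending) < min_samples:
        return None
    registry.training_set.extend(registry.pending)
    registry.pending.clear()
    return registry.publish(train_fn(registry.training_set))
```

In this sketch an edge node calls sync() to pick up the newest version, infer() during production, and upload() periodically; the cloud side calls retrain_if_ready() with whatever training routine it uses and publishes the result as a new version.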

On the data side, storage has become a bottleneck for the efficiency of AI big model training. As the number of parameters in AI big models increases, the scale of training clusters also expands, and traditional storage architectures struggle to meet the demands of fast reads and writes across ultra-large-scale AI clusters, rapid checkpoint storage, and rapid fault recovery.

Faced with these specific needs, Huawei Cloud had to seek breakthroughs in storage architecture, relying on an innovative three-tier architecture comprising OBS data lakes, SFS Turbo high-performance parallel file systems, and AI Turbo acceleration to systematically address the challenges of big model training scenarios.
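To make the checkpointing requirement concrete, here is a minimal Python sketch of tiered checkpoint storage. It assumes two ordinary directories stand in for a fast tier and a durable tier; it does not use or represent the OBS, SFS Turbo, or AI Turbo interfaces, and the class and method names are hypothetical.

```python
import os
import pickle
import shutil
import threading
from pathlib import Path


class TieredCheckpointer:
    """Write checkpoints to a fast tier, then copy them to a durable tier."""

    def __init__(self, fast_dir, durable_dir):
        self.fast = Path(fast_dir)        # stand-in for a high-performance file system
        self.durable = Path(durable_dir)  # stand-in for an object-storage data lake
        self.fast.mkdir(parents=True, exist_ok=True)
        self.durable.mkdir(parents=True, exist_ok=True)
        self._pending = []

    def save(self, step, state):
        """Block only for the fast-tier write; flush to durable in background."""
        name = f"ckpt_{step:08d}.pkl"
        tmp = self.fast / (name + ".tmp")
        with open(tmp, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, self.fast / name)  # atomic rename: no half-written file
        worker = threading.Thread(target=self._flush, args=(name,))
        worker.start()
        self._pending.append(worker)
        return self.fast / name

    def _flush(self, name):
        tmp = self.durable / (name + ".tmp")
        shutil.copyfile(self.fast / name, tmp)
        os.replace(tmp, self.durable / name)

    def wait(self):
        """Wait until every background copy has finished."""
        for worker in self._pending:
            worker.join()
        self._pending.clear()

    def load_latest(self):
        """Recover from the fast tier if possible, else from the durable tier."""
        for tier in (self.fast, self.durable):
            files = sorted(tier.glob("ckpt_*.pkl"))
            if files:
                with open(files[-1], "rb") as f:
                    return pickle.load(f)
        return None
```

The design choice illustrated is that training pauses only for the fast-tier write, while the copy to durable storage happens in the background; after a failure that wipes the fast tier, load_latest() falls back to the durable copy.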

In short, addressing the scenario-specific problems of AI big models requires cloud services to innovate across the whole stack, from underlying infrastructure to top-level applications, and to offer matching solutions that in turn drive big model development. Big model hybrid clouds face numerous technical challenges, and in recent years Huawei Cloud has been innovating systematically to address the key bottlenecks in big model applications. Its ten innovative technologies for big model hybrid clouds cover not only cloud-edge collaboration and data storage but also enhanced AI networks, operator acceleration, unified data encoding, diverse computing power scheduling, and more.

In fact, Huawei Cloud's systematic concept of big model hybrid clouds aside, the industry as a whole has reached a consensus on integrating cloud and big models and is working to provide technical solutions for big model training, inference, and application in the cloud.

For instance, JD Cloud has launched a complete set of tools for big models, including core products such as the Yanxi AI development computing platform for big model applications, vector databases, the hybrid multi-cloud operating system Cloud Ark, the high-performance storage platform Cloud Sea, and the integrated software and hardware virtualization engine Jingang, all of which promote the industrial development of big models on top of cloud computing.

Systematic Breakthrough of Big Model Hybrid Clouds

In today's big model era, vendors such as Amazon Web Services, Huawei Cloud, JD Cloud, and Baidu Intelligent Cloud are all working to build comprehensive technical solutions that span the full range of processes and services, from the bottom layer through the middle layer to the top layer, so that big models can be continuously deployed and applied in the cloud and their value unlocked.

The concept of big model hybrid clouds raises these comprehensive solutions to a more systematic level. This systematic development requires cloud vendors not only to deepen their technology but also to explore a wide range of scenarios.

"For governments, their concerns may not be simply about saving one customer service representative or operation and maintenance personnel but more about promoting the development of the entire industry through big models," said Wu Bingkun, founder and CEO of Zhongshu Xinke, in an interview with the media.

Given where the cloud service industry is heading, this kind of industry-wide development inherently demands systematic upgrades in cloud technology, which in turn calls for a broader perspective. In this respect, the big model hybrid cloud reflects the broader outlook of future cloud services better than traditional multi-cloud strategies do.

Multi-cloud strategies focus on using services from multiple cloud service providers to avoid vendor lock-in while optimizing costs or leveraging the strengths of each provider. Big model hybrid clouds may also involve multiple cloud environments, but their core lies in constructing a highly integrated and optimized environment for large-scale data processing and AI model operation. The aim is not simply to diversify service sources but to achieve specific technical and business goals.

For example, Huawei Cloud Stack's multi-cloud collaboration architecture allows industry big models to be trained on public clouds while being fine-tuned with enterprise local data in hybrid clouds and then performing inference on edge clouds to meet computing needs in different scenarios.
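The placement logic in this example can be sketched as a simple rule in Python. This is an illustrative assumption, not how Huawei Cloud Stack actually schedules workloads: the `Job` fields and the `place` function are hypothetical names.

```python
from dataclasses import dataclass
from enum import Enum


class Env(Enum):
    PUBLIC = "public cloud"
    HYBRID = "hybrid cloud"
    EDGE = "edge cloud"


@dataclass(frozen=True)
class Job:
    name: str
    uses_private_data: bool   # touches enterprise-local data
    latency_sensitive: bool   # must respond close to the data source


def place(job: Job) -> Env:
    """Pick an environment for a big model workload (assumed rules)."""
    if job.latency_sensitive:
        return Env.EDGE       # real-time inference runs near the devices
    if job.uses_private_data:
        return Env.HYBRID     # fine-tuning keeps local data on premises
    return Env.PUBLIC         # base training uses elastic public capacity


jobs = [
    Job("industry-model training", uses_private_data=False, latency_sensitive=False),
    Job("fine-tuning on local data", uses_private_data=True, latency_sensitive=False),
    Job("on-site inference", uses_private_data=True, latency_sensitive=True),
]
plan = {job.name: place(job).value for job in jobs}
```

Here `plan` maps the three stages from the example to public, hybrid, and edge clouds respectively; a real system would weigh many more factors, such as cost, capacity, and compliance.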

The essence of this approach is not to decentralize "clouds" but to leverage native hybrid cloud capabilities, enabling users to extend big models from on-premises to edge and public clouds, achieving cross-cloud deployment across all scenarios and optimizing application efficiency, security performance, and other outcomes.

In summary, big model hybrid clouds are optimized hybrid cloud architectures for specific domains, particularly those requiring large-scale data processing and complex AI models. They integrate the elasticity of public clouds, the security controls of private clouds, and possibly multi-cloud services to meet the unique needs of high-performance computing and AI applications.

The systematic technical foundation created by this integration will foster industry-wide development in the coming years, moving the focus beyond saving a customer service or operation and maintenance position toward driving the development of entire industries through big models.

Based on this systematic development, several notable trends will emerge in the development of big models + hybrid clouds.

I. In computing power scheduling, the training and inference of big models typically require substantial computing resources. As model sizes grow, so does the demand for computing power. Computing power scheduling and optimization technologies in hybrid cloud environments will continue to evolve to support more efficient big model training and inference.

II. In cloud-edge collaboration, as Internet of Things (IoT) devices become more prevalent, edge computing gains importance. Hybrid cloud architectures will support closer cloud-edge collaboration, enabling real-time inference of big models at the edge, reducing latency, and improving response speed.

III. In infrastructure, AI-native storage and networking technologies will continue to evolve to support more efficient model training and inference. For example, high-performance storage with multi-level caching enables checkpoint access in seconds and recovery from training failures in minutes.

IV. In model applications, enterprises can fine-tune pre-trained big models with local data in hybrid cloud environments to adapt to specific business scenarios while maintaining data privacy.

V. In business deployment, different industries (such as finance, healthcare, and manufacturing) will leverage big models on hybrid clouds to address specific business challenges, fostering business innovation and process automation. Meanwhile, big models are more easily scalable in hybrid cloud environments, particularly in widely distributed enterprises and industries such as energy, transportation, and manufacturing.

VI. In ecosystem development, the combination of big models and hybrid clouds will attract more ecosystem partners to develop solutions and services, expanding the entire ecosystem. As the use of big model hybrid clouds increases, relevant standards and protocols will gradually be formulated and improved to enhance interoperability and compatibility among different systems.

Final Thoughts

Today, big models possess billions or even hundreds of billions of parameters, giving the development of generative AI unprecedented computational scale and complexity. More parameters mean models can learn deeper and finer data features and generate higher-quality, more diverse content in fields such as text generation, image synthesis, and audio creation, significantly advancing generative AI.

This capability is crucial for industrial transformation and upgrading in the future. Cloud computing, as an underlying technology for industrial upgrading, collaborates with generative AI to achieve this goal in a more comprehensive and holistic manner. However, how cloud computing integrates with the big models behind generative AI will be a crucial issue in this process.

Cloud vendors such as Amazon Web Services, Huawei Cloud, JD Cloud, and Baidu Intelligent Cloud all want to ride the wave of generative AI, but doing so requires careful consideration.
