March 24, 2026
Ask what the biggest challenge in training large AI models is today, and the scarcity of high-quality data tops the list. No matter how advanced the model or how sophisticated the algorithm, without ample, high-quality "data fuel" the AI engine cannot reach its full potential.
This industry-wide pain point is now being tackled systematically at the national level. On March 24, at a State Council Information Office press conference on the 9th Digital China Summit, Liu Liehong, a member of the Party Leadership Group of the National Development and Reform Commission and Director of the National Data Bureau, sent a clear message: China will fully implement a new round of action plans for building high-quality datasets. The core objective is to create truly "AI-ready" datasets that provide a solid data foundation for innovation in AI.
Transforming the Supply Side of "AI Fuel"
Director Liu Liehong highlighted that the new round of action plans will revolve around six major initiatives, forming a complete closed loop from infrastructure construction to value realization.
"Strengthening Foundations and Expanding Capacity" and "Annotation Breakthroughs" lay the groundwork. The former aims to expand the scale and coverage of datasets to overcome fragmentation, while the latter targets bottlenecks in professional data annotation, especially in specialized fields such as healthcare, law, and scientific research. Together they ensure that data is not just "available" but also "usable."
"Quality Improvement and Efficiency Enhancement" and "Application Empowerment" mark the key shift: making sure that data actually improves model performance. Guided by real-world demand, these initiatives tie dataset construction closely to specific industry pain points in areas such as industrial manufacturing, smart finance, and autonomous driving, avoiding "construction for construction's sake."
"Management Services" and "Value Release" establish long-term mechanisms. China will accelerate the construction of a unified dataset management platform and work to foster a healthy data-trading ecosystem governed by sound market rules. The ultimate goal is to fully unlock the value of data through circulation and use.
These six initiatives point to a clear objective: transforming data from raw, unrefined "ore" into polished, ready-to-use "high-grade fuel."
Director Liu also explained the three levels of "AI-Ready": technical feasibility, practical convenience, and quality assurance. Only datasets that reach the "quality assurance" level can effectively improve model performance, and only these count as truly "high-quality."
Token Call Volume Skyrockets a Thousandfold
"Average daily Token call volume" is a key metric of how actively large models are being used. In China, this figure surged from roughly 100 billion in early 2024 to 100 trillion by the end of 2025, a thousandfold increase in about two years, and exceeded 140 trillion in March this year.
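To put the thousandfold figure in perspective, the short Python sketch below computes the growth multiples from the reported numbers and the constant compound growth that would produce them. It is an illustration only: it assumes the span from early 2024 to the end of 2025 can be treated as exactly 2 years (24 months), since the article gives no exact dates, and the constant and function names are hypothetical.

```python
# Reported average daily Token call volumes in China (figures from the article).
EARLY_2024 = 100e9   # roughly 100 billion
END_2025 = 100e12    # roughly 100 trillion
MARCH_2026 = 140e12  # more than 140 trillion


def growth_multiple(start: float, end: float) -> float:
    """Return how many times larger `end` is than `start`."""
    return end / start


def per_period_multiple(multiple: float, periods: int) -> float:
    """Constant per-period multiple that compounds to `multiple` over `periods`."""
    return multiple ** (1 / periods)


if __name__ == "__main__":
    m = growth_multiple(EARLY_2024, END_2025)
    print(f"Early 2024 -> end 2025: {m:,.0f}x")
    print(f"Early 2024 -> March 2026: {growth_multiple(EARLY_2024, MARCH_2026):,.0f}x")
    # Assumption: the span is treated as exactly 2 years, i.e. 24 months.
    print(f"Implied multiple per year: {per_period_multiple(m, 2):.1f}x")
    print(f"Implied growth per month: {per_period_multiple(m, 24) - 1:.1%}")
```

Under that assumption, a thousandfold rise works out to roughly a 31.6x increase per year, or about 33% compound growth every month sustained for two years.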
This leap signals a shift in business models. As the basic unit of information processing for large models, Tokens are giving rise to a new commercial logic in which usage is measurable and can be priced. Director Liu disclosed that some model companies have set records such as "revenue in 20 days surpassing last year's total," a snapshot of how quickly the intelligent economy is accelerating.
This also indicates that a flywheel of "high-quality data supply → enhanced model capabilities → commercial value realization → feedback into data construction" has begun to spin.
Since its pilot launch in August 2025, the high-quality dataset construction action plan has been rolled out across 25 provinces and 23 key and frontier fields. Spanning traditional industrial manufacturing and financial services as well as cutting-edge areas such as the low-altitude economy and embodied AI, the first batch of 140 pilot projects is exploring approaches and standards for building high-quality datasets for specific scenarios.
Letting real scenarios guide the work is crucial to ensuring that data is "usable and used well." The pilot projects push AI from broad reach toward real depth, moving beyond general-purpose dialogue into the fine-grained workflows of individual industries. In healthcare, for instance, a high-quality, multimodal medical imaging dataset can directly support more accurate diagnostic-assistance models; in meteorology, rich spatiotemporal data is the cornerstone for training more powerful climate prediction models.
The National Data Bureau has designated 2026 the "Year of Data Value Release," a clear sign of its resolve. The goal is to use reforms of the data supply system to systematically turn China's vast data resources into a decisive advantage for its AI industry.
When every industry can easily access "AI-Ready" data in its field, the "AI+" initiative will deliver tangible results and productivity gains.
As the line widely attributed to science-fiction writer William Gibson puts it, "The future is already here, it's just not evenly distributed." The flow and sharing of high-quality data may be the most powerful way to even out that distribution and bring the future to every industry.