10/09 2024 566
Recently, another breakthrough has been made in the field of artificial intelligence, as OpenAI officially launched its latest masterpiece - model o1. The highlight of this model lies in its integration of reinforcement learning (RL) training methods and the adoption of a more in-depth internal chain of thought (CoT) technology during model reasoning. This innovative combination has enabled o1 to achieve significant performance improvements in disciplines such as physics, chemistry, and mathematics, which require strong logical reasoning abilities.
OpenAI's achievement undoubtedly sets a new benchmark in the field of artificial intelligence. The RL+CoT paradigm not only significantly enhances the model's strong logical reasoning capabilities but also provides new insights for subsequent research and development directions for large model manufacturers both domestically and internationally. It is foreseeable that in the future, along the new route of RL+CoT, major manufacturers will continue to iterate their models, pushing artificial intelligence technology to new heights.
01. Shift in focus from pre-training to post-training and reasoning
In 2020, OpenAI's proposed Scaling Law laid an important theoretical foundation for the iteration of large models. Prior to the release of the o1 model, the Scaling Law primarily focused on the pre-training phase, enhancing the model's intelligence by increasing the number of model parameters, expanding the training dataset, and improving computational power. However, with the launch of the o1 model, OpenAI demonstrated that by introducing reinforcement learning (RL) during the post-training phase and incorporating longer internal chains of thought (CoT, implying more computational steps) during reasoning, significant performance improvements can also be achieved, building upon the pre-training Scaling Law. This indicates that the Scaling Law is not limited to the pre-training phase but can continue to play a role in the post-training and reasoning phases of large models.
Specifically, the o1 model has seen significant improvements in programming, mathematics, and science. In Codeforces programming competitions, the o1 model outperformed 83% of professionals. In mathematics competitions, taking AIME 2024 as an example, GPT-4o could only solve an average of 12% of the problems, whereas the o1 model solved an average of 74%, reaching 83% with a consensus of 64 samples. In terms of scientific capabilities, for doctoral-level science questions (GPQA Diamond), GPT-4o achieved an accuracy of 56.1%, compared to 69.7% for human experts, while the o1 model surpassed human expertise with an accuracy of 78%.
The introduction of the o1 model provides a new reference paradigm for the training and iteration of the next generation of large models - RL+CoT. Qualitatively, RL+CoT requires more computational power for training and reasoning. Prior to the o1 model, models such as GPT-4o primarily underwent pre-training and post-training (RLHF based on human feedback), with reasoning involving single-shot or short CoT. However, the o1 model may not see significant changes in computational power during the pre-training phase, primarily aiming to ensure the model's general capabilities. During the post-training phase, the introduction of RL requires the model to iteratively optimize its outputs through continuous search, potentially increasing computational power consumption. During the reasoning phase, the o1 model learns long internal CoT through RL training, significantly increasing the number of tokens required for reasoning, thereby also increasing computational power requirements compared to previous single-shot or short CoT reasoning.
In summary, under the new large model training paradigm, qualitatively, models require more computational power for training and reasoning to support performance improvements.
02. Computational power and application endpoints deserve attention
Currently, upgraded AI large models are primarily focused on enhancing logical reasoning capabilities, achieving more coherent and structured responses by implementing a complete step-by-step reasoning process. This upgrade signals the emergence of an initial framework for Agent Network, potentially benefiting B-end users requiring more rigorous logical processing first. Additionally, as the system's ability to handle edge scenarios in complex real-world environments improves, its application scope and effectiveness will also enhance.
HT Securities analysis points out that the RL+CoT training paradigm not only continues the Scaling Law from the pre-training phase but also extends it to the post-training and reasoning phases. While computational power remains relatively stable during pre-training, RL post-training and CoT reasoning will spur new demands for computational power. The specific scale of these demands will depend on the balance between the depth of RL search, the intrinsic length of CoT, and reasoning effectiveness. Since RL+CoT essentially sets the basic framework for the next generation of model iterations for other model developers in the industry, this paradigm is expected to be widely adopted, driving a significant increase in training computational power demands. In this context, investors are advised to pay attention to companies related to computational power, such as Broadcom, Huelian Electronics, Foxconn, etc.
Moreover, although the o1 model currently primarily addresses reasoning problems in mathematics, coding, and science, its core lies in building the model's CoT capabilities. As a crucial means of reasoning, CoT has the potential to be applied in combination with more private user data on the client side. Apple's AI Agent is considered an ideal computing platform for realizing CoT capabilities. Therefore, investors are advised to focus on companies related to the Apple supply chain, including Luxshare Precision, Pegatron, Crystal Optoelectronics, Goertek, Lens Technology, Dongshan Precision, JCET, etc.
Finally, the strong logical reasoning capabilities demonstrated by the o1 model are expected to extend to broader and more general fields, with significant improvements in reasoning performance compared to previous models. This implies that AI applications and Agents based on the o1 and subsequent large models are poised to achieve substantial capability enhancements. Consequently, investors are advised to pay attention to core AI application enterprises such as Microsoft, Adobe, Kingsoft Office, Weaver Network, Hive Network, etc.