GPT-5.2 Now Available!

12/12/2025

OpenAI has just unveiled GPT-5.2, a version specifically designed for professional, knowledge-intensive tasks.

Enterprise users report that AI saves them 40 to 60 minutes a day, and heavy users report saving more than 10 hours a week.
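To put the daily figure on the same footing as the weekly one, here is a small Python sketch of the conversion. It assumes a five-day working week, which the article does not state; the weekly_hours helper is our own illustration.

```python
# Convert a per-day time saving into hours per week.
# Assumption (not from the article): a five-day working week.
WORKDAYS_PER_WEEK = 5

def weekly_hours(minutes_per_day: float, workdays: int = WORKDAYS_PER_WEEK) -> float:
    """Return hours saved per week for a given saving in minutes per workday."""
    return minutes_per_day * workdays / 60

low = weekly_hours(40)
high = weekly_hours(60)
print(f"typical user: {low:.1f} to {high:.1f} hours/week")

# Heavy users report 10+ hours/week; spread over the assumed workdays,
# that is at least this many minutes per day.
print(f"heavy user: at least {10 * 60 / WORKDAYS_PER_WEEK:.0f} minutes/day")
```

Under that assumption, 40 to 60 minutes a day comes to roughly 3.3 to 5 hours a week, so the heavy users' 10+ hours implies about two hours saved per workday.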

GPT-5.2 excels at creating spreadsheets, building presentations, writing code, understanding images, handling long contexts, using tools, and managing complex, multi-step projects.

On GDPval, a benchmark of well-specified knowledge-work tasks spanning 44 occupations, GPT-5.2 matched or outperformed industry experts on most tasks.

Users have noted GPT-5.2's strong long-horizon reasoning and tool calling, particularly in agentic data science and document analysis tasks.

GPT-5.2 also reaches industry-leading performance in agentic coding, with measurable gains in interactive coding, code review, and bug finding.

Starting today, GPT-5.2 Instant, Thinking, and Pro are becoming available, and the API is open to all developers.

GPT-5.2 Thinking is especially well suited to real-world professional work. On GDPval (44 occupations, well-specified knowledge-work tasks), GPT-5.2 Thinking matched or outperformed top industry professionals in 70.9% of comparisons. The tasks included producing presentations, spreadsheets, and other professional work products.
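The 70.9% figure is a "matched or outperformed" rate over expert comparisons. As a hedged illustration of how such a rate is tallied, the Python sketch below counts wins and ties over a list of grader labels. The labels and the win_or_tie_rate helper are invented for illustration; this is not GDPval's grading code or data.

```python
from collections import Counter

def win_or_tie_rate(judgments: list[str]) -> float:
    """Fraction of comparisons the model won or tied.

    Each judgment is 'win', 'tie', or 'loss' from the model's point of view.
    """
    if not judgments:
        raise ValueError("no judgments to score")
    counts = Counter(judgments)
    unknown = set(counts) - {"win", "tie", "loss"}
    if unknown:
        raise ValueError(f"unexpected labels: {unknown}")
    return (counts["win"] + counts["tie"]) / len(judgments)

# Hypothetical labels for ten comparisons (not GDPval data).
sample = ["win", "tie", "loss", "win", "win", "loss", "tie", "win", "loss", "win"]
print(f"{win_or_tie_rate(sample):.1%}")
```

Counting ties together with wins is what makes the headline number a "matched or outperformed" rate rather than a strict win rate.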

On GDPval tasks, GPT-5.2 Thinking produced its outputs more than 11 times faster than human experts, at less than 1% of the cost. This suggests that, with human oversight, GPT-5.2 can meaningfully support professional work.

A GDPval judge remarked, "The layout design is impressive, and the suggestions for the two deliverables are spot-on, though there are still minor errors that need correction."

On an internal benchmark of spreadsheet-modeling tasks for junior investment banking analysts (for example, building a fully formatted and referenced three-statement model for a Fortune 500 company, or a leveraged buyout model for a take-private deal), GPT-5.2 Thinking's average task score was 68.4%, up from 59.1% for GPT-5.1, a gain of 9.3 percentage points.
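Because both scores are percentages, the 9.3 gain is an absolute difference in percentage points; measured against GPT-5.1's baseline, the relative change is larger. A minimal Python check using only the two scores quoted above:

```python
def point_change(old_pct: float, new_pct: float) -> float:
    """Absolute change between two percentages, in percentage points."""
    return new_pct - old_pct

def relative_change(old_pct: float, new_pct: float) -> float:
    """Relative change, as a fraction of the old value."""
    return (new_pct - old_pct) / old_pct

old, new = 59.1, 68.4  # GPT-5.1 and GPT-5.2 Thinking average scores from the article
print(f"absolute: {point_change(old, new):.1f} percentage points")
print(f"relative: {relative_change(old, new):.1%}")
```

The absolute gain is 9.3 percentage points, which corresponds to roughly a 15.7% relative improvement over the 59.1% baseline.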

Side-by-side comparisons show that spreadsheets and slides generated by GPT-5.2 Thinking are noticeably more sophisticated and better formatted.

GPT-5.2 Thinking scored 55.6% on SWE-bench Pro. SWE-bench Pro spans four programming languages and is designed to be harder to contaminate, more challenging, more diverse, and closer to real-world industrial software engineering.

On SWE-bench Verified, GPT-5.2 Thinking set a new record with a score of 80%.

This indicates that the model can more reliably debug code in production environments, implement functional requirements, refactor large codebases, and complete end-to-end repairs with minimal human intervention.

GPT-5.2 Thinking also outperforms GPT-5.1 Thinking in front-end software engineering. Early testers found it stronger at front-end development and at complex or unconventional UI work, especially tasks involving 3D elements.

Jeff Wang, CEO of Windsurf, stated, "GPT-5.2 marks the most significant advancement in agentic coding since GPT-5 and is the leading coding model in its price range."

GPT-5.2 Thinking hallucinates less than GPT-5.1 Thinking: on a set of anonymized ChatGPT queries, responses containing errors were 38% less frequent.

In long-document analysis, GPT-5.2 Thinking is substantially more accurate than GPT-5.1 Thinking, reaching near-perfect accuracy on the 4-needle variant of the MRCR evaluation at context lengths up to 256k tokens.
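As we understand the public description of OpenAI's MRCR (multi-round co-reference resolution) evaluation, a long conversation contains several near-identical requests (four in this variant), and the model must reproduce the response to one specific instance; answers must begin with a given random prefix and are scored by string similarity. The Python grader below is a simplified, hypothetical sketch of that scoring idea built on the standard library's difflib.SequenceMatcher. The grade_retrieval name, prefix, and sample text are our own, and this is not OpenAI's evaluation code.

```python
from difflib import SequenceMatcher

def grade_retrieval(response: str, expected: str, required_prefix: str) -> float:
    """Score one answer in [0, 1] (hypothetical MRCR-style grader).

    The answer must begin with the required random prefix; otherwise it
    scores 0. The remainder is compared with the expected text using a
    character-level similarity ratio, where 1.0 means an exact match.
    """
    if not response.startswith(required_prefix):
        return 0.0
    body = response[len(required_prefix):]
    return SequenceMatcher(None, body, expected).ratio()

expected = "Tapirs wander through the misty forest at dawn."
print(grade_retrieval("x7Q2" + expected, expected, "x7Q2"))  # exact copy with prefix
print(grade_retrieval(expected, expected, "x7Q2"))  # prefix missing
print(grade_retrieval("x7Q2Tapirs wander the misty forest.", expected, "x7Q2"))  # partial match
```

Requiring the prefix checks that the model followed the instruction, while the similarity ratio gives partial credit when the retrieved text is close but not exact.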

GPT-5.2 Thinking scored 98.7% on τ²-bench Telecom, showing that it can use tools reliably across long, multi-turn tasks.

In latency-sensitive scenarios, GPT-5.2 Thinking also reasons significantly better, outperforming GPT-5.1 and GPT-4.1 when reasoning effort is set to 'none'.

In practice, this should make end-to-end workflows more reliable, such as managing customer support cases, pulling data from multiple systems, running analyses, and producing final outputs, with fewer breakdowns between steps.
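For developers, the 'effort=none' setting corresponds to the reasoning effort option in OpenAI's Responses API. The Python sketch below only builds and prints a request body and makes no network call. The model id "gpt-5.2" and its support for effort "none" are assumptions based on this announcement, so check the current API reference before relying on them; build_request is a hypothetical helper.

```python
import json

def build_request(prompt: str, model: str = "gpt-5.2", effort: str = "none") -> dict:
    """Return a Responses API request body with the given reasoning effort.

    Assumption: "gpt-5.2" is the API model id and it accepts effort="none".
    """
    return {
        "model": model,
        "input": prompt,
        "reasoning": {"effort": effort},
    }

body = build_request("Summarize this support ticket in one sentence: 'My invoice total is wrong.'")
print(json.dumps(body, indent=2))
```

To send it, POST the JSON to https://api.openai.com/v1/responses with an "Authorization: Bearer <your API key>" header. Keeping request construction separate from the network call makes the payload easy to inspect and test, and switching to a higher effort for harder steps is a one-argument change.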

On GPQA Diamond, a graduate-level "Google-proof" question-answering benchmark, GPT-5.2 Pro scored 93.2%, with GPT-5.2 Thinking close behind at 92.4%.

In the expert-level mathematics evaluation FrontierMath (Tier 1–3), GPT-5.2 Thinking solved 40.3% of the problems.

On ARC-AGI-1 (Verified), which measures general reasoning ability, GPT-5.2 became the first model to exceed 90%, up from the 87% that o3-preview scored last year, while cutting the cost of reaching that level of performance by roughly 390 times.

On the harder ARC-AGI-2 (Verified), which focuses on fluid reasoning, GPT-5.2 Thinking set a new record for chain-of-thought models at 52.9%, and GPT-5.2 Pro went further to 54.2%, showing stronger reasoning on entirely new abstract problems.

GPT-5.2 is priced at $1.75 per million input tokens and $14 per million output tokens.
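To estimate what a given workload would cost at these list prices, a minimal Python sketch follows. It assumes simple per-token billing at the two quoted rates and ignores any discounts or other pricing tiers; the request_cost helper and the 50k-in, 2k-out workload are hypothetical.

```python
# List prices quoted in the article, in US dollars per million tokens.
INPUT_PRICE_PER_M = 1.75
OUTPUT_PRICE_PER_M = 14.00

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated cost in USD for one request at the quoted GPT-5.2 prices.

    Ignores discounts (such as cached input) and any other pricing tiers.
    """
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Hypothetical workload: a 50k-token document summarized into 2k tokens.
print(f"${request_cost(50_000, 2_000):.4f}")
```

Output tokens cost eight times as much as input tokens, so long generations dominate the bill. For OpenAI's reasoning models, hidden reasoning tokens are billed as output tokens, so output_tokens should include them when reasoning is enabled.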

Overall, GPT-5.2 brings significant improvements in general intelligence, long-context understanding, agentic tool calling, and vision, making it well suited to carrying out complex, real-world tasks from start to finish.

References:

https://openai.com/zh-Hans-CN/index/introducing-gpt-5-2/
