Musk Unveils New Model, the Cost-Effectiveness Champion: Matches Gemini 2.5 Performance at a Fraction of the Cost

11/17/2025

Just the day before, xAI introduced Grok 4 Fast, and Musk took to his X platform to announce: "A 2M context window!" Beyond that extensive context window, the new model delivers performance on par with Gemini 2.5 at roughly one-tenth the cost, living up to its billing as the cost-effectiveness king.

xAI officially described it as their latest advance in cost-efficient reasoning models. Grok 4 Fast builds on the insights gained from Grok 4, delivering top-tier performance in both enterprise and consumer settings while maintaining exceptional token efficiency.

This model redefines the possibilities of smaller, swifter AI, granting a wider array of users and developers access to high-quality inference. Grok 4 Fast boasts state-of-the-art (SOTA) cost-effectiveness, advanced web and X search functionalities, a 2M token context window, and a unified architecture that seamlessly integrates reasoning and non-reasoning modes within a single model framework.

Grok 4 Fast breaks new ground in cost-effective intelligence, outperforming Grok 3 Mini on reasoning benchmarks while drastically cutting token costs.

The team harnessed large-scale reinforcement learning to maximize Grok 4 Fast's intelligence density. In evaluations, Grok 4 Fast matched Grok 4's benchmark performance while using an average of 40% fewer thinking tokens.

That 40% gain in token efficiency, combined with a far lower price per token, means Grok 4 Fast delivers roughly a 98% reduction in the price of reaching the same performance as Grok 4 on frontier benchmarks. Independent evaluations by the third-party firm Artificial Analysis confirm that Grok 4 Fast offers a state-of-the-art (SOTA) price-to-intelligence ratio compared with other publicly available models on its Intelligence Index.
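One way to read that claim: the total cost of a task scales with (tokens used) × (price per token), so a 40% token saving combined with a 98% drop in overall cost implies a per-token price of only a few percent of Grok 4's. The back-of-the-envelope sketch below uses only the article's figures; the dollar amounts in the sanity check are arbitrary placeholders, not xAI's published rates.

```python
# Rough arithmetic behind the "98% price reduction" claim, interpreted from
# the article's own figures. The baseline price below is an illustrative
# assumption, not xAI's published pricing.

token_ratio = 0.60          # Grok 4 Fast uses ~40% fewer thinking tokens than Grok 4
target_cost_ratio = 0.02    # 98% reduction in the cost of matching Grok 4's benchmark scores

# total cost ratio = (tokens-used ratio) x (price-per-token ratio)
implied_price_ratio = target_cost_ratio / token_ratio
print(f"Implied per-token price vs. Grok 4: {implied_price_ratio:.1%}")  # ~3.3%

# Sanity check with an arbitrary baseline: a Grok 4 run consuming
# 1,000,000 thinking tokens at a notional $10 per million tokens.
baseline_tokens, baseline_price_per_m = 1_000_000, 10.0
grok4_cost = baseline_tokens / 1e6 * baseline_price_per_m
grok4_fast_cost = (baseline_tokens * token_ratio) / 1e6 * (baseline_price_per_m * implied_price_ratio)
print(f"Baseline: ${grok4_cost:.2f}  ->  Grok 4 Fast: ${grok4_fast_cost:.2f}")
```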

Grok 4 Fast is trained end-to-end using reinforcement learning (RL) for tool utilization, excelling at discerning when to invoke tools like code execution or web browsing.
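Because tool use is exposed through the API, a caller can observe that decision directly. Below is a minimal sketch, assuming xAI's OpenAI-compatible chat-completions endpoint at https://api.x.ai/v1 and the standard function-calling fields; the get_weather tool and its schema are invented purely for illustration.

```python
# Minimal sketch of letting Grok 4 Fast decide whether to call a tool.
# Assumes xAI's OpenAI-compatible API at https://api.x.ai/v1 and the
# standard chat-completions function-calling schema; the weather tool
# is hypothetical.
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.x.ai/v1")

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",  # hypothetical tool for this example
        "description": "Look up current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="grok-4-fast-reasoning",
    messages=[{"role": "user", "content": "Do I need an umbrella in London today?"}],
    tools=tools,
)

msg = resp.choices[0].message
if msg.tool_calls:
    # The model judged that answering requires the tool and emitted a structured call.
    print(msg.tool_calls[0].function.name, msg.tool_calls[0].function.arguments)
else:
    # The model judged it could answer directly without invoking any tool.
    print(msg.content)
```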

For example, Grok 4 Fast demonstrates advanced agentic search capabilities, effortlessly navigating the web and X and enriching queries with real-time data. It traverses links, extracts media (including images and videos on X), and synthesizes search results at remarkable speed.

In LMArena's search evaluation, grok-4-fast-search clinched the top spot with a score of 1163, outpacing o3-search by 17 points. Its exceptional reasoning efficiency and intelligence density enable it to outperform larger models in real-world search-related tasks.

In LMArena's Text Arena, grok-4-fast secured 8th place, performing on par with grok-4-0709 and underscoring its remarkable intelligence density. Notably, it far outperforms models of comparable size, all of which rank 18th or lower.

xAI also showcased several practical use cases of Grok 4 Fast.

Grok 4 Fast introduces a unified architecture where reasoning (long-chain thinking) and non-reasoning (quick responses) are managed by the same model weights and controlled through system prompts. This integration minimizes end-to-end latency and token costs, making Grok 4 Fast perfect for real-time applications.
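In practice this means the caller chooses the mode up front rather than paying for a separate model. A minimal sketch, again assuming the OpenAI-compatible endpoint and using the two variant model names listed later in the article:

```python
# Minimal sketch of choosing between the two modes of the unified model.
# Assumes xAI's OpenAI-compatible API; the model IDs are the variant
# names given in the article (grok-4-fast-reasoning / grok-4-fast-non-reasoning).
import os
from openai import OpenAI

client = OpenAI(api_key=os.environ["XAI_API_KEY"], base_url="https://api.x.ai/v1")

def ask(prompt: str, reasoning: bool) -> str:
    """Route a prompt to the reasoning or non-reasoning variant."""
    model = "grok-4-fast-reasoning" if reasoning else "grok-4-fast-non-reasoning"
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return resp.choices[0].message.content

# Quick factual lookup: low latency matters more than long-chain thinking.
print(ask("What year was the transistor invented?", reasoning=False))

# Multi-step problem: spend thinking tokens on long-chain reasoning.
print(ask("Plan a 3-leg itinerary that minimizes total flight time from SFO to NRT.", reasoning=True))
```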

Grok 4 Fast is currently available to all users, and in Auto mode complex queries are automatically routed to Grok 4 Fast, with no usage limits.

The team also plans to roll out two variants of Grok 4 Fast: grok-4-fast-reasoning and grok-4-fast-non-reasoning, each equipped with a 2M token context window. The pricing details are as follows:
