12/01 2025
Moments ago, Elon Musk took to social media to announce the launch of Grok 4.1. He remarked, "You'll notice enhancements in both speed and quality."
According to xAI, Grok 4.1 stands out for its creativity, emotional intelligence, and collaborative ability. It is better at picking up on nuanced intent, which makes interactions smoother, while retaining its sharp intelligence and reliability.
The development team reused the large-scale reinforcement learning infrastructure built for Grok 4, applying it to optimize the model's style, personality, helpfulness, and coherence.
Reward signals for these qualities are hard to verify directly, so the team developed new methods that use frontier reasoning models as reward models to evaluate responses autonomously and iterate on them at scale.
On the LMArena text leaderboard, Grok 4.1 Thinking holds the top spot with an Elo score of 1483, 31 points ahead of the highest-ranked non-xAI model.
Grok 4.1's non-reasoning mode, which responds immediately without using any thinking tokens, ranks second with an Elo score of 1465. By comparison, Grok 4 ranks 33rd overall.
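To give a feel for what these Elo gaps mean, the sketch below converts a rating difference into an expected head-to-head preference rate. It assumes the classic Elo logistic formula with a 400-point scale; LMArena fits its ratings from pairwise votes and its exact method may differ, so the output is a rough illustration rather than an official figure. The function name and constants are illustrative, and the 1452 rating is simply derived from the 31-point lead quoted above.

```python
def elo_expected_score(rating_a: float, rating_b: float) -> float:
    """Expected score of A against B under the classic Elo logistic formula.

    Assumption: the standard 400-point scale; the leaderboard's own
    rating model may use a different fit.
    """
    return 1.0 / (1.0 + 10 ** ((rating_b - rating_a) / 400.0))


# Ratings quoted in the article.
GROK_41_THINKING = 1483
GROK_41_NON_REASONING = 1465
# Derived from the stated 31-point lead, not reported directly.
TOP_NON_XAI = GROK_41_THINKING - 31

p_vs_non_xai = elo_expected_score(GROK_41_THINKING, TOP_NON_XAI)
p_vs_non_reasoning = elo_expected_score(GROK_41_THINKING, GROK_41_NON_REASONING)

print(f"Thinking vs top non-xAI model: {p_vs_non_xai:.1%}")
print(f"Thinking vs non-reasoning mode: {p_vs_non_reasoning:.1%}")
```

Under this assumption, a 31-point lead corresponds to being preferred in roughly 54 of every 100 head-to-head comparisons, so even the top of the leaderboard is separated by fairly narrow margins.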
The team used EQ-Bench3 to measure Grok 4.1's interpersonal skills. EQ-Bench3 is an LLM-judged benchmark that evaluates active emotional intelligence, comprehension, insight, empathy, and social skills.
The test set comprises 45 challenging role-play scenarios, and the model's responses are scored against multiple criteria. Grok 4.1 ranked first in this evaluation.
Grok 4.1 also performed well on the Creative Writing v3 benchmark, in which the model generates responses to 32 distinct writing prompts, with three iterations per prompt.
Because of the model's limited reasoning depth and constrained tool-call budget, it may occasionally produce factual inaccuracies.
However, the researchers reported a marked drop in the hallucination rate on sampled information-seeking prompts.
In addition, the team evaluated a stratified sample of real information-seeking queries drawn from production traffic. They also evaluated the model on FActScore, a public benchmark consisting of 500 biographical questions about individuals.
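The article does not describe how the stratified sample was drawn or graded, so the following is only a minimal sketch of the general idea: group queries by category, draw the same number from each group, and compute the share of sampled responses flagged as hallucinated. The record fields (`category`, `hallucinated`), the equal-allocation scheme, and the toy data are all assumptions made for illustration; they are not xAI's data or method.

```python
import random
from collections import defaultdict


def stratified_sample(records, key, per_stratum, seed=0):
    """Draw up to `per_stratum` records from each group defined by `key`."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    groups = defaultdict(list)
    for record in records:
        groups[record[key]].append(record)
    sample = []
    for name in sorted(groups):  # sorted for a deterministic stratum order
        items = groups[name]
        sample.extend(rng.sample(items, min(per_stratum, len(items))))
    return sample


def hallucination_rate(sample):
    """Fraction of sampled responses flagged as containing a hallucination."""
    if not sample:
        return 0.0
    return sum(1 for r in sample if r["hallucinated"]) / len(sample)


# Hypothetical graded queries: every 5th response is flagged as hallucinated.
records = [
    {"category": cat, "hallucinated": i % 5 == 0}
    for cat in ("news", "science", "sports")
    for i in range(10)
]

sample = stratified_sample(records, key="category", per_stratum=4)
rate = hallucination_rate(sample)
print(f"sampled {len(sample)} queries, hallucination rate {rate:.1%}")
```

Equal allocation keeps rare categories from being drowned out by common ones; a proportional allocation would instead mirror the traffic mix, and which of the two the team used is not stated.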
Grok 4.1 is now available to all users through the official Grok website, X, and the iOS and Android apps.