03/03 2026
Just two days into another Gulf military conflict, pitting the United States and Israel against Iran, discussions erupted across cyberspace regarding the utilization of AI large models in warfare. On the Chinese-language internet alone, March 1st saw a surge of articles confidently asserting that 'the United States employed Claude and Grok in its strikes,' even going so far as to claim that 'AI large models played a pivotal role.' However, these assertions were swiftly debunked, with critics arguing that AI's significance had been grossly exaggerated and that the success of the U.S. 'decapitation strike' was not AI-dependent. So, which narrative holds more water?
To provide a balanced perspective, it's worth noting that the claim that 'the United States leverages AI in warfare' is not merely a product of domestic self-published articles; there is a substantial body of original English-language content supporting it. The sources, however, are diverse and can be broadly categorized into three groups:
1. Assertions that AI large models, particularly Claude, played a crucial role in intelligence analysis during the U.S. 'decapitation strikes,' primarily based on an in-depth article from The Wall Street Journal, supplemented by reports from The Guardian and Reuters.
2. Claims that the Grok large model accurately predicted U.S. actions on February 28th. While this prediction did materialize, its significance has been vastly overstated and misconstrued as 'the United States relying on Grok for intelligence.'
3. Statements that AI has become 'essential' to the U.S. military apparatus, even classifying the autonomous navigation of drones as AI. This is largely hyperbolic rhetoric originating from English-language social media platforms like Reddit.
Let's delve into the second point first: As of now, there's no evidence to suggest that the Grok large model, developed by xAI, was utilized in the U.S. military's intelligence analysis and targeting operations. Prior to the conflict, the U.S. White House had instructed the Defense Department to prohibit the use of Claude, but the outbreak of hostilities precluded implementation. xAI's subsequent attempts to 'secure a contract' with the U.S. military to replace Anthropic came too late to be relevant at the onset of the conflict. It's almost certain that the primary large model employed by the U.S. military in this strike was Claude.
So, how did the Grok myth originate? On February 25th, someone posted Q&A records from four major models—GPT, Gemini, Claude, and Grok—feeding them public information and using hypothetical, leading questions to inquire about when 'the United States or Israel plans to act.' The other three models offered vague responses, but Grok provided a specific date: February 28th. When this date proved accurate, Grok was subsequently 'deified.'
This suggests that Grok may indeed possess robust capabilities in integrating public information, or perhaps it's more inclined to provide a definitive answer (which happened to be correct). Regardless, 'predicting correctly' and 'being utilized by the U.S. military' are two distinct matters. Anyone can use Grok to speculate on the timing and location of the next conflict. Incidentally, I personally believe that GPT and Claude's refusal to provide a specific date may stem from their 'alignment mechanisms.' It's well-known that these models are excessively politically correct and often reluctant to engage with topics they perceive as risky.
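The informal experiment described above (the same leading question posed to four models, with answers compared for specificity) can be sketched as a toy classifier. The model replies below are invented placeholders, not quotes from the actual February 25th post; the point is only to show the 'vague versus committed date' distinction mechanically.

```python
import re

# Invented placeholder answers standing in for the four models' replies;
# these are NOT the real transcripts from the February 25th post.
answers = {
    "GPT":    "It is difficult to say; tensions could escalate at any time.",
    "Gemini": "Open sources do not support a confident date prediction.",
    "Claude": "I can't responsibly speculate on the timing of military action.",
    "Grok":   "Based on deployment patterns, action is most likely on February 28th.",
}

# Matches an explicit calendar date such as 'February 28th'.
DATE_RE = re.compile(
    r"\b(January|February|March|April|May|June|July|August|"
    r"September|October|November|December)\s+\d{1,2}(st|nd|rd|th)?\b"
)

def classify(answer: str) -> str:
    """Label an answer 'specific' if it commits to a calendar date, else 'vague'."""
    m = DATE_RE.search(answer)
    return f"specific: {m.group(0)}" if m else "vague"

for model, answer in answers.items():
    print(model, "->", classify(answer))
```

On these placeholder inputs, only the 'Grok' reply is classified as specific; the other three are vague, mirroring the anecdote. Of course, committing to a date is not evidence of better intelligence, which is precisely the article's point.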
Now, let's address the third point: Undoubtedly, all military deployments, aircraft takeoffs, missile launches, etc., by the U.S. military (and the Israeli military) were human-driven decisions and executions. Claude, like other large models, did not intervene in the decision-making processes of the U.S. military's command chain (only offering suggestions) and certainly did not command the activation or launch of weapon systems. Common sense dictates that truly 'targeted killings' of specific individuals carry political responsibility, and such actions cannot be entrusted to AI. Even if AI were to develop independent judgment capabilities in the future, the ultimate responsibility would still likely rest with humans. Moreover, current AI large models have not been integrated into any weapon systems themselves.
Some also cite the U.S.-Israeli coalition's 'deployment of the most advanced drone swarm in history' (a debatable characterization) for automated bombing as evidence of 'AI intervention in warfare'. This reflects a fundamental misunderstanding. The AI under discussion here is Generative AI (Gen AI): solutions built on natural language processing (NLP) with the Transformer architecture as the underlying technology, and the question is how far Gen AI intervenes in the warfare process. If we labeled every automated or intelligent system as AI, we would 'discover' that contemporary aircraft fire control systems are computer-controlled, that the entire cruise phase is computer-controlled, and that 'fire-and-forget' missiles are computer-controlled as well... Has human warfare not already been profoundly shaped by computer technology long ago? Did we need to wait until now to acknowledge it?
In the financial reports of Silicon Valley giants, a clear distinction is drawn between Gen AI and Core AI (traditional AI): mature technologies such as recommendation algorithms, search algorithms, and content moderation are classified as Core AI. Gen AI certainly influences them to some extent, but the two cannot be conflated. Otherwise, observing that long-haul airliners require minimal human intervention outside takeoff and landing, we would exclaim 'AI is mighty, pilots are obsolete', an exclamation that arrives several decades too late and only reveals a lack of nuance... (Incidentally, the computer flight control systems in civil aviation do not even qualify as Core AI; that is a separate discussion.)
Now, let's discuss the first point, which is also the most credible: According to reporting by The Wall Street Journal (WSJ) and corroborated by multiple authoritative English-language media outlets, Claude did indeed play an indispensable role in the intelligence system of the U.S. military's Central Command (CENTCOM), primarily in areas such as intelligence organization, target identification and selection, and battlefield simulation. Humans remained the primary decision-makers in this process, responsible for providing information, informing military action objectives, and making final decisions by selecting from options provided by AI. Although the White House announced a ban on Claude before the war began, considering its deep integration with the Pentagon's intelligence system, such a ban was impractical at the time and remains uncertain for the future.
Please note: No English-language media outlet claims that Claude accurately located senior Iranian officials. Their successful location was the result of months of work by CIA intelligence personnel. Claude may have organized and verified this intelligence, but it was neither responsible for acquiring it nor for the final location. Additionally, reports about Palantir playing a significant role generally do not come from serious media but from social media platforms like Reddit. There is currently no evidence that Palantir played a role; even if it did, it was likely peripheral and indirect.
The claim that 'Claude successfully located senior Iranian officials through analysis of various clues and, together with Palantir, formed a kill chain' did not originate solely with domestic self-media: it is American netizen sensationalism that was then translated and further exaggerated in China. These sensationalist pieces are uniformly labeled 'reported by WSJ' or 'reported by The Guardian', trading on the fact that most readers do not subscribe to the WSJ's electronic edition and cannot check the original articles...
However, can we conclude that Claude's role was insignificant and did not alter the nature of warfare at all? That would be an overcorrection. If AI were truly insignificant, why would the Pentagon deeply integrate Claude into its intelligence system? Why would the White House be so concerned about Claude's security? In fact, the greatest function of AI large models in intelligence work is to 'simplify the complex'—to identify 'potentially useful' clues and patterns through a vast and intricate web of intelligence.
Anyone who has engaged in intelligence or research work understands that the biggest challenge in reality is not a scarcity of intelligence but an abundance of it! Take the greatest intelligence disaster of World War II—Operation Barbarossa—as an example. Stalin was completely unprepared for Hitler's betrayal because he lacked intelligence? No, he received an overwhelming amount of contradictory intelligence while ignoring the fact that Hitler was not a rational actor. Similarly, by the eve of the Normandy landings, it was Hitler's turn to be inundated with intelligence—the Allies successfully carried out a large-scale deception, concealing the only truth with a vast amount of false intelligence. We can cite numerous similar examples.
Claude's analysis of vast, multimodal intelligence, including textual information, satellite imagery, signals intelligence (SIGINT), and intercepted communications, is indeed noteworthy. However, this credit must be divided into two dimensions:
1. The identification, decryption, and integration of complex multimodal information, such as extracting useful information from satellite imagery, performing noise reduction and recognition on voice information, and summarizing vast amounts of textual information... These tasks do not deviate significantly from the realm of traditional tools. Compared to human recognition or traditional computer recognition, AI recognition primarily enhances efficiency without introducing anything 'unprecedented.'
2. Analysis based on raw or already organized information to uncover certain 'clues' or 'patterns' and even provide actionable suggestions—this is where AI truly makes a difference! For example, feeding Claude a pile of public news and internal intelligence, can it identify important blind spots that no one has noticed? And explain them to intelligence personnel?
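The second dimension, surfacing recurring patterns from an abundance of reports, can be illustrated with a deliberately crude toy. The snippets below are fabricated and reflect nothing about real intelligence or the WSJ reporting; the sketch just counts word pairs that recur across multiple reports, a bare-bones stand-in for 'simplifying the complex'.

```python
from collections import Counter
from itertools import combinations

# Fabricated toy snippets standing in for heterogeneous intelligence reports.
reports = [
    "convoy observed near site A on tuesday morning",
    "encrypted chatter spikes near site A every tuesday",
    "official's motorcade routed past site A tuesday",
    "supply trucks at site B on friday",
]

def recurring_pairs(reports, min_count=2):
    """Count word pairs co-occurring within a report, keeping only pairs
    that recur across multiple reports -- a crude stand-in for finding
    patterns buried in an overabundance of intelligence."""
    counts = Counter()
    for text in reports:
        words = set(text.split())
        counts.update(combinations(sorted(words), 2))
    return [(pair, n) for pair, n in counts.most_common() if n >= min_count]

for pair, n in recurring_pairs(reports):
    print(pair, n)
```

Here the pairing of 'site A' with 'tuesday' surfaces across three of four reports, while one-off mentions fall below the threshold. A real system would operate on entities and embeddings rather than raw words, but the shape of the task, ranking recurring signals out of noise, is the same.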
Unfortunately, it remains unclear how much of Claude's role this time falls into the former category and how much into the latter. Common sense suggests that even if it can independently discover clues, intelligence personnel would not trust them blindly and would still reach conclusions only after repeated verification combined with personal judgment. The WSJ report mentions that Claude helped analyze the 'behavioral patterns' of senior Iranian officials and provided suggestions, but how detailed were these suggestions? Were they proven correct afterward? The WSJ does not provide clear sources, and probably no media outlet does.
Here's an interesting anecdote: I recently discussed with GPT, Gemini, and Grok separately about 'what role AI large models played in U.S. military actions.' Among them, Gemini was the most assertive, claiming that 'AI has become indispensable to the U.S. military' and that 'without AI, the U.S. military strikes could not have succeeded' (but providing very weak evidence); GPT was relatively conservative, stating that AI has a certain role but is completely different from the 'fully automated killing machine' that everyone imagines; Grok was the most cautious, first explaining at length that 'Grok did not participate in the U.S. military's intelligence system' and then providing a very conservative analysis of the application of large models in military affairs. However, all three responses had some merit and were well-supported, allowing me to follow the leads and obtain many original links.
Currently, Grok is my preferred large model to use and the one I engage with most frequently. Last month, it was Gemini; for more than two years before that, it was always GPT. I especially appreciate Grok 4.20 Beta's 4 Agents mode, which is suitable for analyzing highly complex long texts. This demonstrates that the progress of large models is far from over, and even in the relatively mature field of text-to-text chat, everyone can still showcase their unique strengths. I can't wait to see what innovative applications will emerge next—it's truly exhilarating!