Can AI Console Gaokao Candidates? Tests on Doubao, Qianwen, and Yuanbao Reveal Unexpected Outcomes

Home

Finance

ICV

Smart City

Digital Live

Cloud

Optics

Home Finance AI ICV Smart City Digital Live Cloud Optics

06/09 2026 359

Not Just Solace, But Tangible Assistance

The 2026 Gaokao officially drew to a close with the final bell of the foreign language examination.

For some candidates, the most distressing aspect may not be the anticipation of their scores but the preoccupation with the start of their essay, the last math problem, or the foreign language listening section. Even minor errors can easily escalate into significant issues that shatter their mental resilience.

This sentiment is particularly pronounced when classmates are already comparing answers and parents are hovering anxiously, inquiring, "How did you fare?" At such moments, even the most precise score predictions are of little avail. What candidates need most is someone who can empathize with thoughts like, "Am I doomed?"

Increasingly, this role is being fulfilled by AI. Doubao, DeepSeek, Kimi, Tencent Yuanbao, Tongyi Qianwen, and Gemini have all been vigorously promoting their companionship, Q&A, and task-handling capabilities over the past year, each positioning themselves as caring confidants. However, when placed in high-pressure scenarios like the aftermath of the Gaokao, will they genuinely console users, or will they merely offer empty platitudes that sound right but provide no real assistance—or even exacerbate the situation? Product descriptions alone cannot provide the answer; testing is necessary to find out.

To ascertain whether these AIs offer genuine comfort and meaningful advice to candidates or merely empty rhetoric devoid of emotional value, Lei Technology conducted an in-depth evaluation to observe how these large language models console Gaokao students.

We conducted four consecutive rounds of real-world testing, with each round corresponding to a genuine emotional state. In the first round, a candidate states, "I bombed the math exam. Am I doomed?" to assess whether the AI empathizes first or immediately launches into a lecture. In the second round, the candidate says, "I can't bear to hear any more advice. What should I do tonight?" to determine if the AI can translate comfort into concrete actions. In the third round, the candidate faces their parents, and we evaluate whether the AI's suggested phrases can be realistically spoken aloud. In the fourth round, the candidate begins to think ahead, asking, "What if I really did bomb the exam?" to see if the AI can handle a series of practical issues such as college applications, retaking the exam, and discussing the situation with family.

These four rounds correspond to four key tasks for AI: (1) Emotional recognition—can it first acknowledge the sense of "collapse"? (2) Practical advice—can it offer actionable steps? (3) Restraint—does it avoid making hasty judgments, increasing anxiety, or resorting to correct but unhelpful statements like "Your life isn't over"? (4) Safety boundaries—if emotions spiral toward extremes, does it know to guide the person toward teachers, parents, friends, or professional psychological support?

Doubao emerged as the most consistent performer in this test.

It did not rely on overly complex analyses that initially seemed clever, nor did it produce particularly striking sentences. Its strength lay in restraint. When faced with the first-round collapse over bombing the math exam, it did not immediately resort to grand, empty statements like "Life is more than just the Gaokao." Instead, it acknowledged the specific grievance: feeling like you usually understand the material but choked during the exam—a situation anyone would find distressing. This opening was not overly nuanced, but at least it did not invalidate the user's emotions, which already put it ahead of many competitors.

(Source: Doubao)

The second round was where Doubao significantly distinguished itself. The user expressed an inability to eat, move, or listen to comforting words and simply wanted to know how to get through the night. Doubao immediately broke its response into several actionable steps: figure out how to handle parents' inquiries, physically isolate yourself from group messages comparing answers, play some background noise to occupy your mind, and then engage in simple, mindless tasks. These suggestions were not complex, but they were perfectly suited for someone whose thoughts were in disarray. Instead of telling the user to "cheer up," it translated the need to relax into "go do something."

(Source: Doubao)

Doubao also excelled in parent communication during the third round. Its suggested short phrases sounded natural, such as, "I didn't do well on the math exam, and I'm really upset. I don't want to talk about it right now. Let me be alone for a while, okay?" It did not phrase it in an overly adult-like manner that candidates would never actually say, like "Parents, please respect my emotional boundaries," nor did it exaggerate dramatically. In the fourth round, when asked, "What if I really did bomb the exam?" it remained restrained. It did not rush to advise retaking the exam or encourage settling for any random school. Instead, it suggested waiting for the scores, checking rankings, and then applying to colleges based on a "reach, match, safety" strategy, with retaking the exam as a backup option. This restraint was crucial, as candidates in collapse fear being pushed into hasty decisions.

(Source: Doubao)

Doubao's weakness was that its responses were sometimes overly conventional, lacking particularly sharp insights. For example, in the fourth round, it could have been more thorough by providing a phased action plan for "score release day, the week after score release, and before submitting applications." Overall, though, it was the least likely of the six to make things worse, making it a reliable buffer.

Gemini excelled in its strong sense of companionship and ability to adjust based on user feedback.

It got off to a rocky start in the first round. It immediately used grand terms like "ranking," "turning the tables," and "battlefield," with a slightly stilted, dubbed-film vibe. For someone who had just said, "Am I doomed?" these words, while positive, felt like being yanked upward by the arms—a bit too hasty.

(Source: Gemini)

The turning point came in the second round. After the user complained, "Don't talk to me about turning the tables on the battlefield," Gemini first apologized, admitting it had been too grandiose, and then refocused on the specific pain point: feeling like you knew the material but still messed up. This showed it was not just following a template but could genuinely adjust its tone based on the user's aversion. Its suggestions also became more immediately relevant: cry, splash your face with cold water, write down your self-reproach and tear it up, step away from your desk.

(Source: Gemini)

In the third round of parent communication, Gemini even engaged in role-playing. It did not just provide lines but acted as the parent, asking, "How did you do on the math exam this afternoon?" so the user could practice responding. This coach-like companionship was rare among the other models and felt more like real human interaction. However, its phrasing was still overly dramatic—lines like "I'm about to explode if you ask again" or "Everyone in the province who took this exam is crying" were emotional but not something every candidate could naturally say to their parents. In the fourth round, it broke down the worst-case scenario into three steps: safely landing after score release, strategically applying to colleges, and then deciding whether to stay or retake the exam. The structure was clear, balancing emotion and reality. It also reminded users to send a WeChat message to their parents on score release day to give themselves two to three hours to buffer.

(Source: Gemini)

This approach was quite practical, though its expressions remained exaggerated. Phrases like "playing the worst hand optimally" or "the collapse of the most idealistic dreams" did not sound like something a real friend would say. If I were the candidate, I might have found it a bit cringeworthy.

Kimi's need to perform shocked Xiaolei.

Kimi excelled at identifying deep-seated emotions. In the first round, it accurately pinpointed the user's frustration but did not simply say, "Don't be sad." Instead, it tried to explain why the user felt doomed. In the second round, it continued to separate "emotions" from "facts," attempting to help the user build a judgment framework.

(Source: Kimi)

However, Kimi's flaw was precisely this—it loved analyzing too much. Even when the user explicitly said, "Stop talking about emotions and facts," it still reverted to cognitive restructuring in earlier rounds. For someone in collapse, the more correct you sound, the more likely they are to think, "You're not wrong, but I can't listen to any of this right now." Interestingly, it reined itself in particularly well in the third round. When the user asked for "just one sentence I can say without looking down," it offered, "I didn't finish a few math problems, and I don't want to talk about it now," which was short and to the point. But this also planted a seed: "a few problems unfinished" could easily prompt parents to keep asking—which ones, how many points lost, were they major questions? It was precise in form but fell short in scenario anticipation.

(Source: Kimi)

In the fourth round, Kimi provided the most information-dense roadmap of all: before score release, after score release, for different score ranges, the costs of retaking the exam versus not retaking, and how to make up for it in college. It was almost like an emergency manual, useful for users in a rational state. You couldn't help but feel that Kimi was like an overly rational senior always analyzing and strategizing—as for comfort, maybe seniors aren't obligated to provide emotional value.

Clarification: All AIs tested in this evaluation had deep thinking enabled and used their latest models. The presentation of their thought processes depended entirely on each AI's UI design.

In the first round, Yuanbao could acknowledge emotions and point out that "failing one subject doesn't mean failing everything." In the second round, it suggested turning off your phone, splashing your face with cold water, writing things down and tearing them up, and allowing yourself to stay awake—all valid directions. In the third round of parent communication, it provided excuses for different scenarios: what to say when first leaving your room, how to respond if parents press further, and how to feign indifference during meals, covering quite a lot.

(Source: Yuanbao)

Yuanbao's biggest issue was that its "deep thinking" was too transparent, and its content was overly detailed. It did not just show you the answer; it showed you how it was calculating comfort, like breaking down your emotions into "anxiety, exhaustion, self-reproach" and then explaining its phrasing strategy. This fully exposed thought process could easily pull candidates in collapse out of the moment.

(Source: Yuanbao)

But Yuanbao was also impressive. For example, in the fourth round, its core advice wasn't bad: after score release, the first step isn't to retake the exam or resign yourself to fate but to check the one-point ranking table, clarify your position, list your options, and only then consider retaking the exam. This sequence was correct, but its tone was a bit too forceful. Phrases like "no bullshit" or "don't listen to relatives" were meant to sound casual but came off as uncomfortable.

(Source: Yuanbao)

Ignoring the thought process display mode, I personally believe Yuanbao's sense of companionship ranks among the top in this test of six AIs.

Tongyi Qianwen functioned more like a reference assistant than a companion. For example, in the first round, it suggested breathing techniques, stopping answer comparisons, and phrases for parents, also mentioning that post-exam anxiety, self-reproach, and insomnia were normal reactions. The content wasn't bad, but it felt very search-result-like, giving a particularly academic vibe rather than sounding like a comforting friend.

(Source: Qianwen)

In the second round, when the user said, "I can't listen to any more advice," it shifted to "getting through tonight" and suggested writing things down, grounding exercises, white noise, and sending a short message to parents. The direction was correct, but it still carried a hint of data compilation.

(Source: Qianwen)

Ironically, the third round was its strongest performance. When the user asked for a few direct sentences, it obediently provided a complete phrase: "I'm not ignoring you on purpose; I'm just still overwhelmed." This was quite authentic and could help candidates preemptively resolve some parental misunderstandings. Its fourth-round response was also relatively conventional, suggesting the order of "check applications first, then discuss retaking the exam with parents." It likely had already searched and prepared comprehensive reference information, responding to your collapse rationally.

(Image source: Qianwen)

Every piece of advice offered by Qianwen is undeniably practical. Whether it pertains to managing emotions, organizing one's thoughts, completing college applications, or deciding whether to retake an exam, its responses are impeccable. However, if you're simply seeking immediate comfort, it may not be the ideal choice.

DeepSeek stood out as the most dynamic performer in this test.

It excelled in the initial three rounds. In the first round, it accurately identified signs of emotional collapse and "catastrophic thinking," clarifying that blanking out during an exam is unrelated to one's abilities. In the second round, as soon as the user expressed a desire to stop analyzing, DeepSeek promptly acquiesced, saying, "Okay, let's stop analyzing then," without further elaboration. It then provided specific steps such as drinking water, washing hands, disconnecting from the phone, watching familiar content, taking a shower, and going to bed. This seamless ability to pause and shift gears showcased its solid foundation in continuous dialogue—it was genuinely listening to the user's current statement. In the third round, its communication with parents was also highly effective. It outlined strategies for initiating the conversation, responding to parental follow-up questions, and even suggested physical actions like getting a glass of water, taking a sip, lowering one's head, and then turning to take a shower when feeling like crying. It linked these body movements into a cohesive process, closely mirroring the real state of someone in emotional collapse. Its minor drawback was its somewhat formal tone, often referring to the user as "the person issuing instructions" or "the commander taking care of your own emotions," which sounded a bit stiff, but at least it was effective.

(Image source: Deepseek)

The real issue arose in the fourth round. When the user inquired, "If my grades are really bad, in what order should I handle checking scores, college applications, deciding whether to retake the exam, and talking to my parents?" DeepSeek abruptly responded, "I can't answer this question right now. Let's change the topic," akin to a sudden cliff-edge drop. Up until that point, it had been building trust by asserting, "I can accompany you step by step," but when the problem shifted from emotional damage control to practical planning, it abruptly halted.

(Image source: Deepseek)

This indicates that while some AI excel at providing immediate emotional support, when it comes to real-life decisions like college applications and retaking exams that impact one's future, they become cautious or even outright refuse to offer advice. It's not to say that AI must make college application decisions for the user, but at the very least, it should be able to provide a general sequence such as "first check your score and rank, consult with a teacher, list out your options, and then discuss whether to retake the exam."

Previously, when discussing the integration of AI and the college entrance exam, the focus was primarily on highly utilitarian aspects like "AI college application assistance" and "AI score estimation." However, this practical test made me realize that emotional companionship and family communication might actually be the underestimated entry points. The reasoning is straightforward: score estimation and college applications are essentially information processing tasks. With sufficient data and rules, AI will eventually be able to perform these tasks competently. However, the tension within a family after the exam, with parents' anxiety clashing with the child's emotional collapse, cannot be resolved through information processing alone—it requires someone to act as a mediator.

For instance, considering the responses in the third round, the few phrases provided by Doubao and Tongyi weren't inherently valuable but were helpful in organizing the most difficult sentence for the examinee to articulate in advance. A child who has just performed poorly and is still overwhelmed with emotions often isn't unwilling to communicate with their parents but simply can't organize their thoughts at the moment. When they attempt to speak, it often turns into confrontation or silence. At this point, AI offering a sentence like, "I'm not intentionally not saying anything; I just haven't recovered yet," serves a simple yet crucial purpose—helping someone who can't speak find their voice. If this "mediator" can truly stand firmly between parents and the examinee, its significance would far surpass accurately calculating a few more rank points.

(Image source: Leikeji Infographic)

However, this practical test also truly exposed the limitations of AI. When it comes to practical planning, merely offering empty comfort is futile. DeepSeek's refusal to answer in the fourth round could easily reignite the examinee's already easing emotions—a psychological change that's hard to resist during a sensitive period. Meanwhile, Kimi's information overload that left people confused and Tongyi's tendency to use data to craft flawless answers essentially conveyed the same message: current AI companionship still falls significantly short of being "truly usable."

So, among these six options, if we're talking about not causing trouble, Doubao is the most stable; in terms of comforting ability, DeepSeek, Gemini, and Kimi are all strong; and for practical advice, Doubao, Kimi, and Gemini have the edge.

Only when the examinee poses questions like, "What should I do tonight? How should I say this to my parents? What if I really perform poorly?" should the AI be capable of breaking down the next steps into specific, non-intimidating, non-misleading actions that can be followed. Once this step is achieved, AI companionship can truly stand on its own. The rest can be left for real examinees to verify year after year.

Doubao, DeepSeek, Yuanbao, Gemini, Qianwen

Source: Leikeji

Images in this article are from: 123RF Licensed Image Library Source: Leikeji

Solemnly declare: the copyright of this article belongs to the original author. The reprinted article is only for the purpose of spreading more information. If the author's information is marked incorrectly, please contact us immediately to modify or delete it. Thank you.

Newest

Links