Is the Reasoning Process Too Inefficient? Meta Presents a Solution!

November 17, 2025

When tackling complex reasoning tasks, Large Language Models (LLMs) often get bogged down in inefficient, repetitive work. For instance, they re-explain how to find a common denominator every time they add fractions, restate the discriminant conditions whenever they solve a quadratic equation, and spell out the same conversion formulas for every unit conversion.

These redundant steps not only slow down reasoning and consume substantial compute, but also drive up usage costs.

A recent paper by Meta highlights that LLMs tend to rederive the same intermediate steps across problems, which inflates token usage and makes the reasoning process inefficient.

To tackle this issue, the research team proposes a novel approach called "behavior compression". This method involves packaging common reasoning routines into callable "named behaviors". When needed, the model can simply invoke these pre-validated behaviors instead of starting from scratch. This significantly reduces the number of output tokens, conserves computational resources, and enhances the stability and accuracy of the results.
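To make the "named behavior" idea concrete, here is a minimal sketch of what such a manual could look like. The entry names, wording, and the `render_behaviors` helper are illustrative assumptions for this article, not the paper's actual format.

```python
# Minimal sketch of a "behavior manual": each named behavior pairs a handle the
# model can invoke with a short instruction it no longer has to rederive.
# Entry names and wording here are illustrative, not taken from Meta's paper.
behavior_manual = {
    "fraction_common_denominator": (
        "To add fractions, rewrite them over the least common multiple of the "
        "denominators, add the numerators, then simplify."
    ),
    "quadratic_discriminant_check": (
        "For ax^2 + bx + c = 0, compute D = b^2 - 4ac; D > 0 gives two real "
        "roots, D = 0 one repeated root, D < 0 no real roots."
    ),
    "unit_conversion": (
        "Multiply by a conversion factor equal to 1 (e.g. 1000 m / 1 km) so "
        "that the unwanted units cancel."
    ),
}

def render_behaviors(names):
    """Format the selected behaviors as a compact block for the model's prompt."""
    return "\n".join(f"[{n}]: {behavior_manual[n]}" for n in names)
```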

The researchers suggest that this mechanism could make large models more efficient in various practical scenarios in the future. Businesses could deploy intelligent customer service systems or search engines at lower costs. Researchers could obtain faster responses in mathematical reasoning or code generation tasks. And AI teachers in educational applications could minimize verbose explanations and provide more precise answers.

With the introduction of "behavior-conditioned fine-tuning" technology, these compressed behaviors can even be internalized by the model. This allows for stable invocation without the need for external retrieval, further boosting the model's reasoning ability and reliability.

Reasoning-oriented LLMs generate a long chain of thought (CoT); this chain is the reasoning trajectory that the method mines for behaviors.

The research team assigns Large Language Models three complementary roles. First, the "Strategist" (LLM A) distills reusable behaviors from its own reasoning trajectories. Second, the "Teacher" (LLM B) generates training data for supervised fine-tuning (SFT). Third, the "Student" (LLM C) has its reasoning optimized through "behavior-conditioned inference" (BCI) or "behavior-conditioned SFT" (BC-SFT).

In the specific workflow, the Strategist takes center stage by generating solutions that encompass complete reasoning chains and final answers for given problems. Subsequently, researchers re-input the problem and solution into the Strategist to initiate a "reflection" phase. This reflection process not only scrutinizes the reasoning logic and answer correctness but also attempts to identify whether new, reusable behavior patterns can be extracted. Finally, the Strategist converts the problem, solution, and reflection into a set of behavior entries, each with a "name" and "instructions", continuously expanding them into a behavior manual.
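A rough sketch of this curation loop follows, assuming a generic `llm(prompt)` text-completion helper; the prompts and the `curate_behaviors` function are simplified placeholders rather than the paper's exact implementation.

```python
# Sketch of the Strategist's curation loop: solve, reflect, extract behaviors.
# `llm` stands for any text-completion call; prompts are simplified placeholders.
def curate_behaviors(llm, problem, behavior_manual):
    # 1) Solve: produce a full reasoning chain plus a final answer.
    solution = llm(f"Solve step by step and give a final answer:\n{problem}")

    # 2) Reflect: check the reasoning and answer, and look for reusable patterns.
    reflection = llm(
        "Review the reasoning and answer below for correctness, and point out "
        f"any reusable reasoning patterns.\nProblem: {problem}\nSolution: {solution}"
    )

    # 3) Extract: turn the reflection into 'name: instruction' behavior entries.
    entries = llm(
        "From the problem, solution, and reflection, list reusable behaviors, "
        "one per line, as 'name: instruction'.\n"
        f"Problem: {problem}\nSolution: {solution}\nReflection: {reflection}"
    )
    for line in entries.splitlines():
        if ":" in line:
            name, instruction = line.split(":", 1)
            behavior_manual[name.strip()] = instruction.strip()
    return behavior_manual
```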

Given a problem Q, the method first retrieves relevant behaviors B from the behavior manual. These behaviors, along with their corresponding instructions and the problem, are then input into the LLM to generate a solution.

In other words, behaviors distilled from the reasoning trajectory of one problem are fed back into the model's context as hints, serving as lessons for solving the same or new problems.
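A minimal sketch of this inference path, reusing the `behavior_manual` and `llm` placeholders from the earlier sketches; the keyword-overlap retriever is a deliberately crude stand-in for whatever retrieval mechanism the method actually uses.

```python
# Sketch of behavior-conditioned inference: retrieve behaviors relevant to a
# problem and prepend them to the prompt. The keyword-overlap retriever below
# is a simple stand-in for a real retrieval step.
def retrieve_behaviors(problem, behavior_manual, top_k=3):
    words = set(problem.lower().split())
    scored = sorted(
        behavior_manual.items(),
        key=lambda kv: -len(words & set(kv[1].lower().split())),
    )
    return dict(scored[:top_k])

def behavior_conditioned_inference(llm, problem, behavior_manual):
    behaviors = retrieve_behaviors(problem, behavior_manual)
    hints = "\n".join(f"[{name}]: {instr}" for name, instr in behaviors.items())
    prompt = (
        "You may invoke the following named behaviors instead of rederiving them:\n"
        f"{hints}\n\nProblem: {problem}\nSolve concisely."
    )
    return llm(prompt)
```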

Fine-tuning a given model on data generated via behavior-conditioned inference (BCI), so that the behaviors are internalized, is termed Behavior-Conditioned Supervised Fine-Tuning (BC-SFT). The specific workflow is as follows:

The Strategist extracts behaviors for each problem. Then, the Teacher utilizes BCI to generate behavior-conditioned responses for each problem.

The Student model is fine-tuned based on the resulting pairs of (problem, behavior-conditioned response).
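Continuing the same sketches, BC-SFT data generation might look roughly like this; the helper names are assumptions, and the key point, as described above, is that the Student's training input contains only the problem, while the behaviors only shaped the Teacher's target response.

```python
# Sketch of BC-SFT data generation: the Teacher answers each problem with
# behaviors in its context, but the Student is trained on (problem, response)
# pairs without the behaviors, so it has to internalize them.
def build_bcsft_dataset(teacher_llm, problems, behavior_manual):
    dataset = []
    for problem in problems:
        response = behavior_conditioned_inference(teacher_llm, problem, behavior_manual)
        # The training input is the bare problem; the behaviors only shaped the target.
        dataset.append({"prompt": problem, "completion": response})
    return dataset
```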

Compared to the original model, the BCI method achieves similar or even higher accuracy with fewer tokens during reasoning. Moreover, performance continues to improve as the token budget increases, indicating that the proposed method does not unduly compromise the model's existing capabilities.

The reduction in generation length can significantly lower inference costs, thanks to two factors. First, the input representation of the behaviors can be precomputed and reused across queries, amortizing its cost over many inference calls. Second, input tokens do not require autoregressive generation; they are processed in parallel during prefill, so they are handled far faster than output tokens.
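As a rough back-of-the-envelope illustration of why shorter outputs dominate the savings, here is a toy cost calculation; all prices and token counts are invented purely for the arithmetic, not measurements from the paper.

```python
# Back-of-the-envelope cost comparison; all prices and token counts are
# made-up illustrative numbers, not results from the paper.
PRICE_IN = 1.0    # $ per million input (prefill) tokens
PRICE_OUT = 4.0   # $ per million output (decoded) tokens -- typically pricier

def query_cost(input_tokens, output_tokens):
    return (input_tokens * PRICE_IN + output_tokens * PRICE_OUT) / 1e6

baseline = query_cost(input_tokens=300, output_tokens=4000)               # long CoT
with_behaviors = query_cost(input_tokens=300 + 800, output_tokens=1800)   # behaviors in prompt

print(f"baseline:       ${baseline:.4f}")
print(f"with behaviors: ${with_behaviors:.4f}")
# The extra input tokens are cheap (processed in parallel and reusable via
# precomputed representations), while the savings come from decoding far fewer
# output tokens.
```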

Experimental results also show that BC-SFT is not only more token-efficient than the original models but also more effective at turning non-reasoning models (such as Qwen2.5-14B-Base and Qwen2.5-32B-Instruct) into reasoning models.

Meta's work introduces a mechanism that lets large language models use their metacognitive abilities to distill recurring reasoning patterns into concise behaviors.

Instead of rederiving the same intermediate results, the model simply invokes relevant behaviors and applies them to new reasoning tasks.

However, this work still has some limitations. In behavior-conditioned inference (BCI), behaviors are retrieved based solely on the problem itself; once the list of behaviors is fixed, new ones cannot be added to the context.

It remains to be seen whether the framework can be extended to the following aspects:

Establishing a comprehensive library that covers a wide range of behaviors.

Rewriting a large corpus to perform SFT on a larger scale, thereby improving smaller models and enabling self-improvement in models used for curating behaviors and rewriting responses.

In summary, transforming slow chains of thought into fast, reusable behaviors empowers LLMs to perform efficient and scalable reasoning. This means LLMs can not only learn to solve problems but also remember how to solve them effectively.

References: https://arxiv.org/pdf/2509.13237
