06/01 2026
Have you noticed a shift in focus when new AI models are unveiled? People are increasingly less concerned with benchmark rankings and scores. There's a growing consensus that the value of isolated capabilities is on the decline. The true test of a model's worth lies in its ability to empower agents to perform effectively. For providers of large-scale models, developing foundational models tailored for agents presents a significant opportunity—the next battleground.
JieYue XingChen is among the model providers that swiftly recognized this trend and took decisive action. Their newly released model, Step 3.7 Flash, is touted as an efficient solution specifically designed for production-level agents.

To evaluate its real-world performance, I integrated Step 3.7 Flash into my workflow through crayfish and ran my daily AI information-gathering process with it. I'll walk through that process here.
The process unfolds in three stages: Broad AI information capture -> In-depth topic exploration -> Creation of visual infographics.
Step 1: Capturing Cutting-Edge AI Information
This task may seem straightforward, but it's actually complex and tedious, and it often has a low success rate.
I provided crayfish with a list of dozens of AI news websites, categorized into three groups based on crawling strategies: RSS parsing, Crawl4AI, and paywall or other solutions. Different websites have varying security mechanisms and crawling difficulties, which are also influenced by network conditions. Consequently, only a small fraction of these websites typically provide usable data.
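To make the three-group setup concrete, here is a minimal sketch of how such a source list might be organized. It assumes a simple registry that tags each site with a strategy and dispatches to a matching fetcher. The `Source`, `Strategy`, and `run_all` names, the example.com URLs, and the stub fetchers are all hypothetical; real fetchers would wrap an RSS parser, Crawl4AI, or a site-specific workaround. The one design point it illustrates is that a blocked site is recorded as a failure instead of aborting the whole run.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    RSS = "rss"            # plain feed parsing
    CRAWL4AI = "crawl4ai"  # browser-based crawl
    OTHER = "other"        # paywalled or otherwise special-cased sites


@dataclass(frozen=True)
class Source:
    name: str
    url: str
    strategy: Strategy


def run_all(sources, fetchers):
    """Fetch each source with its strategy's fetcher; collect failures per site."""
    items, failures = [], {}
    for src in sources:
        try:
            items.extend(fetchers[src.strategy](src))
        except Exception as exc:  # blocked, timed out, layout changed, ...
            failures[src.name] = str(exc)
    return items, failures


# Stub fetchers standing in for real ones (hypothetical).
def fake_rss(src):
    return [{"source": src.name, "title": "placeholder item"}]


def fake_blocked(src):
    raise RuntimeError("blocked by anti-bot check")


SOURCES = [
    Source("feed-site", "https://example.com/feed.xml", Strategy.RSS),
    Source("js-heavy-site", "https://example.com/news", Strategy.CRAWL4AI),
]
FETCHERS = {Strategy.RSS: fake_rss, Strategy.CRAWL4AI: fake_blocked}

items, failures = run_all(SOURCES, FETCHERS)
```

With this shape, the `failures` map doubles as a per-run report of which sites yielded nothing, which matches the observation that only a fraction of sites return usable data.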
This time, I instructed Step to capture AI news from the past three days.
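The "past three days" constraint is easy to state but needs timezone-aware handling in practice. A small sketch, assuming feed items carry RFC 822 style dates as RSS `pubDate` fields usually do; the `within_last_days` helper is hypothetical, and treating dates without a timezone as UTC is an assumption of this example.

```python
from datetime import datetime, timedelta, timezone
from email.utils import parsedate_to_datetime


def within_last_days(pub_date, now, days=3):
    """True if an RFC 822 date string falls inside the last `days` days."""
    published = parsedate_to_datetime(pub_date)
    if published.tzinfo is None:
        # Assumption: dates without a usable timezone are treated as UTC.
        published = published.replace(tzinfo=timezone.utc)
    return now - timedelta(days=days) <= published <= now


now = datetime(2026, 6, 1, 12, 0, tzinfo=timezone.utc)
recent = within_last_days("Mon, 01 Jun 2026 08:00:00 +0800", now)
stale = within_last_days("Thu, 28 May 2026 08:00:00 +0000", now)
```

Passing `now` in explicitly keeps the check reproducible when a run is replayed later.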
After issuing the command, I planned to browse my phone while waiting for the task to complete. To my surprise, a few minutes later, I glanced at the computer and saw that it was finished, having successfully captured 349 items.

Previously, using Claude and Gemini, the results were satisfactory, but each task took at least 20 minutes, so I usually ran them in the background.
According to JieYue's development platform, the run cost only two yuan! That's about 1/20th of what Claude costs, and the Claude figure is the price through a third-party channel; you understand the implication.
Let's examine the output file.

Firstly, the quantity of 349 items far exceeds my previous results. However, there's a minor issue: some news items not precisely related to AI, such as general technology or other disciplines, are included.
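The off-topic items could be trimmed with a post-filter. A minimal sketch, assuming a plain keyword heuristic over title and summary; the term list and the `is_ai_related` helper are hypothetical, and a classifier or a second model pass would likely do better.

```python
import re

# Hypothetical term list; extend to taste.
AI_TERMS = ("ai", "llm", "agent", "machine learning", "neural network")

# Word boundaries keep "ai" from matching inside words like "said" or "airline".
_AI_PATTERN = re.compile(
    r"\b(?:" + "|".join(re.escape(t) for t in AI_TERMS) + r")s?\b",
    re.IGNORECASE,
)


def is_ai_related(title, summary=""):
    """Keep an item if any AI term appears as a whole word in title or summary."""
    return _AI_PATTERN.search(f"{title} {summary}") is not None


kept = is_ai_related("New LLM agents ship with tool calling")
dropped = is_ai_related("Airline said fares will rise")
```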
Checking each field individually, I found the results commendable, with no hallucinations. The original links stood out in particular: the few I clicked on were all correct. In previous tests with moderately capable models, link addresses were often fabricated.
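Spot-checking links by hand can be complemented with an offline check. The sketch below assumes the crawl keeps a log of every URL it actually fetched, and flags output links that never appeared in that log. The `normalize` and `unverified_links` helpers and the item shape (a dict with a `link` key) are assumptions of this example, not the format of the actual output file.

```python
from urllib.parse import urlsplit, urlunsplit


def normalize(url):
    """Lowercase scheme and host, drop the fragment and any trailing slash."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return urlunsplit(
        (parts.scheme.lower(), parts.netloc.lower(), path, parts.query, "")
    )


def unverified_links(items, seen_urls):
    """Return links in the output that were never seen during crawling."""
    known = {normalize(u) for u in seen_urls}
    return [it["link"] for it in items if normalize(it["link"]) not in known]


seen = ["https://Example.com/a/", "https://example.com/b?id=1"]
output = [
    {"link": "https://example.com/a"},
    {"link": "https://example.com/b?id=1#top"},
    {"link": "https://example.com/made-up"},
]
suspicious = unverified_links(output, seen)
```

A link that fails this check is not necessarily fabricated, but it is the first place to look.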
This step demands high tool-calling capabilities. It's not just simple retrieval; it involves switching back and forth between RSS, script writing, program execution, web crawling, link reading, and file writing.
I've run this task with several models before, and the success rate and results were generally mediocre. Issues like missed steps, repeated calls, getting stuck on invalid links, or forgetting the goal midway were common when entering multi-tool, multi-step processes.
Step 3.7 Flash's tool-calling chain stayed relatively stable, and it did not lose track of the task.
Step 2: Diving Deeper into Topics of Interest
This is divided into two steps: first, retrieve relevant article links on the topic, and then summarize and aggregate all information points.
The benefit of this approach is a more comprehensive understanding of the entire event, including related and similar events. Organizing the information points logically and listing the corresponding original text for each point makes for faster, clearer reading and helps guard against hallucinations.
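The "original text for each point" idea can be sketched as a small data structure plus a verbatim check. This is an illustration under assumptions: the `InfoPoint` fields, the `unsupported` helper, and the sample page text are hypothetical, not the layout of the files the agent produced.

```python
from dataclasses import dataclass


@dataclass
class InfoPoint:
    claim: str  # the summarized information point
    quote: str  # the original text it is based on
    url: str    # where that text was found


def _squash(text):
    """Collapse whitespace so line wrapping does not break the comparison."""
    return " ".join(text.split())


def unsupported(points, pages):
    """Return points whose quote does not appear verbatim in the fetched page."""
    return [
        p for p in points
        if _squash(p.quote) not in _squash(pages.get(p.url, ""))
    ]


pages = {"https://example.com/story": "The team said the\nresults were preliminary."}
points = [
    InfoPoint("Results are preliminary", "the results were preliminary",
              "https://example.com/story"),
    InfoPoint("Results are final", "the results are final",
              "https://example.com/story"),
]
flagged = unsupported(points, pages)
```

Any point that ends up in `flagged` either misquotes its source or cites the wrong page, which is exactly what the per-point quotes are meant to expose.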
For example, I wanted to explore this topic:
"Anthropic's co-founder visits the Vatican and tells the Pope they've found something 'disturbing' in their AI model."
I gave the topic to crayfish and asked it to first collect links and then summarize the information.

It completed the task quickly again. Below are the contents of the two files:


Clear and accurate. These files can be read independently or used as sources for writing articles.
A truly capable agent can't rely solely on the training data of large models to answer questions. It must actively find information, judge sources, cross-verify, and transform search results into citable, traceable structured information.
Here, Step's built-in retrieval capabilities made the task seamless.
Step 3: Generating Visual Infographics
I created four infographics in different styles. I specified no style and gave no complex prompts; Step improvised entirely on its own.

Here are the results:




Not bad, right? They're suitable for articles, making it easier for readers to understand and save.
It's worth mentioning that Step 3.7 Flash has native multimodal capabilities. So, tasks like image understanding and visual retrieval don't require the agent to call external tools.
For example, I took a screenshot of part of the first infographic and asked Step to adjust the text.

Step located the corresponding area and made the right correction.
After completing the workflow, the overall results exceeded my expectations.
To be honest, Claude Opus 4.7 does have a slight edge in terms of effectiveness. However, for most daily tasks, Step 3.7 Flash is more than sufficient and is highly competitive on execution speed and cost.
Take the first step of capturing AI news, for example. Using Claude Opus 4.7 daily would be a bit expensive.
Many people joke online that AI hasn't made our lives easier—it's made us more tired. I've felt that way too, but now AI genuinely reduces my workload. The turning point was having a user-friendly agent framework paired with a stable, efficient, and affordable foundational model.
For instance, the work I just demonstrated didn't require much human intervention. In the past, it would often occupy my entire day, but now it's done in half an hour. The saved time and energy allow me to focus on higher-value tasks.
This AI information-gathering workflow is useful not just for AI bloggers but also for product managers, investors, researchers, and entrepreneurs.
What we lack isn't information but the efficiency to gather, organize, and absorb it.
Now, let's circle back to the Step 3.7 Flash model itself.
Going by the model name alone, you might think it's just a faster and cheaper Flash model. But that's not how JieYue XingChen positions it.
According to JieYue XingChen, Step 3.7 Flash is an efficient Flash model designed for production-level agents. It is built for agent, coding, search, and multimodal workflows; it is open-source and deployable; and it is optimized for completing real-world tasks efficiently.

What is a production-level agent?
Production-level tasks aren't about answering a single question. They involve a continuous series of actions: understanding the goal, breaking down tasks, searching for real-time information, reading documents, filtering sources, organizing evidence, generating results, and checking for omissions and for strict adherence to instructions.
If any step is slow, off-track, or missed, it all translates into costs.
The next phase of model competition won't be just about single-point capabilities but about overall efficiency within the agent loop.
Cheap models might execute inefficiently, so completing the entire task with them may not actually save money.
Smart models might be slow or expensive, making them impractical for real production environments.
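These trade-offs can be made concrete with a rough cost model. The sketch below assumes that a failed attempt is simply retried and that attempts succeed independently, so the expected number of attempts is 1 / success rate. Every number and name in it is hypothetical and chosen only to illustrate the argument; none of them are measurements of any real model.

```python
from dataclasses import dataclass


@dataclass
class ModelProfile:
    price_per_1k_tokens: float  # hypothetical price, any currency
    tokens_per_step: int
    steps_per_attempt: int      # includes repeated or wasted tool calls
    seconds_per_step: float
    success_rate: float         # fraction of attempts that finish the task


def expected_cost(m):
    """Expected spend per completed task, retrying until one attempt succeeds."""
    per_attempt = m.steps_per_attempt * m.tokens_per_step / 1000 * m.price_per_1k_tokens
    return per_attempt / m.success_rate


def expected_minutes(m):
    """Expected wall-clock minutes per completed task under the same assumption."""
    return m.steps_per_attempt * m.seconds_per_step / 60 / m.success_rate


# Illustrative profiles only.
CHEAP_BUT_WANDERING = ModelProfile(0.002, 4000, 120, 2.0, 0.25)
SMART_BUT_PRICEY = ModelProfile(0.06, 4000, 25, 8.0, 0.95)
EFFICIENT = ModelProfile(0.004, 4000, 30, 3.0, 0.9)

costs = {
    "cheap_but_wandering": expected_cost(CHEAP_BUT_WANDERING),
    "smart_but_pricey": expected_cost(SMART_BUT_PRICEY),
    "efficient": expected_cost(EFFICIENT),
}
```

Under these made-up numbers, the model with the lowest per-token price is not the cheapest per completed task, because extra steps and retries multiply its cost.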
So, what people expect from models now—or rather, what their agents expect—is the ability to complete entire workflows with lower latency, lower costs, and greater stability.
This is exactly where Step 3.7 Flash delivers value.
Additionally, any discussion of production-level models has to mention open source.
For ordinary users, open-source might not seem exciting. But for those working in production environments, open-source means security.
Many enterprises building agents prioritize data boundaries, stability, version control, business system integration, and long-term maintenance.
As an open-source, locally deployable model, Step 3.7 Flash offers a different sense of control: greater controllability, more flexible deployment, deeper integration, and a foundation for trust.
And user trust is the biggest asset for model vendors.
Another key feature of Step 3.7 Flash is its native multimodality.
In many scenarios, data isn't always in text form. Screenshots, PDFs, web pages, and videos all enter the workflow.
In the past, developers might have needed to arrange additional visual modules, passing images to OCR first, then to another model for understanding, and finally feeding the results back into the agent process.
In engineering, the most expensive part isn't always the modules themselves but the connections between them.
This is where Step 3.7 Flash's native multimodality shines. Visual understanding can directly integrate into the agent workflow alongside code generation, search, and tool-calling.
For developers, this saves not just a prompt but also orchestration costs.
Finally, if you're still searching for the right model for your agent, Step 3.7 Flash is worth a try. Don't get hung up on paper specifications—put it into your real workflow to truly appreciate its efficiency and advantages.
Meanwhile, as AI industrialization accelerates, I hope more vendors will step out of the parameter arms race, focus on real-world scenarios, and release more high-quality models that meet production-level needs and can actually get things done.
If you have any thoughts, feel free to discuss them in the comments.
If you found this helpful, please like, share, and recommend the article. Follow "AI Robot Tea House."