Hands-On Test of the Domestic Large Model iFLYTEK Spark V4.0: Foundation Capabilities Set the "Base," Personal Space Adds "Height"

07/03 2024

Domestic large models keep setting new records for how fast they progress.

When iFLYTEK Spark V3.5 was released in January this year, its overall level was already very close to GPT-4 Turbo, and it performed well across various evaluations. Just half a year later, iFLYTEK Spark V4.0 has leapt from approaching GPT-4 Turbo to surpassing it.

On June 27 in Beijing, iFLYTEK released the iFLYTEK Spark Large Model V4.0 and related applications, announcing comprehensive upgrades to seven core capabilities and claiming to surpass GPT-4 Turbo overall. As usual, "New Position" conducted a comprehensive evaluation, from base capabilities to product applications.

First, let's look at the base capabilities that determine the scope and depth of AI functionality. We referenced the results on authoritative third-party benchmarks announced at the press conference. In a side-by-side comparison across 12 mainstream domestic and international benchmarks for large models, iFLYTEK Spark ranked first on 8, ahead of international models such as GPT-4 Turbo and Claude 3 Opus, and its overall performance was presented as the strongest in China.

To test Chinese language understanding, its most prominent strength, we selected a real past question from the "verbal comprehension and expression" section of the civil service exam question bank:

Among naval vessels, military auxiliary ships are the "grain officials" of the ocean. Although they lack powerful combat capabilities, they are directly tied to long-distance support. However, China currently has only four comprehensive replenishment ships serving in the navy, and maintaining increasingly _____ long-distance training, escort, and exercise missions inevitably leaves them somewhat _____.

The most appropriate option to fill in the blanks is:
A. Long; neglecting one for another
B. Complex; powerless
C. Heavy; strained
D. Difficult; miserable

The reference answer is C, and iFLYTEK Spark gave both the correct answer and a complete line of reasoning. Strong base capabilities leave more room for product design.

The latest data show that since its full launch in September last year, the iFLYTEK Spark App has been downloaded 131 million times cumulatively on open Android app markets, ranking first among domestic general-purpose large model tool apps. During this year's 618 shopping festival, sales of smart hardware powered by the Spark Large Model rose more than 70% year-on-year, and average monthly usage exceeded 40 million times.

This means the iFLYTEK Spark App, as the flagship consumer (C-end) product, has begun to build market recognition and user loyalty. Looking at the highlight of this press conference, the latest progress in product refinement, "New Position" believes that "personalization" is the key to understanding this round of iFLYTEK Spark product iteration.

Focusing on specific scenarios users care about, such as work, study, and healthy living, the newly launched "Personal Space" draws out the value of personal data amid otherwise homogeneous generated content, making the large model's base capabilities more tangible to consumers.

Enhancing the large model's base capabilities raises the ceiling, while Personal Space refines the details, enabling more advanced performance in applications such as office work and education. The two fit together naturally.

01. Extracting personalized value from homogeneous generated content

From a market perspective, user enthusiasm for AIGC is high. According to QuestMobile data, in January 2024 the combined active users of the top 10 AIGC apps reached 53.76 million, up 3,725% year-on-year, and the deduplicated user base of those 10 apps grew 37-fold year-on-year.

As demand soars, homogeneity is becoming a problem. Major players are all betting on large models. In the top 10, besides iFLYTEK Spark, Baidu's Wenxin Yiyan, Douyin's Doubao, and Kunlun Wanwei's Tiangong have all seen monthly active users rise rapidly over the past year, but their applications mostly focus on generating text and images. It is no exaggeration to say that opening these different apps presents a "thousand models, one face" picture.

When every company's generated content is similar and not practical enough, consumer-facing large model applications risk becoming indistinguishable to users. The solution is to return to ToC product design logic: start from the needs of "individual" users and move from being a general assistant to being a personal assistant. This is why "New Position" believes that "personalization" is the underlying logic of this round of iFLYTEK Spark iteration.

The prerequisite for becoming a "personal assistant" is to fully "understand" the individual. As iFLYTEK Chairman Liu Qingfeng said at the press conference, AI assistants should express themselves in a personalized way based on user profiles, learn from memory based on usage history, and reinforce learning based on personal data. Building a store for user data and training personal models on it is therefore the first step for an AI assistant, and a prerequisite for generating coherent, user-specific content.

In the revamped iFLYTEK Spark App and desktop version, users can upload materials related to their work, study, life, health, and so on to "Personal Space" to build a personal knowledge base. In the personal settings interface, users can also adjust the "AI persona" themselves, and the AI tailors generated content to the user's individual style based on past conversations and activity history, breaking away from homogeneous output.

In addition, the "My Center" entry in the app and desktop version serves as a master control panel for the practical functions the large model provides. From here, users can access personalized and entertaining services such as persona tags, schedule management, information subscriptions, and custom voice creation. The "Intelligent Agent" function, accessible directly from "My Center," is especially worth mentioning: the first batch of 14 intelligent agents are dedicated assistants tailored to specific scenarios.

The "New Position" editorial team had different authors upload their own articles to create customized intelligent agents. The content Spark generated for each varied, with visible differences in expression and language style. As we added more dimensions and uploaded more personal recordings, spreadsheets, and manuscripts, the agents became increasingly user-friendly and personalized.

Based on the information from the press conference, the Spark Large Model's product lineup is currently developing in both "breadth" and "depth." In terms of breadth, iFLYTEK's entire consumer hardware and software ecosystem has been integrated, giving millions of smart hardware users one-click access to a "Spark package." For example, existing star products such as iFLYTEK smart office notebooks and smart voice recorders can sync user data to Spark Personal Space with one click, making it easier for users to build their personal spaces.

In terms of depth, backed by a stronger base, several products announced functional breakthroughs. For example, the "Spark Speech Large Model" was tested at the press conference on speech recognition under heavy interference, with impressive results: three researchers from the iFLYTEK Research Institute spoke simultaneously in a noisy setting, and while the human ear could not make them out, Spark separated the overlapping speech and transcribed it in real time.

Combined with the keyword "personalization," the formula of strengthening base capabilities plus designing more personalized functions may become the main theme of future iFLYTEK product iterations.

We can already experience the appeal of "personal assistants" in the iFLYTEK Spark App and Spark Desk. More vertical application scenarios, such as workplaces that urgently need large models to boost productivity, or education and daily life, where demand for intelligent tools is growing, have seen similar innovations.

02. More "professional" personalized office assistants

QuestMobile data show that AIGC app activity is higher on weekdays (Monday to Friday) than on weekends (Saturday and Sunday), suggesting that at this stage AIGC apps fit office scenarios better. This is consistent with the natural path by which cutting-edge technology turns into efficiency tools.

However, what "New Position" most wanted to test was whether iFLYTEK Spark could truly integrate into a workflow rather than just provide office templates. We therefore split the test into three directions to see whether iFLYTEK Spark accounts for the details of real workflows.

First, whether the output is close to practical work. Large model content generation commonly tends toward templated results, using fixed formats and narrative tones to produce a rigid "stereotyped essay" that cannot be delivered as finished work at all. Practicality here means avoiding that tendency.

We therefore selected the ready-made intelligent agent "Work Report Expert" in iFLYTEK Spark Desk. Its built-in guidance reads: "Fill in the work objectives, strategies, and a summary of achievements, and the assistant will provide a work report for reference."

The prompt we provided included, "I am a sales manager, and my work objectives for this year are to achieve sales of 30 million, a collection rate of 50%, and organize more than 5 customer activities, obtaining at least 20 valid business opportunities. The report should highlight the strategy section."

The figure above shows the result returned by iFLYTEK Spark (partially excerpted): a report built around our own work content, with accurate formatting and writing conventions. It elaborates on strategies for achieving this year's work objectives. Based on this result, we also invoked the "iFLYTEK Smart Text" intelligent agent to generate a complete work report PPT with one click.

The iFLYTEK Spark Intelligent Agent Center currently offers ready-made agents in several practical areas, including the workplace, creative writing, learning, and programming, and also lets users create their own. In terms of practicality, this is arguably unmatched. All of this stems from improved base capabilities such as complex instruction understanding, logical reasoning, and content generation, which make the delivered content more "professional."

Second, on making delivered content more coherent and consistent with organizational procedures: building on the personal assistant features described earlier, iFLYTEK Spark performs distinctively in the more advanced direction of staying "close to personal habits."

Personal habits here mean two things. The first is language use: in everyday writing such as emails, speeches, and essays, the author's professional identity, position, or tone shapes the writing style, and this style often carries a project's most critical information. The second is the library of material accumulated from past work, which often needs to be consulted repeatedly. Both were reflected in our earlier evaluation of Personal Space and are areas where the updated iFLYTEK Spark excels.

Finally, we looked at whether the work we handle in the app can be organized into a fixed routine and schedule. The latest version of My Center largely delivers this: the AI helps manage to-do items and schedules and subscribes to specified news. If you have configured your AI persona, voice, and intelligent agents in detail, all the customized office assistant functions mentioned in this evaluation can be accessed with one click from My Center.

The office scenario shows well how making full use of personal data on top of base capabilities makes efficiency tools more practical and user-friendly.

In its daily work, the "New Position" editorial team sometimes uses various AI tools to help find materials and data, but because of homogeneous output and scattered functions, we had not found an office assistant worth sticking with. Judging from this hands-on test, iFLYTEK Spark has identified the pain points of office "workers," and the formula of base capabilities plus personalization also applies to more specialized education and healthcare scenarios.

03. Advancing educational functions starts with grasping the scenario's "pain points"

As part of China's AI "national team," iFLYTEK has worked deeply in education for many years with notable results, putting forward advanced AI education concepts such as precision teaching, learning analytics, and personalized learning.

During last year's Double 11 shopping festival, iFLYTEK AI learning machines powered by the Spark Large Model overtook brands such as Xiaodu, Bubugao, Xiwo, and Zuoyebang for the first time, taking the top spot in learning machine sales on JD.com and Tmall. In June this year, iFLYTEK officials said that since the first large model capability upgrade in May 2023, iFLYTEK AI learning machines have maintained strong growth momentum, with sales growth exceeding 150% from January to May this year.

After this year's college entrance examination, the artificial intelligence essay prompt on the New Curriculum Standard Paper I drew wide attention. In a related report, "Chongqing Daily" used Spark to write a college entrance examination essay that earned praise, showing that Spark's writing reaches the level of a "high-scoring essay" in both logic and language.

(Source: "Chongqing Daily")

In fact, AI learning machines, as vertical education products, have been iterating alongside the base large model. At this press conference, iFLYTEK also said the latest AI 1-on-1 tutoring function can deliver multimodal heuristic explanations, personalized answers to open-ended questions, interactive inquiry-based learning, and highly human-like guided study companionship.

However, after consulting teachers of several subjects, we paid more attention to the Spark Intelligent Grading Machine, which combines intelligent grading, precise assessment of learning status, and personalized learning. It supports flexible assignment layout, intelligent grading across multiple subjects and question types, and multi-dimensional learning status reports, giving teachers material for homework review and face-to-face tutoring.

"Burden reduction" has been a buzzword in education in recent years, but "New Position" believes that while education hardware makers design products around "reducing students' burden," too little attention may be paid to reducing teachers' burden. Burden reduction should work both ways.

In our conversations with high school math, Chinese, and politics teachers, "enlightenment" was the most frequently used keyword. Whatever the subject, students must go through a shift from quantity to quality in daily learning. Accumulating quantity is a necessary prerequisite, and teachers rely on years of teaching experience to enlighten students at critical points, guiding each student's progress in a way suited to them.

During this necessary accumulation, a large number of exercises and tests must be graded by hand so that students can identify and make up for gaps in their practice and improve step by step. Such tedious, repetitive work that still requires some teaching experience is precisely where AI should step in. In the on-site demonstration, using assignments that simulated real handwriting, the Spark Intelligent Grading Machine graded 15 students' assignments in half a minute, producing not only an overall class report but also each student's mastery of basic knowledge and subject skills.

This is a personalized transformation built on the needs of different subjects. With AI support, two-way burden reduction frees up teachers' "productivity," letting them devote more time to specialized teaching and keep track of the class's progress, while making it easier for students to obtain targeted teaching resources, get feedback, and manage their learning progress at any time.
