iFLYTEK Spark 4.0 Turbo Launched! Aiming to Rival GPT-4o, Making Large Models More User-Friendly?

10/24/2024

As 2024 enters its second half, generative AI remains the most closely watched area of technology.

AI has come a long way. Early so-called 'AI' could barely understand human instructions, whereas today a single prompt can faithfully produce a painting, an article, or a video. Along the way, AI has significantly boosted productivity, and many people now hope to use it to work and study more efficiently, perhaps even freeing up a little time for leisure.

This year's Nobel Prizes, which honored pioneering work on neural networks and AI, have further spurred leading tech companies worldwide to launch large models, commit resources, and pursue in-depth research. The industry is booming, and the global race in large models is heating up.

Among these players, iFLYTEK Spark stands out: it opened to public testing early and has already gone through several iterations.

(Image source: Leitech)

On October 24, the 7th World Voice Conference and the 2024 iFLYTEK Global 1024 Developer Festival officially kicked off at the Hefei Olympic Sports Center, and I was invited to attend the opening ceremony.

At the conference, iFLYTEK unveiled Spark 4.0 Turbo, whose core capabilities have been upgraded, with notable gains in math, coding, and long-text handling. According to iFLYTEK, its overall Chinese and English performance remains industry-leading, and its training and inference efficiency has improved significantly, helping meet the growing demand for large-scale deployment.

The company also introduced much-anticipated multimodal interaction and hyper-realistic virtual human capabilities, along with upgraded industry-specific large models and applications for education, healthcare, research, justice, and government affairs. Let's take a closer look.

iFLYTEK Spark Upgrades Galore

Since its launch in May last year, iFLYTEK Spark has gone through several iterations in just a year and a half, and the rapid rollout of Spark Cognitive Large Model V4.0 has brought iFLYTEK closer to the front rank of the industry.

So what surprises did iFLYTEK have in store this time?

Let's start with the latest upgrades to the foundational Spark 4.0 Turbo model.

(Image source: Leitech)

According to iFLYTEK, Spark now surpasses GPT-4 Turbo in seven key areas, including textual knowledge and language understanding, and outperforms GPT-4o in math and coding. The team has also completed algorithmic validation of ultra-long chains of thought, tree search, and self-reflective evaluation, and expects significant gains in high-difficulty math, approaching o1-style capabilities, by the end of the year.

(Image source: Leitech)

However, iFLYTEK Chairman Liu Qingfeng acknowledged that Spark still lags behind GPT-4o in logical reasoning and multimodal capabilities, and that continued effort is needed to catch up.

In terms of specific functions, iFLYTEK highlighted its new multimodal interaction and hyper-realistic virtual human technologies.

Earlier this year, OpenAI's GPT-4o stunned many industry insiders with real-time reasoning across audio, vision, and text, a significant step toward more natural human-machine (and even human-machine-machine) interaction.

Five months later, iFLYTEK publicly showcased its Spark Extreme Multimodal Interaction technology at the conference for the first time.

(Image source: Leitech)

Simply put, multimodal interaction enables more natural, efficient, accurate, and flexible human-machine interaction by integrating multiple sensory modalities (e.g., vision, hearing, touch).

In my view, the core of this technology is fusing data from different modalities: the model must correctly identify the type of each input and extract the user's intent in order to understand and handle tasks in a broader context.

Accordingly, the live demonstration began with how Spark takes in information.

Our old friend Liu Cong, President of iFLYTEK Research, walked the audience through a new real-time voice dialogue experience.

(Image source: Leitech)

During the dialogue, Spark proactively picks up on the user's current state and engages on its own initiative. For example, when Liu Cong mentioned recent flight delays, the hyper-realistic digital avatar expressed concern and even acted cute, making the exchange entertaining.

Regarding visual interaction, Spark can now perceive the surrounding world through cameras.

When the camera was pointed at figurines on the desk, Spark correctly identified Sun Wukong and Ultraman, inferred the interaction between them from Liu Cong's poses, and added its own commentary.

Abroad, these features can serve as a real-time voice translator and travel assistant: your phone can translate between Chinese and English, recognize unfamiliar foreign products, and offer purchase suggestions.

Want something more personalized?

Combined with the existing voice-imitation capability, a single photo is now enough to create a digital avatar, allowing richer and more personalized expression. The scene of Liu Cong chatting with his digital double, Liu Xiaocong, was quite amusing.

(Image source: Leitech)

There was one more surprise.

iFLYTEK also launched the Spark Multilingual Large Model, which initially supports eight languages in addition to Chinese and English. iFLYTEK claims its multilingual capabilities reach 96% of GPT-4o's level and even surpass GPT-4o in several industry scenarios.

Judging from the official demo, the Spark Multilingual Large Model is aimed mainly at companies expanding overseas and at foreign trade. It lets speakers of other languages use Spark features such as meeting minutes, knowledge retrieval and reasoning, and intent understanding in complex scenarios, without relying on foreign large models.

(Image source: Leitech)

Built on domestic computing power, this model marks Spark's first step into overseas markets.

Spark Expands into Scenarios, Touching Various Industries

Unlike ordinary users, many industry insiders care more about how such large models will reshape their industries.

Take the in-vehicle Spark Large Model, which debuted at the conference.

(Image source: Leitech)

Liu Qingfeng explained that drivers often lose network connectivity in tunnels and other special situations, and that some prefer not to sync personal data to the cloud for privacy reasons.

To address this, iFLYTEK runs the large model locally in the vehicle, introducing an end-side (on-device) model with roughly 1.3B parameters. Compared with cloud deployment, the performance loss is no more than 1%, and the first-response latency is 40 ms, making it practically indistinguishable from the cloud version.
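To put the roughly 1.3B-parameter figure in perspective, here is a back-of-the-envelope sketch of how much memory the weights alone would need on in-car hardware. iFLYTEK did not say which numeric precision the end-side model uses, so the precisions compared below are assumptions, and the estimate ignores activations, the KV cache, and runtime overhead.

```python
# Back-of-the-envelope weight memory for an on-device LLM.
# Assumptions: ~1.3B parameters (figure from the article); the precision
# actually used in the car is NOT stated, so common options are compared.
# Only weight storage is counted (no activations, KV cache, or runtime).

PARAMS = 1.3e9  # approximate parameter count of the end-side model

# Bytes needed to store one weight at common numeric precisions.
BYTES_PER_PARAM = {
    "fp32": 4.0,
    "fp16": 2.0,
    "int8": 1.0,
    "int4": 0.5,
}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Weight storage in gigabytes (1 GB = 10**9 bytes)."""
    return params * bytes_per_param / 1e9


def memory_table(params: float = PARAMS) -> dict[str, float]:
    """Map each precision to its estimated weight memory, rounded to 0.01 GB."""
    return {
        name: round(weight_memory_gb(params, nbytes), 2)
        for name, nbytes in BYTES_PER_PARAM.items()
    }


if __name__ == "__main__":
    for precision, gb in memory_table().items():
        print(f"{precision}: ~{gb} GB")
```

At FP16 the weights come to about 2.6 GB, while 4-bit quantization would shrink them to roughly 0.65 GB, which is one reason compact, possibly quantized models are favored when a model has to run locally rather than in the cloud.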

Starting in Q4 this year, several Chery, GAC Motor, and Great Wall models equipped with the end-side Spark Large Model will go on sale, so users will be able to try it soon.

In education, iFLYTEK unveiled an 'AI Homework Filter' for its AI Learning Machines, which uses the large model to reduce students' workload in a more targeted, evidence-based way.

(Image source: Leitech)

According to the official introduction, the AI Learning Machine uses OCR to identify exercise questions, categorizing them into 'Must-Do,' 'Optional,' and 'Suggested Not to Do' based on students' learning history and local exam trends. This helps students prioritize their studies and avoid repetitive, ineffective practice.
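iFLYTEK did not explain how the filter makes its decisions, but the triage described above can be illustrated with a simple rule-based sketch. Everything here is hypothetical: the per-topic mastery scores (standing in for learning history), the local exam-frequency weights (standing in for exam trends), and the thresholds are illustrative stand-ins, not iFLYTEK's actual method.

```python
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    MUST_DO = "Must-Do"
    OPTIONAL = "Optional"
    SKIP = "Suggested Not to Do"


@dataclass
class Question:
    text: str             # question text, e.g. as recognized by OCR
    knowledge_point: str  # topic the question tests (hypothetical tagging)


def triage(
    question: Question,
    mastery: dict[str, float],      # hypothetical: per-topic mastery from learning history, 0..1
    exam_weight: dict[str, float],  # hypothetical: how often the topic appears in local exams, 0..1
    mastered_at: float = 0.85,      # illustrative threshold, not iFLYTEK's
    important_at: float = 0.5,      # illustrative threshold, not iFLYTEK's
) -> Category:
    m = mastery.get(question.knowledge_point, 0.0)      # unseen topics count as not mastered
    w = exam_weight.get(question.knowledge_point, 0.0)  # topics absent from local exams weigh 0
    if m >= mastered_at:
        return Category.SKIP     # already mastered: more practice is likely repetitive
    if w >= important_at:
        return Category.MUST_DO  # weak spot that local exams test often
    return Category.OPTIONAL     # weak spot, but rarely tested locally


if __name__ == "__main__":
    mastery = {"quadratics": 0.9, "fractions": 0.4, "probability": 0.3}
    exam_weight = {"quadratics": 0.8, "fractions": 0.7, "probability": 0.2}
    questions = [
        Question("Solve x^2 - 5x + 6 = 0", "quadratics"),
        Question("Simplify 3/4 + 5/6", "fractions"),
        Question("P(two heads in two coin flips)?", "probability"),
    ]
    for q in questions:
        print(q.text, "->", triage(q, mastery, exam_weight).value)
```

With the sample data, the quadratics question is marked 'Suggested Not to Do', the fractions question 'Must-Do', and the probability question 'Optional'. A real system would more likely learn these judgments than rely on fixed cutoffs; the thresholds simply make the three-way split easy to see.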

If I'd had such a machine back in my school days, I could have cut my wasted effort by at least half!

For teachers, in addition to the upgraded iFLYTEK Smart Board 2.0, iFLYTEK and the National Institute for Education Research jointly launched the 'Problem Chain-Based High School Math Smart Teacher System', which will be officially available soon.

The system breaks conventional problems down into multi-step problem chains that guide students toward a deeper understanding of subject concepts and better learning outcomes. In pilot use, it reportedly increased student engagement and interest significantly, improved teaching efficiency, and received overwhelmingly positive feedback.

Oh, and the Spark Medical Large Model has also been updated to version 2.0.

(Image source: Leitech)

The headline addition is the all-new Spark Medical Imaging Large Model, trained on a large number of medical images, which can perform automatic quality control and identify multiple diseases in a single image.

iFLYTEK also showcased a multilingual AI translation transparent screen for real-time Chinese-German translation, the Spark Intelligent Office All-in-One for government agencies, and the VIAS Evaluation Robot for testing human-machine interaction in smart cockpits.

(Image source: Leitech)

According to iFLYTEK Chairman Liu Qingfeng's speech, GMV of iFLYTEK's Spark smart hardware grew 50% year-on-year from January to September. As of October 23, all-channel GMV during the Singles' Day promotion had surged by 280%, a sign of steady progress in iFLYTEK's hardware-software integration strategy.

iFLYTEK Aims to Make Large Models More User-Friendly and Practical

Looking back at the entire conference, iFLYTEK's mission is clear:

"Making large models more user-friendly and practical."

In my view, for large models to be deployed at scale across industries, reaching every household the way electricity and water do, many companies and developers must work together. Building an AI 'Spark' ecosystem is exactly what iFLYTEK is striving for.

The real-world deployments iFLYTEK showcased point not only to the gradual intelligent transformation of government and enterprise, but also to steady progress in education, healthcare, research, and other fields. More and more companies hope to 'liberate productivity and unleash imagination' by adopting large model technology.

(Image source: Leitech)

iFLYTEK Spark's commercialization path is to consolidate its strongholds in the consumer, education, healthcare, and automotive sectors, push into new domains such as telecommunications, finance, energy, and transportation, and put down roots in the enterprise market at scale.

Admittedly, OpenAI's products may still lead in multimodal and reasoning capabilities, but OpenAI's abrupt decision to cut off API access in China has forced domestic companies and developers to look for alternatives.

Compared with foreign tech companies, Chinese firms excel at practical, 'down-to-earth' deployment. A more diverse range of hardware, faster application of new technology, a more vibrant industrial ecosystem, and a dominant position in video generation are notable achievements that iFLYTEK and its peers have earned through persistent effort.

This upgrade to Spark's foundation model is, in effect, an exploration of future possibilities.

It showcases iFLYTEK's deep expertise in AI and China's formidable strength in the field, and shows that building world-class large models on independently developed computing infrastructure, with leading algorithms and data, is not a pipe dream.

(Image source: Leitech)

The conference also saw the official launch of 'Feixing II', a domestic ultra-large-scale intelligent computing platform. According to iFLYTEK, the platform will keep adapting to new models and algorithms, expand its intelligent computing clusters, explore uncharted territory, and offer industries worldwide a second option.

The era of domestic large models may not be far off.

Source: Leitech
