We Tested 4 Leading PPT-Generation Large Models: Frequent Image-Matching Errors and No Data Visualization

08/27/2024

Large models have moved beyond text, images, and video into PPT generation, but they remain far from the ultimate goal of 'one-click generation.' Who gets there will depend on who can keep investing in the technology.

@TechNewKnow

Following chatbots, text generation, image creation, and video production, large models have boldly entered the PPT generation scene.

Participants include traditional office software developers such as Kingsoft Office (WPS AI) and iSlide (iSlide AI); internet giants such as Baidu (Baidu Wenku) and Alibaba (Quark AI PPT); and AI heavyweights such as iFLYTEK, Tiangong AI, and Moonshot AI's Kimi. Each company has leveraged its strengths to compete for a leading position in the PPT generation race.

For example, the domestic large model Kimi partnered with AiPPT to launch PPT Assistant, while iFLYTEK released iFLYTEK SmartWrite 2.0, an intelligent document AI assistant, which includes an upgraded version for one-click PPT generation. These tools can not only extract the core content of documents to generate an outline but also switch themes and templates with a single click, even supporting conversion from multiple documents to PPTs. Each function is designed to save time and effort, reflecting the determination to compete in this field.

PPT creation has long been a pain point for professionals. If technological competition can drive innovation in office productivity, users will naturally welcome it. However, if companies merely deliver a superficial AI solution without addressing users' genuine needs, it could backfire and disappoint users.

TechNewKnow therefore tested four domestic products (Kimi, iFLYTEK SmartWrite, Baidu Wenku, and WPS AI) to see how seriously each takes users' real needs.

To ensure objectivity and relevance, we adopt the following criteria for our evaluation:

1. Use uniform Chinese prompts, including simple and compound prompts.

2. Test each model's ability to generate PPTs from input themes and uploaded documents.

3. Use the web versions and their default settings on a computer.

4. Present only the first generated result without further adjustments or optimizations.

Here are the results of our evaluation:

Generation Method 1: Input Theme to Generate PPT

Instruction: You are an experienced observer of the film industry invited to share insights on the 2024 May Day holiday box office and film analysis in a university class. Create a PPT with a clear structure, logical flow, accurate data support, visually appealing design, including charts, and no more than 20 pages.

Instruction Clarification: As a communication tool, a PPT should effectively extract key points from complex information, summarize, and visualize data. To test these AI PPT tools' basic capabilities, we chose a multi-faceted, data-rich theme.

WPS AI: The nearly 900-word PPT outline is well-structured and comprehensive, raising expectations for the presentation. However, the generated PPT merely fills the outline into a selected template, requiring users to search, summarize, and analyze the content themselves.

For example, under 'Background Introduction,' WPS AI simply states, 'Analyze the 2024 May Day holiday film market in China to understand industry trends,' providing a thinking framework rather than analysis. Similarly, for 'Market Size Overview of the Film Industry,' it gives a vague summary: 'The domestic film industry continues to grow, with an expanding market size.'

Kimi: In one minute, Kimi generated a nearly 2000-word outline attempting to cover background, situation, box office analysis, audience analysis, success/failure factors, marketing strategies, policy environment, market regulation, and future market outlook in 20 pages. While comprehensive, the outline lacks focus on the core theme of '2024 May Day holiday films and their box office analysis.'

Like WPS AI, Kimi provides a framework and analysis template rather than a fully-fledged PPT. For '2024 Box Office Analysis,' Kimi suggests listing key box office indicators like total box office, daily box office, and attendance and analyzing market factors like film quality, promotion, and audience feedback. This is a complete analysis framework, more detailed than WPS AI's.

Baidu Wenku: As shown, the PPT's table of contents is its entire outline, suggesting a concise approach. Surprisingly, among the first three tested products, Baidu Wenku came closest to our vision of 'one-click PPT generation,' with a complete structure, an appropriate level of detail, a prominent theme, and the ability to cite official data and analyze it.

However, during outline generation, Baidu Wenku produced two identical paragraphs, resulting in duplicate slides in the PPT. The content is otherwise excellent, but the duplicates have to be deleted by hand.

iFLYTEK SmartWrite: Despite some overlapping text, iFLYTEK SmartWrite accurately grasps the theme, allocates a reasonable share of content to each part, and effectively analyzes relevant dimensions around the theme, such as film genre distribution, ticketing channels, viewing modes, and key city box office data. Its information extraction and text generation capabilities are impressive, aligning with the theme and offering sufficient detail.

Generation Method 2: Upload Document to Generate PPT

Instruction: You are a university student preparing a final course presentation titled 'Analysis of Character Images in the Movie Pride and Prejudice.' The 1500-word document includes film and director introductions, plot summary, main character analysis, characterization techniques, and conclusions. Generate a PPT based on this document.

Instruction Clarification: This test assesses AI's language and scene understanding, text reading, and summarization capabilities. Assuming an average speaking rate of 240 words per minute, the presentation should last a little over 6 minutes (1500 / 240 = 6.25).
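The duration estimate is simple division, and a minimal sketch makes it easy to re-run for other document lengths. The function name `speaking_minutes` is hypothetical; the word count and speaking rate are the figures given in the test setup. Taken at face value, the two figures give 1500 / 240 = 6.25 minutes.

```python
def speaking_minutes(word_count: int, words_per_minute: float) -> float:
    """Estimate how long it takes to read a script aloud, in minutes."""
    if words_per_minute <= 0:
        raise ValueError("words_per_minute must be positive")
    return word_count / words_per_minute


# Figures from the test setup: a 1500-word document at 240 words per minute.
minutes = speaking_minutes(1500, 240)
print(f"Estimated presentation length: {minutes:.2f} minutes")
```

A slide deck that summarizes well should let the speaker fit comfortably inside that window rather than reproduce the whole script.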

WPS AI: A minimalist approach with a cover and closing slide, resulting in a 6-page PPT. The layout is reasonable, but the images are irrelevant. The document content is processed straightforwardly without additional context or analysis.

Kimi: Although the uploaded document is already a complete presentation script, Kimi seems to have ideas of its own. It lays out the slides according to the script's framework and extracts key points, but it does not expand the analysis. Instead it gives users directions for thinking, more like a "semi-finished product" that users must fill in themselves. Surprisingly, Kimi also made a low-level mistake in this round: it swapped body text and subheadings.

Baidu Wenku: Of the options provided, we chose "appropriate expansion" over "consistent with the original text." The final draft did add useful information and improvements based on the document, expanding logically around the keyword "character image." The layout had no significant issues, but the most serious flaw was the use of screenshots from a different film, "Little Women," as illustrations.

iFLYTEK SmartWrite: The layout again fails to fit the text to the template, leaving overlapping words in several places. The excessive amount of text on each slide also hurts the visual experience and makes it harder for the audience to take in information quickly. However, by expanding the content with online sources, the final product does improve on the speech draft. The AI-generated images are a further highlight: their relevance still has room for improvement, but they are better than in previous attempts.

Evaluation of Other Relevant Dimensions

Apart from the overall presentation, we also evaluated the following relevant dimensions: response speed and efficiency, usage cost and price, templates and styles, user experience and interaction, and security and privacy protection.

PPT Generation Times for Four AI Models

The "one-click generation" feature is undoubtedly a dream come true for workers and students overwhelmed by PPT tasks. After all, creating a well-designed PPT takes significant time and effort: selecting a template, organizing the structure, finalizing and polishing the text, arranging the layout, and refining the design. Overall, the four products tested here generated their final drafts within 2.5 minutes, whether from an input theme or an uploaded document, which points to efficient data handling and algorithms.

Speed and quality, however, do not always go together. In theme-based generation, iFLYTEK SmartWrite excelled, taking just 1 minute in total and producing high-quality content. In document-based generation, WPS AI finished in just 29 seconds but had the weakest content of the four.

Comparison Table of Four AI PPT Evaluation Objects

In terms of usage cost, the AI newcomer Kimi is quite generous, currently offering its PPT assistant function for free. The other three adopt paid membership models but provide free or low-cost limited-time trials during their user acquisition phases.

Choosing a template is often the first challenging step in PPT creation. Overall, the PPT template libraries of these four products mainly cater to education, workplace, technology, and business scenarios, with room for further expansion in applicable scenarios and themes. Kimi stands out with the largest number of free templates, while most of Baidu Wenku's templates are VIP-exclusive, and even free templates require payment for download to local computers.

Regarding styles, the PPTs generated by each company use basic fonts, animation effects, and slide transitions, supporting online editing or post-download editing.

WPS AI Interface

Kimi Interface

Baidu Wenku Interface

iFLYTEK SmartWrite Interface

In terms of user interaction, all four products offer convenient registration and login methods, including mobile phone verification codes, and all support web-based operation. WPS AI, Kimi, and iFLYTEK SmartWrite adopt a minimalist design that lets users quickly find the functions they want. Baidu Wenku's interface is more complex, with "AI Generate PPT" located in the far right corner of the smart assistant, occupying only one-fifth of the homepage.

Regarding security and privacy protection, none of the four products presents an explicit privacy policy outlining how user data is collected, used, and protected.

Evaluation Observations: The Battle Between Technological Progress and User Expectations

It is undeniable that in evaluating these four AI PPT products, we have witnessed significant advancements in AI's content creation capabilities. With simple input instructions and a click, an invisible hand quickly arranges and "instantly generates" content, undoubtedly a relief for PPT-burdened users.

While speed is important, quality remains the core. Dialogue and text generation are the initial entry points for most companies in this field. Through daily data training, large models have significantly improved their natural language processing capabilities, laying a solid foundation for their multimodal development. Based on this, all four products demonstrated impressive logical analysis in this evaluation.

However, most companies are still stuck at the "providing ideas" level of text generation and need to get better at extracting useful information and generating accurate, in-depth text, as seen with WPS AI and Kimi. By comparison, Baidu Wenku and iFLYTEK SmartWrite performed at a higher level in this test, likely benefiting from the knowledge engines and accumulated content data of Baidu, Baidu Wenku, and iFLYTEK.

Alongside the reasons for optimism there are disappointments. Data visualization, which TechNewKnow had high hopes for, was not delivered by any of the four contestants. Data, the "Sword of Damocles" hanging over tech giants, is also the soul of a PPT: presentations often involve data comparisons, shown through charts and graphics for clarity and ease of understanding. Anyone who builds PPTs for work knows this. That is why, in the "input theme" test, we deliberately chose a data-rich theme and explicitly instructed that the final draft should "include charts." Unfortunately, none generated the requested content, reflecting a lack of understanding of PPT pain points and usage scenarios.

Another issue that cannot be overlooked: common PPT scenarios span schools, research institutions, governments, and enterprises, and so involve substantial private data and business secrets. Current AI PPT products appear to fall short on data privacy protection, and vendors seem to lack confidence on this point, often avoiding it in promotional materials.

It seems that PPTs are still far from the ultimate goal of "one-click generation." While players rush to develop, they must continue to refine their internal capabilities.

The comprehensive improvement of large models relies heavily on feeding them massive amounts of data, and development and training are complex endeavors that demand investment across the board. They depend on top talent and cutting-edge technology, and they require rich, high-quality corpora, which means developers need the capacity to collect and process data continuously. Significant hardware resources must also be invested to provide the necessary computing power.

In short, large model development is a comprehensive test of technical depth, data breadth, and computing intensity, with each technological advancement backed by significant financial investments.

The era of the "Hundred Model War" is over, and survival is now the priority. Unlocking new application scenarios is the inevitable path for large models to gradually commercialize. According to Yuehu Data, the user base of the smart PPT industry reached 9.2 million in June 2024, with a compound monthly growth rate of 21% over the past three months. Facing the vast demand for PPT content generation, no player wants to fall behind in this competitive race.
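To make the growth figure concrete, the sketch below works backward from the reported June 2024 user base. It assumes that "compound monthly growth rate" means the same 21% month-over-month increase applied in each of the three months; that reading, and the helper name `implied_start`, are assumptions for illustration, since the source gives only the end figure and the rate.

```python
def implied_start(end_value: float, monthly_rate: float, months: int) -> float:
    """Back out the starting value implied by a constant compound monthly rate."""
    if monthly_rate <= -1:
        raise ValueError("monthly_rate must be greater than -1")
    return end_value / (1 + monthly_rate) ** months


# Reported figures: 9.2 million users in June 2024 and 21% compound monthly
# growth over the preceding three months (interpretation assumed, see above).
end_users = 9_200_000
start_users = implied_start(end_users, 0.21, 3)
total_growth = end_users / start_users - 1

print(f"Implied user base three months earlier: {start_users:,.0f}")
print(f"Cumulative growth over the period: {total_growth:.1%}")
```

Under that assumption, the user base would have started the period at roughly 5.2 million, a cumulative increase of about 77% over three months.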

To capture this essential market, enterprises must show tangible capabilities. The way they cultivate users also merits consideration, as it determines long-term success in winning both markets and hearts. Looking back at globally popular online office software in recent years, companies initially offered free benefits to win over workers, then imposed usage restrictions and gradually began charging for features, drawing criticism. To date, there is little concrete data on these companies' actual return on investment and user retention rates, which makes it difficult to assess their operational effectiveness comprehensively.

As AI increasingly shapes our lives, scrutiny cannot be too meticulous. After all, every user paying for technology must also invest time, privacy, and trust. The PPT "helper" carrying the high hopes of workers and students must not squander this opportunity.
