07/02 2026

Although the forms of hardware may change, the significance of access points and data will remain constant.
Edited by Meng Wen
A recent Gartner report, titled "Protecting Your Digital Workplace from AI Wearables," specifically highlights three AI recording products, including DingTalk A1.
The report points out that employees bringing their own AI recording devices into the workplace are raising new concerns about data security and privacy. Simultaneously, there are "early indications of enterprise-level integration" in the market, with some devices at least attempting to become manageable.
This perhaps best describes the current landscape of AI recording hardware: on one side are consumer tools focused on personal efficiency, continually pushing the boundaries of freedom; on the other side are enterprise nodes being "re-softwarized" under the constraints of security, permissions, and auditing.
Plaud and DingTalk A1 are prime examples of these two approaches.
Plaud Note entered the market two years before DingTalk A1, has sold over 1 million units globally, and validated the market for a recorder that attaches to a smartphone and hands its audio to a large model for transcription. Despite their similar form, the two products are positioned very differently: Plaud relies on personal subscriptions and acts as a second brain, while DingTalk A1 plugs into a collaboration platform to bring voice into organizational workflows.
Why has the seemingly "niche" category of AI recording hardware seen a surge in popularity over the past year or two? Why are companies ranging from Plaud to DingTalk, Feishu, iFlytek, and Insta360 all entering this space? What exactly are they competing for?

Plaud Leads the Way: The "Renaissance Moment" for Recording Hardware
2023 was a year of optimism for AI hardware.
Humane's AI Pin garnered media attention with its narrative of "replacing smartphones," while Rabbit R1 founder Lü Cheng's TED talk went viral in tech circles.
Everyone was anticipating AI's "iPhone moment," a disruptive piece of hardware that could redefine computing. Yet, unexpectedly, the first product to achieve commercial viability was a recording card attached to the back of a smartphone: Plaud Note.
(Plaud Note)
Plaud's journey is a pragmatic entrepreneurial tale that precisely addressed a market gap.
Since the first iPhone, Apple has avoided native call recording due to privacy concerns. Plaud Note resolved one of the most significant pain points in the Apple ecosystem with a physical add-on.
The technological turning point came when OpenAI open-sourced the Whisper model. Trained on 680,000 hours of labeled audio, it natively supports 99 languages, is open-source and free, and offers unprecedented resistance to noise and accents.
Whisper played a crucial role: it transformed Automatic Speech Recognition (ASR) from a proprietary technology controlled by a few companies into a universal capability accessible to any developer. From then on, creating recording products no longer required developing proprietary ASR engines, making speech-to-text as fundamental as utilities like electricity and water.
About two months later, ChatGPT was released.
The emergence of large models brought qualitative changes to the industry. AI recording products could now not only "record for you" but also "understand for you." Xu Gao, founder of Plaud AI, stated in an interview with Jiemian: "The moment large models arrived, intelligence suddenly broke through, and this logic became valid. Moreover, it became something with a very high theoretical ceiling."
In 2023, Plaud Note officially launched on Kickstarter, raising over $1.1 million in less than two months. It then moved to Indiegogo, where it raised an additional $2.38 million, bringing the total to over $3.48 million and setting a new global crowdfunding record for recording devices.
To date, Plaud has sold over 1 million units globally, with overseas markets generating $250 million in annual revenue. Nearly half of this revenue comes from AI subscription services. Zhu Xiaohu revealed that Plaud's latest valuation exceeds $1 billion, making it a bona fide unicorn.
Plaud's success validated three key points for the industry: in terms of form, magnetic cards are acceptable to users, and an interface simplified to a single button is an advantage, not a flaw; in terms of business model, users are willing to pay a one-time fee for hardware and subscribe continuously for AI services; in terms of demand, people whose primary work medium is language indeed need a tool more focused than smartphone recording apps.
This became the starting point for all latecomers.

AI Recording Hardware: A Hot Commodity Among Major Players
While Plaud surged overseas, the domestic market heated up.
In August 2025, DingTalk released the DingTalk A1 AI recording card, priced at 499 to 799 yuan. Five months later, Feishu partnered with Anker to launch a 10-gram AI recording bean. iFlytek unveiled five new products at once. Insta360 teamed up with Tencent Meeting to integrate cameras into recording devices, while Mobvoi slimmed its device down to just 3 millimeters.
(Anker AI Recording Bean)
Why has AI recording hardware suddenly become a favorite among major players? A core factor is that maturing AI Agent technology has redefined what a recording is worth.
After the emergence of large models, the ceiling on what a recording could deliver rose sharply. Recordings could now directly produce structured summaries, to-dos, interview highlights, and even article outlines, transforming "information" into "action" and closing the loop of recording, transcription, thinking, and execution.
This efficiency boost precisely addressed the core pain points of today's workforce. With hybrid work becoming the norm, cross-departmental meetings, client visits, online collaboration, and cross-language communication have become standard. No one wants to spend two hours organizing meeting notes after a two-hour meeting.
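The closed loop described above, from transcribed speech to structured summaries and to-dos, can be illustrated with a minimal Python sketch. It is not how any of the products named here work: the Segment and MeetingNotes types, the structure_transcript function, and the TODO_CUES phrases are all hypothetical, and simple keyword matching stands in for the large model that a real product would call.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    """One transcribed utterance (hypothetical shape)."""
    speaker: str
    text: str


@dataclass
class MeetingNotes:
    """Structured output derived from a transcript."""
    summary: str
    todos: list[str] = field(default_factory=list)


# Assumed cue phrases; a real product would ask a large model instead.
TODO_CUES = ("will ", "needs to ", "action item")


def structure_transcript(segments: list[Segment]) -> MeetingNotes:
    """Turn transcribed speech into a one-line summary and a to-do list."""
    todos = [
        f"{seg.speaker}: {seg.text}"
        for seg in segments
        if any(cue in seg.text.lower() for cue in TODO_CUES)
    ]
    speakers = sorted({seg.speaker for seg in segments})
    summary = (
        f"{len(segments)} segments from {', '.join(speakers)}; "
        f"{len(todos)} follow-ups"
    )
    return MeetingNotes(summary=summary, todos=todos)


segments = [
    Segment("Li", "The launch moved to next month."),
    Segment("Wang", "I will send the revised budget by Friday."),
    Segment("Li", "Legal needs to review the contract."),
]
notes = structure_transcript(segments)
print(notes.summary)
for item in notes.todos:
    print("-", item)
```

The point of the sketch is the shape of the output rather than the extraction method: once speech becomes a list of to-dos with owners, it can be handed to a task system instead of sitting in an audio file.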
Additionally, AI has spawned new usage scenarios: Vibe Coding, in which users direct AI to write code by voice; capturing fragments of inspiration anytime, anywhere; voice notes during site inspections; and real-time archiving of medical consultations. AI recording hardware has shifted from an optional tool to a productivity necessity.
(Source: Leikeji, Insta360 Wave)
This robust demand was quickly reflected in the market. By 2026, the voice recorder market exceeded 3.3 billion yuan, with AI-powered mid-to-high-end products accounting for over 55%. DingTalk A1's initial batch of 1,000 units sold out instantly, and it ranked first among Tmall's new voice recorder products during Double 11.
Meanwhile, the supply side was primed for explosion. Recording hardware is not a highly technically demanding category; microphones, low-power chips, and Bluetooth connectivity are all mature technologies verified by the market, with production costs continuously declining.
When Plaud gained popularity overseas, white-label manufacturers in Huaqiangbei quickly drove down prices for similar AI recording cards to the 120-150 yuan range, offering comparable basic functions at one-third or even one-fifth of Plaud's price.
There was also ample room for innovation in product form. From traditional rectangular recorders to cards that attach to smartphones, to magnetic beans that clip onto collars, manufacturers achieved "unnoticeable wearability."
For internet giants, the allure of entering the recording hardware market lies not in hardware profits but in the high-frequency, high-retention, high-value AI users that hardware brings, along with the continuous inference calls and data accumulation these users generate.
The dividends of publicly available structured data on the internet have peaked, while conversational data generated in real work scenarios represents the highest-value unstructured data—and the scarcest nourishment for large model iteration and optimization.
(iFlytek AI Voice Recorder Pokee Series)
AI recording hardware fills the final piece of the puzzle for "digitizing voice information" in office scenarios.
The battle among the collaborative office tools Feishu, DingTalk, and WeChat Work has reached a fever pitch. Any gap in an ecosystem could lead to user loss. If a competitor has hardware that instantly converts recordings into summaries, to-dos, and knowledge-base entries on its own platform and you do not, users may "slip away" along their workflows. This anxiety drives them to secure a position.
Today, players at the table have split into two distinct development paths. DingTalk and Mobvoi have chosen independent R&D, maintaining deep control over hardware development and data security while targeting enterprise market compliance needs and private deployment. Feishu and Tencent have opted for ecosystem partnerships, pairing their own AI algorithms with partners' hardware manufacturing strengths to cover the mass market more rapidly.
As large model technologies converge, hardware parameter gaps will continue to narrow, and competition will focus on ecosystem integration and depth in specific scenarios.

The Competition Is Not for Hardware, But for AI Access
In truth, whether recording cards or beans, these are likely just transitional forms.
Standalone recording cards/beans do address specific pain points, such as poor smartphone recording quality, the lack of call recording on iPhones, and inconvenient offline meeting documentation. However, as AI glasses, AI earphones, and other wearables become ubiquitous, with multi-microphone arrays, all-day sound pickup, and local AI capabilities maturing, more recording scenarios will be absorbed by these more natural hardware forms.
Future AirPods might automatically record and organize meeting content, while smart glasses could natively record, transcribe, and understand sights and sounds. By then, standalone recording cards may persist only in a few professional scenarios, such as journalism, law, and healthcare, much like today's specialized recorders.
Regardless of how hardware forms evolve, the path of converting "sounds generated in the real world into structured data that AI Agents can understand, invoke, and execute" will endure.
Gartner recently included AI recording devices in its report "Protecting Your Digital Workplace from AI Wearables" because "sound has become an asset."
Traditionally, recordings were merely audio files saved on personal devices; today, they are entering enterprise knowledge bases, workflows, and Agent systems, becoming part of organizational digital assets.
Conventional recorders are bulky and complex, carrying a ritualistic "I am recording" presence. In contrast, a magnetic card or a collar-clipped recording bean can blend seamlessly into daily office scenarios, making recording behavior increasingly covert.
The imperceptibility of technology is dissolving the physical boundaries of "informed consent" in traditional workplaces, raising entirely new challenges in workplace data governance and privacy compliance.
Who may record? Who owns the recorded data? Where can it be uploaded? Who has access? Can it feed into enterprise large models and knowledge bases?
Consumer products prioritize unrestricted recording and maximum efficiency, while enterprise products emphasize permissions, auditing, and data isolation. This is why Gartner noted "early indications of enterprise-level integration" in its report.
For enterprises, an AI recording device only truly qualifies for office scenarios once integrated into organizational permission systems and data governance frameworks.
DingTalk A1 was specifically mentioned by Gartner largely because it supports centralized procurement, device management, and encrypted storage—capabilities that may seem trivial to individual users but are critical thresholds for AI wearables to pass enterprise IT reviews.
(DingTalk A1)
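The governance questions raised above (who has access, whether data stays inside the organization, whether every access is audited) can be made concrete with a small Python sketch. Everything in it is an illustrative assumption rather than a description of DingTalk A1 or any other product: the Recording and AuditLog types, the can_read rule, and the read_recording wrapper are hypothetical names.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class Recording:
    """A stored recording with an owner and an organizational boundary."""
    owner: str
    org: str
    allowed_readers: set[str] = field(default_factory=set)


@dataclass
class AuditLog:
    """Append-only list of (timestamp, user, action, allowed) entries."""
    entries: list[tuple[str, str, str, bool]] = field(default_factory=list)

    def record(self, user: str, action: str, allowed: bool) -> None:
        stamp = datetime.now(timezone.utc).isoformat()
        self.entries.append((stamp, user, action, allowed))


def can_read(user: str, user_org: str, rec: Recording) -> bool:
    """Permission rule: same organization, and owner or explicitly shared."""
    if user_org != rec.org:  # data isolation between organizations
        return False
    return user == rec.owner or user in rec.allowed_readers


def read_recording(user: str, user_org: str, rec: Recording, log: AuditLog) -> bool:
    """Check permission and log the attempt whether or not it succeeds."""
    allowed = can_read(user, user_org, rec)
    log.record(user, "read", allowed)
    return allowed


rec = Recording(owner="li", org="acme", allowed_readers={"wang"})
log = AuditLog()
results = [
    read_recording("li", "acme", rec, log),     # owner
    read_recording("wang", "acme", rec, log),   # explicitly shared
    read_recording("zhao", "other", rec, log),  # different organization
]
print(results)
print(len(log.entries), "audit entries")
```

Denied attempts are logged alongside permitted ones, which is the property an enterprise IT review would typically look for in an audit trail.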
Plaud opened the first door to this market, and major players quickly took over. While similar in form, their positioning differs sharply.
Plaud competes for the gateway to personal knowledge management, serving as a "second brain" for freelancers, content creators, and knowledge workers. DingTalk and Feishu compete for the voice node of organizational collaboration: spoken information, the last undigitized piece of enterprise digital workflows.
In the short term, these two paths will not replace each other but will continue to extend alongside AI's evolution. Today, the competition is over recording cards; tomorrow, it may be AI glasses or earphones; eventually, it could be any terminal capable of continuously perceiving the real world.
In other words, they are not competing for hardware but for the primary gateway to AI's perception of the physical world.
This is a race for the next generation of human-computer interaction access points, and no one dares to sit it out.

Editor: Muren Proofreader: Zhang Wenxin Producer: Rui Zong