07/15 2025
Podcasts are no longer confined to the private realm of headphones; with accompanying visuals, they are now reaching the screens of hundreds of millions of users. The global podcast market is projected to reach $30.72 billion by 2026, growing at a compound annual rate of 27%.
Amidst this trend, Bilibili, China's largest youth culture community, has made a bold move. It has unveiled the "Video Podcast Circle-Breaking Plan," a support policy aimed at helping audio and text creators transition to video content creation and fostering account growth.
Across the ocean, YouTube announced in February this year that monthly active users of its podcast content had passed 1 billion, putting it ahead of audio giant Spotify in market share and prompting the latter to launch a video revenue-sharing plan to retain creators.
As in-depth content becomes an essential need for users seeking to escape fragmentation, video podcasts are reclaiming their position at the heart of the content industry.
Video podcasts showcase groundbreaking value, and Bilibili is seizing the moment to break through current limitations.
For a considerable period, Bilibili's gaming business has stagnated, and advertising revenue has plateaued, leaving the platform at a crossroads in its commercial transformation.
Despite boasting a vast user base of Generation Z, the platform has grappled with the contradiction between its "powered by love" community ethos and the efficiency of commercialization.
As short video platforms rapidly capture user time, Bilibili urgently needs to find a middle ground that maintains the depth of its content while paving the way for monetization.
The quiet rise of video podcasts offers a glimmer of hope for Bilibili to break this stalemate. Data from the "Cooperation Plan" reveals that in this quarter, Bilibili's video podcast audience surpassed 40 million, with user viewing time jumping from 6.9 billion minutes to 25.9 billion minutes, marking a growth rate of over 270%. This growth occurred organically, with "operations and products completely uninvolved," highlighting the strong user demand for in-depth content.
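As a quick check on the figures quoted above, the growth rate can be recomputed from the two viewing-time numbers. The sketch below is only illustrative: the function name `growth_rate_percent` is hypothetical, and the inputs are the article's figures expressed in billions of minutes.

```python
def growth_rate_percent(before: float, after: float) -> float:
    """Return the percentage increase from `before` to `after`."""
    if before <= 0:
        raise ValueError("before must be positive")
    return (after - before) / before * 100.0


# Viewing time in billions of minutes, as quoted in the article.
rate = growth_rate_percent(6.9, 25.9)
print(f"{rate:.1f}%")
```

The result is roughly 275%, which is consistent with the "over 270%" cited.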
What excites Bilibili even more is the monetization power demonstrated by top creators. Legal field influencers like "Wang Yikuai" (Wang Yiwei, COO of Mita Technology) and "Erzhong's Daxuan Brother" have earned significant annual income through user "charging" (paid content) and knowledge courses, far exceeding what comparable creators earn on audio podcast platforms.
This monetization model, based on a deep trust relationship, perfectly aligns with Bilibili's community-driven DNA.
Bilibili's video podcast support policy this time encompasses three key components: cold start traffic support, free recording venues in major first-tier cities, and exclusive AI creation tools for video podcasts.
Among these, the "Code H" AI tool directly addresses creators' pain points by saving them the time spent finding and editing video material. Creators supply their content as text or audio, and the tool automatically generates matching visuals. It cuts video production time for a thousand-word script to under 6 minutes, with the potential to shrink it to 3 minutes in the future.
Technology is removing the barriers to professional video production, enabling knowledge elites and audio creators to work with ease.
Historically, Bilibili has been caught between the community genes of "powered by love" and the efficiency demands of commercialization. While the gaming business briefly leveraged blockbusters to drive profitability, over-reliance on a single product exposed the fragility of its profit structure.
Against this backdrop, the explosive growth of video podcasts has emerged as a crucial variable. To capitalize on this trend, Bilibili has launched a comprehensive strategy, leveraging AI to eliminate the technical hurdles of professional video production, allowing knowledge elites to operate seamlessly. This bold move is not only Bilibili's last-ditch effort to break the curse of commercialization but also heralds the return of in-depth content to the forefront.
From "niche self-reserved land" to "mass content infrastructure," the battle for video podcasts has begun.
As Bilibili heavily invests in video podcasts with "1 billion traffic + AI tools," the global battlefield is already ablaze.
YouTube leads the world's podcast platforms with 1 billion monthly active podcast users, prompting audio giant Spotify to hastily follow suit. Meanwhile, the domestic market remains in its early stages: Douyin and Xiaohongshu are quietly testing the waters, and Himalaya and WeChat Video are making moves of their own.
There are notable differences between the Chinese and American markets. The American market already has a mature ecosystem in which "top podcasts = video podcasts." Whether it is Lex Fridman interviewing Bezos or Joe Rogan talking with Musk, YouTube serves as the core distribution platform for such content.
The success of "The Joe Rogan Experience" shows that video can turn a podcast into an immersive forum for ideas: viewers not only hear the conversation but also see Rogan's micro-expressions and body language as he talks with Musk, which makes abstract ideas concrete and dramatic.
In contrast, the vast majority of top podcasts in China remain audio-only. Two obstacles lie behind this disparity: the inherent difficulty of commercializing audio podcasts and the higher cost of video production. When audio monetization struggles, creators naturally lack the motivation to upgrade to video.
Thus, video podcasts, which let hosts build a visible personal IP and support more diverse advertising formats, are poised to break the "acclaimed but not profitable" dilemma faced by traditional audio podcasts.
It is no coincidence that Bilibili has emerged as the first platform to prominently participate in this battle. Its unique attributes constitute a competitive advantage.
Firstly, it embraces medium- and long-form content. Bilibili is currently the only video platform proven to engage users in diverse medium- and long-form content consumption. Its distinctive "black listening" culture (users listening to the audio without watching the screen) lets users choose freely between watching and listening.
Secondly, there is the bullet-comment (danmaku) culture and the derivative-creation ecosystem. Bilibili's bullet comments let viewers react to ideas together in real time, and users spontaneously cut and remix podcasts into derivative clips. Together these form a "metabolic system" that distills core viewpoints from long content and, by spreading them as short fragments, draws viewers back to the long content.
This ecosystem is a competitive edge that platforms like Douyin and Xiaohongshu find hard to replicate.
It's worth noting that Bilibili's "Video Podcast Plan" is not an isolated initiative but closely follows the overall evolution of Chinese podcasting. According to the "CPA Podcast Marketing White Paper 2025," the number of Chinese podcast listeners is expected to exceed 150 million in 2025, and China ranked first globally in 2024 with a growth rate of 43.6%.
JustPod data indicates that 74% of users are willing to pay for podcasts, and 71.6% of users have engaged in consumption behaviors due to podcasts. These data suggest that podcasts are transforming from "niche self-reserved land" to "mass content infrastructure," and Bilibili's all-in decision on video podcasts at this juncture is both a response to market trends and an attempt to seize industry upgrade dividends.
The revival of in-depth content: podcasts transcend themselves through "videofication."
In a content ecosystem dominated by algorithm recommendations and 3-second highlights, video podcasts remain essentially "slow media." They don't seek instant gratification but require viewers' sustained investment of time; they don't offer information fast food but serve feasts of thought.
This starkly contrasts with short video ads that pursue immediate conversion, introducing a unique dimension for evaluating commercial value. Hence, the rise of video podcasts is intrinsically a rediscovery of the value of in-depth content.
The Elaboration Likelihood Model (ELM) helps explain the two persuasion paths open to video podcasts: when users engage deeply, they rationally accept content through the "central route"; when their attention is divided, they rely on "peripheral route" cues, such as the host's persona, to establish trust. This "dual-track" ability to build trust is precisely the niche advantage shared by Bilibili and podcasts.
Moreover, the "slow philosophy" of countering fragmentation has become the core competitive strength of video podcasts. JustPod research shows that 91.2% of Chinese podcast users hold a bachelor's degree or higher and 73.4% live in first-tier and new first-tier cities, forming a precisely targeted, high-net-worth audience pool.
For instance, after a skincare brand placed customized content on "Business Is Like This," search volume for its Tmall flagship store surged by 180%, affirming podcasts' distinctive business logic: "content seeding - mind capture - long-term conversion." In an era of consumption upgrading, this "slow conversion" model may do more to combine brand building with sales results than the "watch and buy immediately" model of short videos.
Furthermore, Chinese podcasts are making a value leap from a "traffic business" to a "trust economy." Survey data indicate that users have a high tolerance for podcast advertisements, with only 0.6% of listeners exiting because of ads. This suggests that the core of podcast commercialization lies not in hard selling but in emotional connections grounded in trust.
The addition of video elements makes this trust-building more multidimensional. When the host steps from behind the sound to in front of the camera, listeners' perception of their personality traits becomes more complete. This new form of social interaction, where "the body is absent but emotions are present," creates emotional stickiness that traditional media struggles to match.
Lastly, there is the visual expression of the marketplace of ideas. Video podcasts are not merely recording sessions moved onto the screen; they build immersive forums for ideas around personal IPs. The success of "The Joe Rogan Experience" on YouTube demonstrates that the host's charisma, on-camera presence, and skill at deep dialogue are core assets.
At this juncture, Bilibili's 1 billion in traffic support has not only ignited creators' enthusiasm but also sounded the horn for in-depth content's counterattack against the era of fragmentation.
Globally, the boundaries between audio, video, and live streaming are blurring, and the form they ultimately converge on may go beyond both pure audio and traditional video. Bilibili's bet on video podcasts at this time is both a forward-looking judgment on content consumption trends and a strategic attempt to break through the commercialization dilemma.
Source: Hong Kong Stocks Research Institute