02/05 2026

Author: Lin Yi | Editor: Key Point
On February 4, at the Cisco AI Summit, Li Feifei, co-founder of World Labs, unveiled the technical details of Marble, the company’s inaugural spatial intelligence product.
As a key architect of the current generative AI revolution, Li Feifei has consistently resisted following the crowd. She reiterated a bold, contrarian stance: Pure large language models (LLMs) cannot lead to artificial general intelligence (AGI).
From Li Feifei’s perspective, language has only been a factor in biological evolution for roughly 500,000 years. In contrast, spatial intelligence—embodied by vision and touch—sparked a neural evolutionary arms race 500 million years ago during the Cambrian explosion. If AI cannot comprehend the 3D physical world and lacks physical intuition, she argues, it will remain confined to digital pixels forever.
With this philosophy, World Labs, under Li Feifei’s leadership, aims to chart a different course from OpenAI. By building physically consistent world models, the company seeks to address AI’s perceptual limitations.
Below is a condensed summary of the interview’s core insights:
1. The Path to AGI: Language Alone is Insufficient
Li Feifei reframes AI’s trajectory through the lens of biological evolution. Language, she notes, has been part of human evolution for just 500,000 years—a blink in evolutionary time. Meanwhile, perceptual abilities like vision and touch drove neural competition during the Cambrian period, half a billion years ago.
Her conclusion: AI confined to language will remain trapped in the digital realm. Only by integrating spatial intelligence—a more ancient and fundamental cognitive capacity—can machines truly understand, reason, and interact with the 3D physical world, paving the way for AGI.
2. World Models Redefined: Physical Consistency Matters
Li Feifei described Marble as a state-of-the-art spatial intelligence model capable of processing multimodal inputs—text, images, videos, or simple 3D data—and transforming them into a fully navigable, interactive, and physically consistent 3D environment.
Unlike video models like Sora, which prioritize visual flair, Marble generates geometrically structured virtual spaces with physical properties. Users can explore these spaces freely, simulating robot movements or writing game code within them.
Marble is already being applied in game development, visual effects (VFX), robot training, interior design, and even clinical therapy. For example, researchers use it to treat obsessive-compulsive disorder (OCD) by creating triggering environments (e.g., a cluttered laundry room) for exposure therapy.
3. Synthetic Data Drives the Next Scaling Law Explosion
Why has physical-world AI lagged behind language models? The bottleneck lies in data quality. Textual data is clear and semantically rich, whereas physical-world pixels and voxels are noisy and harder to scale.
World Labs addresses this with a hybrid data strategy: combining existing web-based text, image, and video data with synthetic data and real-world sensor inputs.
Li Feifei predicts that as synthetic data technology matures, world models will experience a scaling law explosion akin to that of LLMs.
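The “scaling law explosion” Li Feifei anticipates refers to the empirical power-law relationship between model size, data, and loss that LLMs have followed. As a rough illustration of what such a curve looks like (the constants below are the published Chinchilla fit for language models, not anything measured by World Labs), a minimal sketch:

```python
# Illustrative LLM scaling law of the Chinchilla form:
#   L(N, D) = E + A / N**alpha + B / D**beta
# The constants are Hoffmann et al.'s (2022) fitted values for language
# models; they are used here only to make the curve's shape concrete.

def scaling_loss(n_params: float, n_tokens: float,
                 E: float = 1.69, A: float = 406.4, B: float = 410.7,
                 alpha: float = 0.34, beta: float = 0.28) -> float:
    """Predicted pretraining loss for n_params parameters and n_tokens tokens."""
    return E + A / n_params**alpha + B / n_tokens**beta

# Growing parameters and data by 10x per step keeps lowering the predicted
# loss, with diminishing returns toward the irreducible term E.
losses = [scaling_loss(10**p, 20 * 10**p) for p in range(8, 12)]
```

The prediction is that world models, once their data bottleneck is relieved by synthetic data, will trace out a curve of this general shape rather than plateau.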
4. General-Purpose Robots: The AI Crown Jewel
While autonomous driving is often hailed as the pinnacle of AI, Li Feifei argues that general-purpose robots operate on a far more complex dimension.
Autonomous vehicles follow 2D logic: a car moves on a flat plane, with obstacle avoidance as its primary goal. General-purpose robots, however, must navigate 3D space, performing precise tasks without damaging objects—a fundamentally different challenge.
5. AI as Civilization’s New Infrastructure
Amid polarized debates about AI’s potential to destroy humanity or create utopia, Li Feifei advocates for a balanced, humanistic approach.
She compares AI to electricity a century ago. Electricity succeeded not because of vast grids, but because it illuminated schools, powered factories, and extended lives. Similarly, AI’s success hinges on becoming infrastructure that empowers civilization, enabling dignity and happiness for all.
Li Feifei revealed World Labs’ ambition to integrate spatial intelligence into industries like healthcare and agriculture by 2026, bringing AI out of screens and into the physical world.

Excerpts from Li Feifei’s Interview:
1. Spatial Intelligence: AI’s Next Frontier
Host: It’s inspiring to see World Labs’ progress. Let’s discuss your work and its significance.
Li Feifei: My focus is singular: spatial intelligence. Two years ago, I co-founded World Labs with young tech visionaries. Why spatial intelligence? It’s AI’s next frontier. Evolutionarily, neural competition began with perception, not language. While language emerged 500,000 years ago, animals developed light perception and touch 1.5 billion years ago, sparking instincts, vision, and nervous system evolution.
Instincts, though vague, emerged from physical interaction with the world. This drove an evolutionary arms race, making organisms more active and intelligent. Understanding, reasoning, and navigating the 3D/4D physical world is as foundational as linguistic intelligence. That’s why spatial intelligence is AI’s next frontier—and World Labs’ mission.
Host: Tell us about Marble.
Li Feifei: Marble is our first-generation spatial intelligence model. Though often called a “world model,” the term is overused. What matters is its capability: Marble processes multimodal inputs (text, images, videos, 3D data) and generates a navigable, interactive, physically consistent 3D world. Unlike video models, Marble’s environments support robot simulations and game development.
Launched two months ago, Marble represents the cutting edge of 3D generative modeling.
Host: Some argue AGI requires physical enhancement, not just language models. What breakthroughs do you foresee in five years?
Li Feifei: We don’t need to wait five years. Users already employ Marble in game development, VFX, robot training (with partners like Nvidia), and interior design. Unexpected applications include clinical therapy for OCD, where Marble creates tailored immersive environments, and personalized fitness training (e.g., yoga spaces). As Marble evolves, horizontal use cases will expand.
2. AI’s Social Impact
Host: You’ve dedicated your career to AI.
Li Feifei: That’s a diplomatic way to mention my age!
Host: No—you’ve pursued AI long-term, not just as a trend. What surprised you most in founding World Labs?
Li Feifei: Great question. Curiosity remains vital. I entered AI decades ago, driven by a desire to understand intelligence. Back then, AI was obscure. Today, it’s a civilization-level force, giving me responsibility to guide its development.
Two surprises stand out. First, AI’s rapid progress fuels anxiety—there’s too much to learn. Yet, as Socrates said, “I know that I know nothing.” Stay curious.
Second, polarized AI rhetoric worries me. Online debates swing between utopianism and doomsday scenarios. For a technology shaping civilization, this black-and-white discourse is irresponsible. Entrepreneurs, engineers, and citizens must guide AI responsibly. By 2026, I hope we adopt a nuanced, compassionate view—optimistic yet accountable.
Host: What AI achievements would you deem successful?
Li Feifei: A century ago, people couldn’t predict electricity’s impact. Yet, they likely hoped for bright schools, warm homes, industrialization, and longer lives. Similarly, AI’s success lies in improving civilization—enabling happiness, prosperity, and dignity for all.
3. World Models and Embodied Intelligence: Technical Challenges
Host: Let’s discuss tactics. Are large world models (LWMs) as computationally intensive as large language models (LLMs)?
Li Feifei: First, there are several kinds of large world models. Our team focuses on world models with explicit 3D representations, which have great potential to transform fields like robotics, gaming, entertainment, and design. Another category is often called world models but is really video-generation models. At present, our model isn’t that large. For perspective, GPT-5’s training volume is approximately 10^26 FLOPs (floating-point operations); Marble is still several orders of magnitude smaller.
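To make the order-of-magnitude comparison concrete: the GPT-5 training-compute figure is the one quoted above, while the Marble figure below is a hypothetical placeholder standing in for “several orders of magnitude smaller.”

```python
import math

GPT5_TRAIN_FLOPS = 1e26    # training volume quoted in the interview
MARBLE_TRAIN_FLOPS = 1e23  # hypothetical stand-in: "several orders of magnitude smaller"

# FLOPs here counts total floating-point operations over the whole training
# run, not FLOP/s; the gap is measured in orders of magnitude (powers of ten).
orders_of_magnitude = math.log10(GPT5_TRAIN_FLOPS / MARBLE_TRAIN_FLOPS)
print(f"Gap: about {orders_of_magnitude:.0f} orders of magnitude")
```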
Host: Is this simply due to a lack of sufficient data to feed these models?
Li Feifei: I think it’s a combination of factors. Scaling does begin with data and model parameters, and data plays a pivotal role. But this field is also still in its infancy. The Transformer paper was published in 2017; language models have had nearly a decade to evolve, while world models are a much newer area of research. We’ve retired some of the scientific risk over the past two years, but we’re still at an early stage of exploring model architectures, which is why our models remain smaller for now. Even so, given the progress in our lab and across the field, the next few years will be incredibly exciting as large world models make significant leaps along the scaling-law curve.
Host: This topic really interests me. Language models are trained on freely available public data from the internet, which makes it relatively easy to amass large volumes. But acquiring physical data is much harder, so synthetic data becomes critical, and real-world data has to be collected at a slower pace. What limitations does this impose? Will the progress of world models be hampered as a result? Will we see general-purpose robots emerge, or will data limitations leave us stuck with specialized robots?
Li Feifei: That’s a question packed with sub-questions. At World Labs, we’ve adopted a hybrid data strategy. I’ll admit I’m envious of my friends doing language research: language input is fully observable, single-modal, and carries clear meaning. The 3D world of pixels and voxels is far messier. To push the technological envelope and create 3D and 4D worlds, we have to acknowledge that we don’t have a massive amount of 3D data. So we’ve taken a layered hybrid approach: we leverage internet-scale text, images, and video, combine them with simulated data, and also incorporate real-world collected data. It’s similar to how self-driving companies have spent decades collecting both real and simulated data.
As for the speed of progress, although data acquisition is a challenging task, computing power is on the rise, chips are advancing, and the entire ecosystem is maturing. The data suppliers we're currently collaborating with didn't even exist three years ago. Synthetic data has indeed played a crucial role, and the models we're building will, in turn, contribute data to the simulated world, creating a flywheel effect.
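That layered hybrid approach can be pictured as a weighted mixture over data sources. The sketch below is purely illustrative — the source names and weights are invented for this example and are not World Labs’ actual pipeline:

```python
import random

# Hypothetical sampling weights for a layered hybrid data strategy:
# internet-scale media, simulator-generated 3D scenes, real sensor captures.
# (Source names and weights are invented for illustration.)
MIXTURE = {"web_images_video": 0.6, "synthetic_3d": 0.3, "real_sensor": 0.1}

def sample_source(rng: random.Random) -> str:
    """Pick the data source for the next training example, proportional to its weight."""
    r = rng.random()
    cumulative = 0.0
    for source, weight in MIXTURE.items():
        cumulative += weight
        if r < cumulative:
            return source
    return source  # guard against floating-point rounding at the boundary

rng = random.Random(0)
counts = {name: 0 for name in MIXTURE}
for _ in range(10_000):
    counts[sample_source(rng)] += 1
# counts now roughly reflects the 6:3:1 mixture.
```

In a real pipeline the weights would themselves shift over time as the model begins generating simulated worlds that feed back into training — the flywheel effect described above.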
Regarding general-purpose robotics: it can be summed up in a few sentences, but building it is an enormously hard task. I’ve run a robotics lab at Stanford for over a decade, and as a scientist I have to admit this is an extremely difficult problem. Just because we can see the North Star doesn’t mean the journey will be short. Back in 2006, my colleagues and I led a team that created the first self-driving car to travel 138 miles in the desert. At the time, we predicted self-driving cars would be widespread within 20 years; it wasn’t until last year that Waymo began large-scale operations on city streets. It has truly been a long, arduous journey.
The difference between cars and robots is significant. A car is essentially a square-shaped robot moving on a two-dimensional plane, whose primary goal is avoiding collisions. A robot is a three-dimensional entity operating in a three-dimensional world, and a general-purpose robot must interact with objects without damaging them. That is a higher-dimensional problem, compounded by the extreme difficulty of hand simulation, visual precision, and spatial understanding. It’s precisely why I founded World Labs. I’m not a fan of over-the-top promises, but this is a significant problem we’re committed to solving.
Host: Finally, how should business leaders view world models, physical AI, and the field you’re working in?
Li Feifei: Although my mindset sometimes still feels like a graduate student’s (surely there must be free food somewhere), World Labs is very open to talking with business partners. World models and spatial intelligence are horizontal technologies. We often talk about robotics, simulation, and immersive interactive entertainment, but the applications aren’t limited to those areas. We haven’t even scratched the surface in healthcare, education, field services, financial services, agriculture and manufacturing, warehouse inspection, and urban planning. Spatial intelligence opens up a vast array of possibilities. This is indeed the next frontier, and I invite everyone to join the journey, whether by collaborating with us or through independent research.