04/17 2026
Who wouldn't want a Mac that can automatically get work done?
Another day of envying Mac users.
Early this morning, OpenAI officially released a new version of Codex for macOS, with the accompanying message:
Codex for (almost) everything.
It can now use apps on your Mac, connect to more of your tools, create images, learn from previous actions, remember how you like to work, and take on ongoing and repeatable tasks.
In short: The 'native lobster' for Mac is here.
After hiring the founder of OpenClaw ("Lobster") in mid-February, OpenAI spent the following two months integrating OpenClaw's capabilities into Codex. Now the results are finally visible, and they are a blockbuster right out of the gate.

Image Source: X
Next, Leitech (ID: leitech) takes a look at what the latest Mac version of Codex can do.
From Developer to Maintainer: Codex Achieves Full Automation
OpenAI's published demo video of Codex first showcases its capabilities for autonomous development and debugging in a Mac environment.
The user gives Codex an instruction: test a Tic-Tac-Toe application and fix all of its bugs. Upon receiving the instruction, Codex autonomously opens the local Xcode project on the Mac, locates the program's code, executes the run command, and then clicks through the cells of the Tic-Tac-Toe grid one by one.

Image Source: Leitech
From this, it is evident that Codex is not invoking test code through backend APIs; it genuinely "uses" the application through the graphical user interface (GUI), just like an ordinary user. The difference matters: the API route only proves that the model can understand instructions and execute code, and it fundamentally depends on the application exposing an API; the GUI route needs no API at all and completes tasks through visual recognition alone.
This means that Codex possesses true 'universal execution capabilities,' as many third-party applications simply do not provide open APIs. For previous AIs, these applications were 'black boxes'—they knew of their existence but could neither operate nor read them.
Moreover, this demonstrates OpenAI's powerful multimodal visual recognition and coordinate-mapping capabilities. Codex can "understand" the UI elements in the simulator and decide which pixel coordinates on the screen the mouse should click to place its move.
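OpenAI has not published how this coordinate mapping works internally; as a rough illustration only, the idea of turning a vision model's normalized bounding box for a UI element into a pixel coordinate to click can be sketched like this (the function name and box format are assumptions for this example):

```python
# Illustrative sketch, not OpenAI's implementation: map a vision model's
# normalized bounding box for a UI element to the pixel coordinate to click.

def click_point(bbox, screen_w, screen_h):
    """bbox = (x0, y0, x1, y1), each value in [0, 1], as a vision model
    might report an element's location; returns the pixel center to click."""
    x0, y0, x1, y1 = bbox
    cx = round((x0 + x1) / 2 * screen_w)
    cy = round((y0 + y1) / 2 * screen_h)
    return cx, cy

# Example: the center cell of a 3x3 Tic-Tac-Toe grid detected in the middle
# third of a 1440x900 simulator window.
print(click_point((1/3, 1/3, 2/3, 2/3), 1440, 900))  # → (720, 450)
```

The point of normalizing first is that the same detected box works at any window size or display scale; only the final multiplication depends on the screen.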
Next, Codex automatically proceeds to testing and quickly identifies a bug: the human makes one move, but the computer opponent makes two. This is the most impressive part of the entire demo, because Codex consults no error documentation; it identifies the behavioral bug purely through visual observation and logical reasoning about the game's rules.

Image Source: Leitech
To some extent, this indicates that Codex already possesses certain autonomous decision-making and 'human-like' reasoning capabilities. After identifying the problem, it begins to fix the Tic-Tac-Toe program, then recompiles and runs the program to confirm that the bug has been resolved. In another video, Codex also utilizes code assistance plugins to autonomously explore local front-end projects without explicit file path prompts and provides a code modification solution with the smallest scope of changes.
It can be said that OpenAI has intuitively demonstrated Codex's complete workflow capabilities from the front end to the back end through two simple cases. Moreover, all of this is accomplished through visual recognition of the graphical interface, indicating that it already possesses full-process closed-loop development capabilities covering almost all development environments.
To be honest, this is truly a bit frightening. If, in the past, developing applications with Codex required some programming knowledge to solve issues like API access, now you can directly skip these processes and let Codex operate the computer and generate the desired program like a 'real person.'
Not Just a 'Producer,' but a 'Collaborator'
Another video showcases Codex's execution capabilities at the multimodal level. In this video, the user asks Codex to generate an image for the main visual area of a webpage, without even providing specific image style prompts.
So, how does Codex handle this? It does not simply generate an unrelated image. It first reads the local project files, then combines that with information read from the graphical interface to determine that the webpage's theme is "late-night fast food in Philadelphia," and generates a "burgers + fries + late-night lights" image accordingly.

Image Source: Leitech
Codex also analyzes the layout requirements of the "main visual area": to avoid covering the text on the left side, the generated image needs to leave sufficient space on the left, with the visual focus weighted toward the right. This is something earlier AIs struggled to achieve, because most development assistants were still at the stage of pure text-based code generation, unable to understand the visual elements of a webpage, and often even required users to manually specify image generation and path references.

Image Source: OpenAI
After confirming that the image meets the requirements, Codex automatically moves the generated image into the local project folder and modifies the HTML file, replacing the original placeholder with a real image tag and local path. At the same time, it subtly adjusts the CSS styles so the image fits the webpage's dimensions, and finally refreshes the page in its built-in browser to show the final result.
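The final wiring step described above amounts to a placeholder-for-tag swap in the HTML. A minimal sketch, with an invented placeholder marker and file names (the demo does not show the actual markup), might look like this:

```python
# Hypothetical sketch of swapping an HTML placeholder for a real <img> tag
# pointing at a local path; the HERO_IMAGE marker and paths are invented.
import re

def insert_hero_image(html, image_path, alt_text):
    tag = f'<img src="{image_path}" alt="{alt_text}" class="hero-img">'
    # Replace the placeholder comment left in the template.
    return re.sub(r"<!--\s*HERO_IMAGE\s*-->", tag, html)

html = '<div class="hero"><!-- HERO_IMAGE --><h1>Late-Night Eats</h1></div>'
print(insert_hero_image(html, "assets/hero.png", "burger and fries at night"))
```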
OpenAI also demonstrated how Codex can completely autonomously build a webpage. After receiving the user's development request for a 'Lego tracking webpage application,' Codex invokes development software to complete the code writing and automatically starts a development server locally, loading the page on Codex's built-in browser panel.
Subsequently, the user can directly tell Codex any requirements, and it will adjust the corresponding elements of the webpage based on data obtained through image recognition and other means. For example, in the video, the user only provides the requirement to 'reduce the font size' in the corresponding edit box, and Codex automatically completes a series of steps such as font size reduction and re-layout, truly achieving 'what you see is what you get.'

Image Source: Leitech
For webpage developers, Codex's role has actually changed. Previously, it was more often seen as a 'code producer' for debugging and webpage framework construction, with the final integration still requiring human intervention.
Now, it has become your "collaborator," and you can entrust it with more of the work. Even for specific visual-element modifications and UI fine-tuning, where earlier AIs often failed to grasp your intent accurately, things are different now, because Codex can actually "see" the webpage.
Your Exclusive Personal Assistant is Here
In the demonstrations of the last two videos, OpenAI intends to turn Codex into your 'personal assistant.' In the video, the user uses just one sentence to have Codex simultaneously search four distinct SaaS platforms: Slack, Gmail, Google Calendar, and Notion.
Next, drawing on its semantic understanding, Codex autonomously analyzes the notifications and messages from each platform and sorts them by priority into "urgent" and "can wait." It also flags content-specific caveats: some messages that look like routine daily reports actually contain items awaiting approval, and it reminds the user that these deserve extra attention.

Image Source: Leitech
After summarizing and categorizing the information, the user issues a new instruction: 'Keep an eye on it and notify me.' Codex directly establishes a background task named 'Teammate - Hourly' and automatically sets the specific operating rules for this background task: check each SaaS platform once an hour and only remind the user when there is substantive information added (or when the latest information cannot be obtained).
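The rule Codex sets itself here, check each platform periodically and speak up only when something substantive is new (or when the platform cannot be reached), is essentially a dedup-and-notify loop. A minimal sketch of that logic, with the item format and dedup-by-id scheme assumed for illustration:

```python
# Sketch of the "check hourly, notify only on substantive updates" rule.
# The (id, text) item format and dedup-by-id scheme are assumptions.

def check_for_updates(fetch, seen_ids):
    """fetch() returns the latest items as (id, text) pairs, or raises on
    failure; returns the list of messages worth notifying the user about."""
    try:
        items = fetch()
    except Exception:
        # Mirrors the demo's fallback: alert when data cannot be obtained.
        return ["Could not reach the platform - please check manually."]
    fresh = [text for item_id, text in items if item_id not in seen_ids]
    seen_ids.update(item_id for item_id, _ in items)
    return fresh  # an empty list means: stay silent this hour

seen = {"m1"}
inbox = lambda: [("m1", "Daily report"), ("m2", "Budget needs your approval")]
print(check_for_updates(inbox, seen))  # → ['Budget needs your approval']
```

A scheduler (in the demo, the "Teammate - Hourly" background task) would simply call this once per hour per platform and surface any non-empty result.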
This feature is exactly what made OpenClaw popular in the first place: a fully automated "employee" running in the background. Given a single instruction, Codex will continuously monitor and execute the related tasks without the user lifting a finger, transforming AI from "passive response" into "active assistance."
Moreover, Codex's automated operations can now live in the same conversation thread: simply open the corresponding chat box and you can have the AI repeat or continue its previous tasks without reassigning the work. So don't be fooled by how simple the video demonstration looks; as long as the instructions are detailed enough, Codex can execute complex automated workflows just as OpenClaw did.
The video demonstration also shows that after monitoring a new email, Codex directly provides a summary of the email content and asks the user if they need help drafting a reply, which is also inferred and set by Codex based on the user's different task requirements.

Image Source: Leitech
In the last video, based on the user's request, Codex accesses the company's internal knowledge base through plugins and finds the corresponding product report, then generates a briefing for executives. Throughout the process, the user only provides the product name and what they need Codex to do, without mentioning where the product report is stored or how to find it.
Fully automatic file location, rapid retrieval across large numbers of different documents and images, extraction of key information, and document generation: with a single sentence from the user, Codex autonomously breaks the job into steps and executes them. Moreover, it requires no private API from the enterprise; it accesses documents only through the user's existing permissions, minimizing the enterprise's risk of data leakage.
Of course, Codex can now also create documents directly. In the video, Codex organizes the recent issues of a GitHub project by theme into a spreadsheet, then outputs it as an Excel file. Combined with the capabilities mentioned earlier, you can effectively use it as an efficient "data collector," gathering and summarizing data from private repositories and public sources into documents that can then be used directly in other work.
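The issues-to-spreadsheet step is, at its core, a group-by-label followed by a tabular export. A sketch of that transformation, using the standard csv module rather than Excel output and with made-up issue data:

```python
# Illustrative only: group a list of issues by label and emit a sheet.
# The demo outputs an Excel file; this sketch writes CSV via the stdlib,
# and the issue data below is invented.
import csv
import io
from collections import defaultdict

def issues_to_sheet(issues):
    """issues: list of dicts with 'title' and 'label' keys; returns CSV
    text with one row per issue, grouped and sorted by theme."""
    by_theme = defaultdict(list)
    for issue in issues:
        by_theme[issue["label"]].append(issue["title"])
    out = io.StringIO()
    writer = csv.writer(out)
    writer.writerow(["theme", "issue"])
    for theme in sorted(by_theme):
        for title in by_theme[theme]:
            writer.writerow([theme, title])
    return out.getvalue()

demo = [{"title": "Crash on launch", "label": "bug"},
        {"title": "Dark mode", "label": "feature"},
        {"title": "Memory leak", "label": "bug"}]
print(issues_to_sheet(demo))
```

Swapping the csv writer for a library such as openpyxl would yield the .xlsx output shown in the demo; the grouping logic stays the same.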
Currently, Codex has integrated more than ninety mainstream office and development plugins, which users can freely invoke in the chatbox. What else can be said? Just get to work.
Why Mac?
To be honest, OpenAI's latest version of Codex suits most users better than OpenClaw did. It does not ask users to grant system-level permissions, trading away security and privacy for convenience; instead, it achieves stable, secure operation by leveraging macOS's comprehensive accessibility APIs and underlying sandbox controls. This is not currently feasible on Windows, where permission management is complex and the APIs are fragmented.
Moreover, Codex is clearly deeply integrated with Apple's official development tools. It can not only directly read the project structure of Xcode but also handle settings such as Swift package dependencies and simulator status, while automatically invoking Apple's official development documentation and API specifications for real-time error correction (which is crucial for Apple developers).
Another critical factor is the Apple ecosystem. Many people overlook the hardware ecosystem when discussing AI agents. Imagine asking an AI to perform a task on Windows and forgetting to open a remote desktop program first: you basically have to walk back to the computer to intervene. The collaboration between Mac and iPhone/iPad, by contrast, lets users easily check Codex's results on a mobile device and issue new instructions from there.
When you arrange for Codex to work at home while you go out to have fun, the native remote management functionality undoubtedly provides a better experience than third-party tools (although Apple Remote Desktop is really expensive).
In summary, the release of Codex for Mac essentially marks the AI tool's official transition from a 'passive assistant' to an 'all-powerful agent' that directly takes over the system desktop.
It is no longer a tool that requires you to rack your brains to solve API interface and various usage issues. Instead, it is a 'cyber colleague' that can understand the screen, autonomously operate different software, and even coordinate cross-platform work on your behalf (suddenly wondering, can Codex help me beat Cyberpunk 2077?).
Anyway, the pressure is now on Microsoft, macOS's old rival. When will Windows get similar functionality? Copilot has been grinding away for a year or two and still looks much the same, which hardly lives up to the massive resources Microsoft has poured in.
Tags: OpenAI, Apple, macOS
Source: Leikeji
Images in this article are from: 123RF Royalty-Free Image Library