Codex GPT-5.2, the agentic coding model that OpenAI describes as «the most advanced for software engineering in complex contexts» (chiefly engineering and cybersecurity), is the AI giant’s latest move to strengthen its position in an increasingly intense competitive landscape. The release of GPT-5.2-Codex is a clear signal of the priorities of the company led by Sam Altman, who in recent weeks declared a “code red” at OpenAI precisely because of mounting competition from Google Gemini. Let’s look at the characteristics of GPT-5.2-Codex and its role in the rivalry between OpenAI, Google and Meta.
The features of GPT-5.2-Codex
GPT-5.2-Codex is a variant of GPT-5.2 optimized for so-called agentic coding: the ability of an artificial intelligence system to act as an autonomous agent that plans, executes and corrects complex sequences of operations over time. Unlike models that respond to single, isolated requests, an agent maintains context over long time horizons and interacts with tools such as terminals and development environments. To achieve this, OpenAI worked on context compaction, a technique that preserves relevant information while reducing token consumption, improving both the efficiency and the coherence of the model’s reasoning.
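OpenAI has not published the details of its compaction technique, but the basic idea can be sketched: keep the most recent conversation turns verbatim within a token budget and fold everything older into a compact summary. The sketch below is purely illustrative (a real system would use a model, not a placeholder string, to produce the summary), and the 4-characters-per-token estimate is a rough rule of thumb.

```python
# Illustrative sketch of context compaction: older conversation turns are
# collapsed into a summary so recent turns fit within a token budget.
# The "summarizer" here is a trivial placeholder string.

def approx_tokens(text: str) -> int:
    # Very rough heuristic: ~4 characters per token.
    return max(1, len(text) // 4)

def compact_context(messages: list[str], budget: int) -> list[str]:
    """Keep the newest messages verbatim; fold the rest into one summary line."""
    kept: list[str] = []
    used = 0
    # Walk backwards so the most recent context survives intact.
    for msg in reversed(messages):
        cost = approx_tokens(msg)
        if used + cost > budget:
            break
        kept.append(msg)
        used += cost
    older = messages[: len(messages) - len(kept)]
    if older:
        summary = f"[summary of {len(older)} earlier messages]"
        return [summary] + list(reversed(kept))
    return list(reversed(kept))

history = [f"step {i}: ran tests, fixed module {i}" for i in range(50)]
compacted = compact_context(history, budget=100)
print(len(compacted), compacted[0])
```

The key trade-off is that detail in the older turns is lost, which is why compaction must be selective about what it preserves.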
From a practical point of view, this translates into superior performance on tasks such as refactoring, i.e. reorganizing code without changing its behavior. GPT-5.2-Codex also demonstrates greater reliability in Windows environments, which have historically been more complex for automated tools to manage, and integrates more advanced visual capabilities for interpreting screenshots, technical diagrams and mockups, i.e. preliminary drafts of an application’s interface.
Given the progress made by OpenAI, it is not surprising that GPT-5.2-Codex has achieved notable scores on several tests. On the SWE-Bench Pro benchmark, in which the model is given a real repository and asked to generate a working patch, it scored 56.4%; on Terminal-Bench 2.0, which simulates authentic terminal environments with complex tasks such as compiling code or configuring servers, it scored 64.0%. These results indicate that the model does not limit itself to “writing code”, but is also able to operate in realistic, dynamic contexts.
A particularly delicate aspect concerns cybersecurity. As the capabilities of models increase, so does their effectiveness in identifying vulnerabilities, i.e. flaws that can be exploited to compromise a system. Techniques such as fuzzing, which involves testing software with random or malformed inputs, or zero-shot analysis, in which the model tackles a problem without being given any preliminary examples, become more powerful when driven by agentic systems.
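The fuzzing idea mentioned above can be illustrated in a few lines: throw random byte strings at a parser and record any input that fails in an unexpected way. This is a minimal sketch of the concept only; production fuzzers such as AFL or libFuzzer add coverage feedback, input mutation and crash triage, and an agentic model would additionally reason about the results.

```python
# Minimal random fuzzing sketch: feed random byte strings to a parser
# (here Python's JSON parser) and collect inputs that raise anything
# other than the expected malformed-input errors.

import json
import random

def fuzz_json(trials: int = 1000, seed: int = 0) -> list[bytes]:
    rng = random.Random(seed)
    crashes: list[bytes] = []
    for _ in range(trials):
        # Random byte string of length 1..31.
        data = bytes(rng.randrange(256) for _ in range(rng.randrange(1, 32)))
        try:
            json.loads(data)
        except ValueError:
            # JSONDecodeError / UnicodeDecodeError are expected for
            # malformed input: not a bug in the parser.
            pass
        except Exception:
            # Anything else is a potential defect worth triaging.
            crashes.append(data)
    return crashes

found = fuzz_json()
print(f"{len(found)} unexpected failures in 1000 trials")
```

A mature parser like `json.loads` should yield no unexpected failures here; the technique pays off against less battle-tested code.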
OpenAI recognizes that the same capabilities that help defenders could be abused. In this regard, the company has in fact explained:
These advances can strengthen cybersecurity at scale, but they also introduce new risks of misuse that require careful implementation. While GPT‑5.2-Codex does not reach the “High” level of cybersecurity capabilities according to our Readiness Framework, we are designing our deployment approach with future capability growth in mind.
GPT-5.2-Codex is already rolling out across all Codex surfaces available to paid ChatGPT subscribers, while OpenAI completes the work needed to make the model available via API gradually and securely in the coming weeks. In parallel, an invitation-based experimental program is being launched that will give verified professionals and organizations engaged in cyber defense controlled access to more advanced capabilities and models with fewer restrictions, with the stated aim of «balancing accessibility with security», to quote OpenAI.
The competition between OpenAI, Google and Meta
This release should also be read from the perspective of competition with Google (which gained considerable ground with the release of the Gemini 3 family) and Meta. We should not forget how concerned Sam Altman, CEO of OpenAI, was about the rapid growth of competing models: growth that triggered the “code red” inside the AI giant. On this point, Fidji Simo, head of applications at OpenAI, explained on the occasion of the release of GPT-5.2:
We announced the “code red” to clearly signal to the company that we want to concentrate resources in a specific area; it’s a way to prioritize and determine what can be put on the back burner. (…) We have increased the resources dedicated to ChatGPT in general; I’d say this helps in the release of the model, but it’s not the reason why it comes out this week.