Lockdown Mode arrives on ChatGPT for security against hacker attacks: what it is and how it works

ChatGPT’s Lockdown Mode was created to respond to a concrete security problem that emerges when artificial intelligence systems begin to interact in a profound way with the Web and external applications. This advanced and optional protection mode designed for a small number of high-risk users which drastically reduces the possibility of data exposure by strictly limiting the AI’s interactions with the outside world. The goal is to mitigate an increasingly relevant class of attacks, that of the so-called prompt injection, in which a malicious actor attempts to manipulate the behavior of the model by inducing it to follow unexpected instructions or leak sensitive information from the attacked systems.

Alongside the Lockdown Mode, new “High Risk” labels also arrive which indicate, directly in the interface, the features that entail a more delicate security profile. Let’s dive into a little more detail about why these tools were introduced, how they work, and how they help users and businesses make more informed decisions about using network-connected AI.

How Lockdown Mode works on ChatGPT

When an AI system communicates only with the user, the risk perimeter is relatively controllable. Things change, and a lot, when the model can navigate online, query external services or act via connected apps. In this context, the phenomenon of prompt injection becomes potentially concrete. With this technique, attackers can exploit external content (for example, a web page or seemingly innocuous input) to “inject” instructions designed to trick the model into performing unwanted actions, as if the user had legitimately given them. This entails multiple risks, including potential data exfiltration, i.e. the unauthorized leakage of information to third parties (cybercriminals).

Lockdown Mode responds to this scenario with a deterministic approach, i.e. based on rigid and predictable rules. By activating it, some ChatGPT features are limited or disabled if strong control over data flows cannot be guaranteed. A key example is web browsing: in Lockdown mode, Internet access is confined to cached content, avoiding real-time network requests to the Web. In simple terms, this means that it is much more complex for attackers to manage to get data and information out and, where this level of guarantee is not achievable, ChatGPT’s web browsing function is simply turned off.

Diagram of how ChatGPT Lockdown Mode works. Credit: OpenAI.

This setting is not intended for the average user, but for particularly exposed profiles such as executives, security managers or teams that manage critical data. It is no coincidence that the Lockdown Mode is “grafted” on the protections already present in the Enterprise plans, which include techniques such as sandboxing (the isolation of execution in controlled environments), controls against URL-based exfiltration and monitoring and audit systems.

It is initially available to users with ChatGPT Enterprise, ChatGPT Edu, ChatGPT for Healthcare, and ChatGPT for Teachers, and can be managed by administrators by creating specific roles and applying additional restrictions beyond standard policies. Regarding the extension of the new security mode to all other users, OpenAI said:

We plan to make Lockdown Mode available to consumer users in the coming months.

One notable aspect is the granular control over connected apps. Because many business workflows depend on external integrations, administrators can decide precisely which applications and actions are allowed when Lockdown Mode is enabled.

It also gets the “High Risk” label for safety

Alongside the Lockdown Mode, comes a second important piece: the systematic introduction of the “High Risk” label for some features which, while useful, expand the attack surface. Functions that require access to the network or external resources are reported consistently across ChatGPT, ChatGPT Atlas and Codex, accompanied by clear explanations of what changes and what risks are introduced.

The objective is to immediately clarify when a specific function implies a greater security risk: as reported by OpenAI, however, these labels are not definitive and could be removed over time.