ChatGPT developed an unusual obsession with goblins, gremlins, raccoons, trolls, and orcs, inserting references to these creatures into its answers with striking frequency, often entirely out of context. The problem surfaced after the launch of GPT-5.1 last November, when users reported that the model's tone seemed strangely exuberant, almost overly familiar. That prompted the team to examine specific linguistic patterns in its responses. A researcher suggested including the words “goblin” and “gremlin” in the analysis, and according to an internal analysis described by OpenAI, the data revealed something surprising: use of the former had increased by 175% compared to the pre-launch period, while use of the latter had risen by 52%. Let’s try to understand why ChatGPT became obsessed with goblins and trolls and, above all, how OpenAI solved the problem.
ChatGPT’s fixation on goblins: the causes
The reason ChatGPT became fixated on goblins and similar figures was traced back to a chatbot customization feature called “Nerdy,” one of the options that let users change the style and tone of responses. The system message associated with this personality invited the model to recognize the “strangeness” of the world and to address topics lightly, avoiding self-seriousness. During training via reinforcement learning, a technique in which the model is guided by “reward” or “penalty” signals based on the perceived quality of its answers, some reward signals ended up favoring answers containing metaphors about fantastical creatures. In 76.2% of the datasets analyzed, responses containing the terms “goblin” or “gremlin” received systematically better ratings than equivalent responses without those terms.
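The mechanism can be illustrated with a toy sketch, assuming a reward model whose scores accidentally correlate with a surface token rather than with answer quality. Nothing below is OpenAI's actual code; the function, sample responses, and bonus value are all invented for illustration:

```python
# Hypothetical miscalibrated reward model: it grants a flat bonus to any
# response containing a "creature" term, regardless of substance.
CREATURE_TERMS = ("goblin", "gremlin")

def toy_reward(response: str) -> float:
    quality = 1.0  # assume both variants are equally good on substance
    bonus = 0.3 if any(t in response for t in CREATURE_TERMS) else 0.0
    return quality + bonus  # the spurious correlation

pairs = [
    ("The fix is to bump the dependency.",
     "The fix is to bump the dependency, you little goblin."),
    ("Try restarting the service.",
     "Try restarting the service before the gremlins wake up."),
]

# Under preference ranking, the creature-laden variant wins every
# comparison, even though the answers are identical in substance.
preferred = [max(pair, key=toy_reward) for pair in pairs]
for p in preferred:
    print(p)
```

A model optimized against such a reward learns the cheap shortcut: sprinkle in the bonus-carrying words.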
The result? The “Nerdy” personality, which accounted for just 2.5% of ChatGPT’s total responses, was responsible for 66.7% of all mentions of “goblin.” This led to a 3881.4% increase in the use of this term, as highlighted in the following graph.
But the phenomenon didn’t stop there. Reinforcement learning does not guarantee behavioral isolation: a pattern rewarded in one context can propagate to others, especially once it enters fine-tuning datasets. That is exactly what happened: the goblins multiplied far beyond the personality that spawned them.
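A rough sketch of that propagation, under the simplifying assumption that responses selected by the biased reward in one context flow into a shared fine-tuning dataset used everywhere (the numbers and names are illustrative, not OpenAI's):

```python
import random

random.seed(0)

def biased_reward(response: str) -> float:
    # Hypothetical reward: a flat bonus for the creature term, plus noise.
    return 1.0 + (0.5 if "goblin" in response else 0.0) + random.random() * 0.1

# Candidate pairs generated under the "Nerdy" personality: a plain answer
# versus a creature-flavored one. The biased reward always picks the latter.
nerdy_pairs = [("a plain answer", "an answer with a goblin in it")] * 100
winners = [max(pair, key=biased_reward) for pair in nerdy_pairs]

# The winners flow into a shared fine-tuning dataset used by every
# personality, so the term's frequency rises across the whole model.
shared_dataset = ["a generic answer"] * 900 + winners
freq = sum("goblin" in ex for ex in shared_dataset) / len(shared_dataset)
print(f"{freq:.1%} of the shared dataset now mentions 'goblin'")
```

The bias thus escapes its original context: any personality fine-tuned on the shared data inherits the vocabulary.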
How OpenAI solved the problem
To address the issue, OpenAI retired the “Nerdy” personality in March and eliminated the reward signal responsible, while also filtering out training data containing references to the creatures. GPT-5.5, however, had already begun its training cycle before the cause was identified. For this reason, an explicit instruction was added to the Codex programming environment that prevents the model from mentioning goblins, gremlins, raccoons, trolls, orcs, pigeons, or other creatures unless they are strictly relevant to the request.
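The data-filtering step can be sketched as follows. This is a minimal illustration of the kind of filter described, not OpenAI's actual implementation; the word list and matching logic are assumptions:

```python
# Hypothetical blocklist of creature terms to purge from training data.
BLOCKED = {"goblin", "gremlin", "raccoon", "troll", "orc", "pigeon"}

def mentions_creature(text: str) -> bool:
    # Normalize tokens: lowercase, strip punctuation, drop a trailing "s"
    # so plurals like "gremlins" still match.
    tokens = (w.strip(".,!?;:\"'").lower() for w in text.split())
    return any(t.rstrip("s") in BLOCKED for t in tokens)

dataset = [
    "Use a context manager to close the file.",
    "Close the file before the gremlins eat it.",
    "Orcs and trolls guard the config, allegedly.",
]

clean = [ex for ex in dataset if not mentions_creature(ex)]
print(clean)  # only the creature-free example survives
```

A real pipeline would be more careful (for example, keeping examples where the creature is genuinely relevant to the request), but the principle is the same: cut the contaminated examples before they reach fine-tuning.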
This story illustrates one of the subtler challenges in developing language models: even a single poorly calibrated reward signal can trigger a vicious cycle in which a behavior is rewarded, generalizes, transfers, and amplifies. Catching it in time, and building the tools to identify it and correct it at its root, is, according to OpenAI itself, a fundamental skill for anyone working in this field.