“AI text detectors”, or AI-generated text detectors, are software tools that analyze the style, structure and word choice of a text to estimate the probability that it was written by an artificial intelligence.
The problem, however, is that AI is also improving quickly. Tools like ChatGPT can now write in such a natural and realistic way that distinguishing between a text written entirely by a person, one revised by AI and one generated from scratch is becoming increasingly difficult. Precisely for this reason, AI detectors are not 100% reliable and should never be used as the sole criterion for making important decisions. They can give an indication, but we cannot rely on them completely.
Let’s see in more detail what AI text detectors are, how they work and what the main critical issues are.
What AI text detectors are and how they work
AI text detectors are software designed to analyze text and estimate whether it was generated by artificial intelligence. Some assign a percentage (for example: 80% AI, 20% human), others simply classify the text as “human”, “hybrid” or “AI-generated”. Some even highlight suspicious phrases and explain why they seem artificial.
Detectors as we know them today have existed since 2019. Shortly after the first GPT arrived, it became clear how skilled machines were becoming at simulating human language, and what risks that entailed. Since then, AI has made enormous progress, and the software designed to recognize it has tried to keep up.
Today there are dozens of different tools, both free and paid. Among the free ones, the best known and most used are GPTZero, zeroGPT, Scribbr, Neural Writer, Grammarly’s AI detector and NoPlagio. Let’s see how they determine whether a text was written by an AI.
There is no single way to determine whether a text was written by an artificial intelligence. Each AI detector uses different strategies, but almost all rely on certain characteristics typical of AI writing. The main ones are:
- Linear sentences and unsophisticated word choices: Texts generated by AI tend to respect grammar, follow coherent patterns, and favor linear sentence structures and common, unsophisticated vocabulary. Human texts, by contrast, tend to be more varied and imperfect.
- Variation in sentence length (“burstiness”): AIs tend to produce sentences of more uniform, regular length. This parameter measures precisely how much the text “swings” in terms of sentence length and complexity: the less uniform the text, the more “human” it is.
- Predictability of the text (“perplexity”): AIs use very frequent, predictable linguistic patterns. Perplexity measures exactly this predictability: the lower the perplexity (i.e., the more predictable the text), the more likely it was written by an AI.
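Both signals can be sketched in a few lines of Python. The functions below are illustrative toys, not a real detector: burstiness is approximated here as the coefficient of variation of sentence lengths, and perplexity is computed from per-token probabilities that, in a real detector, would come from a language model.

```python
import math
import re

def burstiness(text):
    """Coefficient of variation of sentence lengths (in words).
    Low values mean uniform sentences (more 'AI-like');
    high values mean the text 'swings' (more 'human-like')."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    mean = sum(lengths) / len(lengths)
    variance = sum((n - mean) ** 2 for n in lengths) / len(lengths)
    return math.sqrt(variance) / mean

def perplexity(token_probs):
    """Perplexity from per-token probabilities: the exponential of the
    average negative log-probability. In a real detector these
    probabilities come from a language model; lower perplexity means
    more predictable, hence more 'AI-like', text."""
    avg_nll = -sum(math.log(p) for p in token_probs) / len(token_probs)
    return math.exp(avg_nll)

uniform = "The cat sat on the mat. The dog lay on the rug. The bird sat in the cage."
varied = "Wait. The cat, having knocked over three vases, sat smugly on the mat. Why?"
print(burstiness(uniform) < burstiness(varied))  # True: the varied text swings more
```

Real detectors combine many such signals and calibrate them on large corpora; this sketch only shows the shape of the two measurements.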
To these, you can add techniques such as:
- Hidden watermarks: Some AI models can intentionally embed invisible “fingerprints” in their text: specific word frequencies, syntactic patterns, rhythms. Only those who know the pattern can actually detect it and use it to build a detector, so this technique is effectively limited to the model’s developers.
- Stability tests: Some detectors modify the text by swapping some words for synonyms and measure how much its perplexity changes. If it changes a lot, the text was likely written by an AI; if not, it may well be human.
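To make the watermarking idea concrete, here is a minimal sketch in the spirit of “green list” watermarking schemes. Everything in it is hypothetical: the hash-based rule stands in for a secret known only to a model’s developers, who would bias generation toward “green” tokens; a detector holding the same secret then checks whether the fraction of green tokens is suspiciously above the roughly 50% expected by chance.

```python
import hashlib

def is_green(prev_token, token):
    """Hypothetical secret rule: hash the (previous token, token) pair and
    call the token 'green' if the first hash byte is even. A watermarking
    generator would bias its sampling toward green tokens; the resulting
    text still reads naturally to anyone without the rule."""
    digest = hashlib.sha256(f"{prev_token}|{token}".encode()).digest()
    return digest[0] % 2 == 0

def green_fraction(tokens):
    """Fraction of tokens that are 'green' given their predecessor.
    Unwatermarked text should hover around 0.5; watermarked text scores
    significantly higher, which a detector that knows the secret rule
    can flag as statistical evidence of AI generation."""
    pairs = list(zip(tokens, tokens[1:]))
    return sum(is_green(a, b) for a, b in pairs) / len(pairs)

score = green_fraction("the quick brown fox jumps over the lazy dog".split())
print(0.0 <= score <= 1.0)  # True: the score is a fraction between 0 and 1
```

This is why the technique is limited to developers: without the secret rule, `is_green` cannot be reconstructed, and the watermark is statistically invisible.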
All these techniques help build an estimate, but none of them guarantees a correct answer. AI detectors can make mistakes, and they do so often.
How reliable AI detectors are and how to use them
ChatGPT, Gemini, DeepSeek and Claude write differently. So how can a detector always be able to recognize a text written by AI? The short answer is that it can’t.
The ability to correctly classify a text as “written by AI” or “written by a human” depends on many factors:
- the model that generated the text;
- the length of the text: the longer it is, the more reliable the analysis;
- how well the person who produced it writes;
- the language used: detectors work better with English than with Italian;
- whether the writer is using their native language. For example, an Italian person writing in English will tend to use simpler, more standard sentences and could therefore be mistaken for an AI.
The degree of AI intervention also matters. If we use a model only to rephrase a few sentences, rather than to write everything from scratch, the detector may be unable to distinguish human text from artificial text.
Furthermore, there are ways to ask the AI to write in a more “human” style, and tools that “humanize” the text (so-called “AI humanizers”), which let it slip past most detectors. We tested these techniques on a text written entirely by an AI, then created a “humanized” version and submitted both to five free AI detectors. The responses varied greatly, but most recognized the first text as AI-generated and the second as human-written, even though both were artificial.
A good rule, therefore, if you want to use these tools, is to treat them not as infallible predictors but as simple indications. And above all, never base important decisions on their answers alone.









