A new study finds that more text on the Internet is now created by AI than written by humans

Over half of the articles available online may now be written by AI. This is the finding of a recent study by Graphite, an SEO analytics company, which reported that in November 2024 the number of AI-generated articles published online exceeded the number written by human beings. It was the first time anything like this had happened: for a short period, machines produced more textual content than “flesh and blood” authors. The study, which examined 65,000 English-language articles drawn from the vast open-source Common Crawl archive, also found, however, that the growth of artificially generated content has plateaued since May 2024.

A key finding of the research is that, despite the volume of text written by algorithms, most of it does not surface in search engines or in chatbots like ChatGPT: according to Graphite, more than 80% of the results shown by Google and by conversational assistants still come from articles written by people. The data suggests that, at least for now, artificial intelligence has not “taken control” of online information, but has instead joined humans in a form of co-production that is increasingly widespread and difficult to distinguish.

Graphite’s investigation and the method used

Graphite’s investigation followed a specific methodology. To measure the spread of AI-created content, the researchers used an AI detector developed by SurferSEO, which analyzes 500-word blocks of text and estimates the probability that they were written by a language model. An article is classified as “artificial” if more than 50% of its content was generated by an AI system. Before applying this classification, the researchers checked the tool’s accuracy by measuring its rate of false positives (human texts incorrectly labeled as artificial) and false negatives (AI-generated texts not recognized as such). To estimate false positives, they tested articles published before the launch of ChatGPT, a period in which the texts were in all likelihood entirely written by people; the rate came out at 4.2%. To estimate false negatives, they used articles produced by GPT-4o, one of the most advanced language models available at the time of the research, generated from a very detailed prompt on a range of topics; that rate was just 0.6%. These preliminary checks suggest that classification with the SurferSEO detector is quite reliable, correctly detecting AI-generated text 99.4% of the time.
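The classification rule described above can be sketched in a few lines of Python. This is a minimal illustration, not SurferSEO's actual detector: the `score_block` function passed in is a hypothetical stand-in for whatever model assigns each 500-word block a probability of being AI-written, and the 50% thresholds are the ones reported in the study.

```python
def split_into_blocks(words, block_size=500):
    """Split a list of words into consecutive blocks of up to block_size words,
    mirroring the 500-word windows the detector analyzes."""
    return [words[i:i + block_size] for i in range(0, len(words), block_size)]


def is_ai_generated(article_text, score_block, threshold=0.5):
    """Apply the study's rule: an article counts as 'artificial' if more than
    `threshold` (50%) of its words fall in blocks the detector flags as
    AI-generated. `score_block` is a hypothetical callable returning a
    probability in [0, 1] for one block of text."""
    words = article_text.split()
    if not words:
        return False
    blocks = split_into_blocks(words)
    # Count words belonging to blocks the detector considers AI-written.
    ai_words = sum(len(b) for b in blocks if score_block(" ".join(b)) > 0.5)
    return ai_words / len(words) > threshold
```

A detector with the reported error rates would mislabel about 4.2% of human articles and miss about 0.6% of AI ones under this rule.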

The results outline a clear trend: since November 2022, when ChatGPT was released to the public, the production of automated content has exploded. In just 12 months, AI-generated articles grew to around 39% of published texts, briefly surpassing human-written ones in November 2024. Since 2025, however, the situation has stabilized and the two shares have converged. Graphite attributes this slowdown to the poor performance of AI-produced texts in search engines: Google’s search engine, for example, tends to prioritize content written by people, which it considers more reliable and relevant. A second study by the same company found that 86% of the articles surfaced in Google Search are of human origin, while only 14% come from artificial intelligence systems. A similar figure, surprisingly, also emerges from the main AI chatbots: 82% of the articles cited by ChatGPT and Perplexity are written by human beings, and only 18% are AI-generated.

In the graph you can see how content written by AI (yellow line) at a certain point exceeded the amount of content written by human authors (green line), at least according to Graphite’s analysis. Credit: Graphite.

The limitations of the study

The Graphite study, though interesting, has some limitations, clearly acknowledged by its authors. First, it should be remembered that Common Crawl, the data source used for the research, does not cover the entire Web. Many paywalled sites block the indexing of their content, which is therefore missing from the samples analyzed; this means the share of human-written articles may actually be higher than estimated. In addition, the authors admit that it can be difficult to identify articles written by AI and then reworked by humans, and that AI models continue to improve and may therefore become increasingly hard to detect.