Texts are increasingly being written entirely or partially with an AI text generator or an AI chatbot.
While the proportion of AI text on the internet, in homework, bachelor's theses, or other academic papers is still low, it will likely increase rapidly over the next 2 to 3 years. And this can be problematic in many ways.
In this article, I not only show you how to recognize AI text yourself, but have also put 13 of the most well-known AI detectors through their paces.
For this, I generated 30 texts with 5 different AI tools and checked whether, and with what probability, the detectors recognize them as AI text. Additionally, I tested all AI detectors for false positives.
In total, I conducted 585 individual tests.
- Originality.ai leads the field: in our tests, it detected 14 of 15 non-English AI texts (93.3%) and all 15 English AI texts
- CopyLeaks is the best free alternative with an 80% detection rate on non-English texts and only 6.7% false positives on human texts
- AI text can be recognized by 'smooth' language, perfect grammar, and uniform formatting – but modern models are becoming increasingly human-like
1. How Can You Recognize AI Text?
There are several characteristics in which AI text can differ from human-written text. These include:
- Lower variability in word or phrase choice: Humans use a broader spectrum of words and phrases. They mix formal language and colloquialisms more often, for example.
- Repetitions: AI text generators tend to repeat words, phrases, or content points more frequently.
- Perfect spelling and grammar: Human-written texts contain spelling or grammar errors more often than AI texts.
- Disproportionate use of certain words or phrases, e.g., in English the word "delve" or "It's important to note"
- No neologisms: AI text contains few to no word creations.
- Only known compound words: AI texts usually only contain familiar word combinations, no new or unusual ones.
- Uniform formatting: AI text is often very uniformly formatted. Human texts offer greater punctuation variety, e.g., dashes, semicolons, parentheses, etc., or more varying paragraph lengths.
- Sentence length and structure: AI text generators tend to create shorter sentences with fewer subordinate clauses.
- Dialect and colloquial language: AI text generators typically write in standard language with little dialect (regional variations of a language) or colloquialisms.
- Few outdated words, phrases, or idioms: AI texts favor common words and contain few to no idioms.
In short:
AI texts are often "smooth as silk" and have no rough edges. Human texts are characterized by greater linguistic variety, but also more errors.
Despite these differences, it can be difficult for humans to recognize AI text. This was already shown in a 2019 Cornell University study, in which 66% of all test subjects believed AI-generated fake news was real.
And that was in 2019. AI texts have improved significantly since then.
There are huge qualitative differences between the output of GPT-2, the language model used in the study, and current language models like GPT-4o, Claude 3.5 Sonnet, or Gemini 1.5 Pro.
Accordingly, AI text detectors (also called "AI Content Detectors") will become increasingly important companions for website operators, teachers, or professors in the coming years.
2. AI Text Detectors Compared
| Rank | Tool | Detection Rate (Non-English)* | Detection Rate (English) | False Positives | Non-English Support | Price | API |
|---|---|---|---|---|---|---|---|
| 1 | Originality.ai | 14 / 0 / 1 | 15 / 0 / 0 | 2 / 15 | ✓ | $0.01 per 100 words | ✓ |
| 2 | CopyLeaks | 12 / 0 / 3 | 9 / 0 / 6 | 1 / 15 | ✓ | free | ✗ |
| 3 | Sapling | 6 / 1 / 8 | 13 / 1 / 1 | 1 / 15 | ✗ | free | ✓ |
| 4 | Crossplag | 5 / 1 / 9 | 12 / 0 / 3 | 0 / 15 | ✗ | free | ✗ |
| 5 | AI Detector Pro | 4 / 0 / 10 | 13 / 1 / 1 | 1 / 15 | ✗ | Freemium | ✗ |
| 6 | GROVER (no longer available) | 3 / 0 / 12 | 4 / 2 / 9 | 0 / 15 | ✗ | free | ✗ |
| 7 | Kazan SEO | 1 / 0 / 14 | 10 / 0 / 5 | 0 / 15 | ✗ | free | ✗ |
| 8 | GPTZero | 0 / 15 / 0 | 13 / 1 / 1 | 0 / 15 | ✗ | Freemium | ✓ |
| 9 | Content at Scale | 0 / 0 / 15 | 12 / 0 / 3 | 0 / 15 | ✗ | free | ✗ |
| 10 | Huggingface | 0 / 0 / 15 | 10 / 1 / 4 | 0 / 15 | ✗ | free | ✗ |
| 11 | Writer.com | 0 / 0 / 15 | 6 / 5 / 4 | 0 / 15 | ✗ | free | ✗ |
| 12 | Unfluff | 0 / 0 / 15 | 0 / 0 / 15 | 0 / 15 | ✗ | free | ✗ |
*The "Non-English" detection rate was tested using German texts as a representative sample for non-English language detection capability. Detection performance may vary for other languages. The rate is derived from three metrics, with the first having the most weight. The second becomes the tiebreaker when two tools are equal on the first:
- Highly likely AI text (75% to 100%)
- Possibly AI text (25% to 74.9%)
- Highly likely not AI text (0% to 24.9%)
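To make the table cells easier to read, here is a minimal sketch of how a cell such as "12 / 0 / 3" translates into a percentage. The function names `parse_cell` and `detection_rate` are my own for illustration, and I assume that only texts in the "highly likely AI text" bucket count as detected.

```python
def parse_cell(cell: str) -> tuple[int, int, int]:
    """Split a table cell such as '14 / 0 / 1' into (high, possible, not_detected)."""
    high, possible, missed = (int(part) for part in cell.split("/"))
    return high, possible, missed


def detection_rate(cell: str) -> float:
    """Share of texts in the 'highly likely AI text' bucket, in percent.

    Assumption: only the first of the three numbers counts as 'detected'.
    """
    high, possible, missed = parse_cell(cell)
    total = high + possible + missed
    return 100.0 * high / total


print(round(detection_rate("12 / 0 / 3"), 1))  # CopyLeaks, non-English texts
print(round(detection_rate("9 / 0 / 6"), 1))   # CopyLeaks, English texts
```

The same calculation works for the false positive column if you divide the first number by the total directly, e.g. 1 of 15 human texts.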
3. Detailed Test Results
3.1 Originality.ai

Originality.ai is one of the most popular and well-known AI detectors on the market. According to its own claims, it can detect AI text created with GPT-3, GPT-3.5, GPT-4, and Google Bard.
In our first test in January 2023, it still ranked as the second-best tool. It was able to detect 12 of 15 AI texts with high certainty, two partially, and only one not at all.
Since then, the developers have significantly improved the tool:
In the current test from July 2024, it was able to detect 15 of 15 English AI texts and 14 of 15 non-English AI texts (tested with German), landing in first place.
Experiments have also shown that Originality.ai is now much harder to trick. The tool still detects AI texts very well after typical obfuscation methods have been applied, such as adding punctuation, introducing spelling and grammar errors, swapping in synonyms, or having AI tools rewrite the text.
The only drawback:
The rate of false positive results, i.e., human texts that Originality.ai incorrectly classified as AI texts, was slightly elevated at 2 of 15 texts (13.3%) compared to other detectors.
Originality.ai additionally offers a plagiarism checker that can be run simultaneously with the AI detector.
Plagiarism and AI checks are usually quick, and the result is clearly displayed.

For those who want to check text in bulk or in their own software, there is also an API.

The price of Originality.ai is fair at $0.01 per credit. For one credit, you can test 100 words with the AI detector or optionally with the plagiarism checker.
That means for AI detection and plagiarism checking of a blog article with 1,000 words, you pay $0.20.
And I'm happy to pay that, because free tools are often slower, return server errors more frequently, or have daily limits.
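The pricing arithmetic can be sketched in a few lines of Python. The price per credit and the words per credit come from the figures above. The function name `check_cost` and the rounding up of partial credits are my assumptions, not documented Originality.ai behavior.

```python
import math

PRICE_PER_CREDIT_USD = 0.01  # from the article
WORDS_PER_CREDIT = 100       # from the article


def check_cost(word_count: int, checks: int = 1) -> float:
    """Estimated cost in USD for running `checks` scans over one text.

    Assumption: a started block of 100 words is billed as a full credit.
    """
    credits_per_check = math.ceil(word_count / WORDS_PER_CREDIT)
    return round(credits_per_check * checks * PRICE_PER_CREDIT_USD, 2)


# AI detection plus plagiarism check (2 scans) for a 1,000-word article
print(check_cost(1000, checks=2))
```

For the 1,000-word article this gives 10 credits per scan, 20 credits for both scans, and therefore $0.20.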
3.2 CopyLeaks AI Content Detector

The AI Content Detector from CopyLeaks is the second-best tool for detecting non-English AI text.
It detected non-English AI text in 12 of 15 cases, which corresponds to a detection rate of 80%. For English AI texts, the detection rate was only 60% (9 of 15), which is a mediocre result.
The detector itself is easy to use and delivers quick results. A Chrome extension has also recently become available.
Unfortunately, the tool doesn't output precise detection data, only whether a text is human-made or AI-made.
3.3 Sapling AI Content Detector

The free AI Content Detector from Sapling also achieved a very good result in the test.
Like Originality.ai, Sapling also improved compared to the first test from January 2023. While the tool detected 12 of 15 English AI texts last year, it now detected 13 with high probability, one with medium probability, and only one not at all.
For non-English AI texts, the result is significantly worse: the tool detected only 6 with high probability, one with medium probability, and 8 not at all.
The detector is minimalist and is one of the tools that delivers results fastest.
I really like that the text check starts as soon as you enter text and you don't have to click an additional button.
3.4 Crossplag AI Content Detector

The AI Content Detector from Crossplag achieves the same good result in the test as the detector from Content at Scale, with 12 of 15 English AI texts detected.
For non-English AI texts, Crossplag also has its problems and only detected 5 of 15 with high probability. Eight texts were not detected at all and one was detected with medium probability.
The tool is free to use. If you test texts more frequently in succession, you will be prompted to register, but can then continue to use the detector for free.
3.5 AI Detector Pro

AI Detector Pro is one of my favorites in the test with English AI texts alongside GPTZero and Originality.ai. It achieved the same detection rate as GPTZero:
Of 15 AI texts, it detected 13 as such with high probability. It detected one with medium probability and missed only one entirely.
With non-English AI texts, the result is considerably worse: the tool detected only 4 with high probability and the remaining 10 not at all.
It features a modern and easy-to-use user interface. Additionally, test results are generated very quickly.
Unfortunately, you can only make 3 free queries per day and then have to book a paid plan.
AI Detector Pro also offers an API and can be used as a white label. However, there are no prices for this on the website or in the customer area.
3.6 GROVER

GROVER is a language model developed by researchers at the University of Washington and the Allen Institute for Artificial Intelligence (AI2).
It is intended to distinguish AI-generated news from real news.
However, the free demo of the language model (no longer available) did not perform well in detecting AI text:
It could only detect 4 of 15 AI texts. For two texts that the tool recognized as human-made, it at least gave a lower probability for its assessment.
The poor detection rate is probably because the tool (like the associated research paper) is from 2019 and has not been improved since.
3.7 KAZAN SEO AI GPT3 Detector

The AI GPT3 Detector from KAZAN SEO detected two-thirds of all English AI texts, which is okay but not impressive.
Of the five undetected texts, four came from Jasper and one from ChatGPT. Of the non-English AI texts, the AI GPT Detector detected only one with high probability.
A negative aspect in the test was that roughly every second or third check failed, with the tool reporting that the server was overloaded.
Like other KAZAN SEO tools, the AI GPT Detector is free to use but requires registration.
3.8 GPTZero

GPTZero is probably the most widely used AI content detector worldwide with over 1 million users.
And that's no wonder, because GPTZero offers a high detection rate. In the test, it was able to detect 13 of 15 AI texts flawlessly, one not at all, and one with medium certainty.
Non-English AI texts are not detected by GPTZero at all.
Unfortunately, it is very susceptible to subsequent editing. Changing just individual characters can cause an entire sentence or paragraph to no longer be recognized as AI-generated.
The advantage over Originality.ai:
The tool is free to use. However, you can only test texts with a maximum of 5,000 characters, which corresponds to about 500 to 600 words.
3.9 Content at Scale AI Content Detection

The AI Content Detector from Content at Scale came in third place in the test. It was able to detect 12 of 15 AI texts as such. It failed to detect any of the non-English AI texts.
You can test texts up to 2,500 characters, which corresponds to about 380 to 500 words in English.
It offers a nice user interface with a bar display showing which way a text is leaning. It also shows the word and character count of the entered text, which should be a given but is something not all detectors offer.
3.10 Huggingface GPT-2 Output Detector

The GPT-2 Output Detector from Huggingface is based on a RoBERTa model that has undergone fine-tuning with output from a GPT-2 model provided by OpenAI.
In a free online demo, you can test texts in the browser and see in a bar below the input field how high the probabilities are that a text was written by AI or humans.
In the test, it detected 10 of 15 AI texts with high probability and one with only lower probability. However, this is not surprising since the AI texts were generated with GPT-3.x.
Unfortunately, the Output Detector was not very reliable in the test. Every second test, the Detector got stuck or displayed a server error.
3.11 Writer.com AI Content Detector

The AI Content Detector from Writer.com is among the three worst tools in the test. It could only detect 6 of 15 AI texts flawlessly.
However, credit must be given to the tool for partially detecting 5 AI texts and missing only 4 entirely.
Additionally, the tool is easy to use and usually delivers quick results.
Non-English AI texts are not detected by the tool at all yet.
3.12 Unfluff

Unfluff is a free app that analyzes texts for readability. It is supposed to detect not only unnecessary filler words or overly long sentences but also AI text. Unfluff is also available as a WordPress plugin.
In practice, this unfortunately works poorly, if at all.
For 10 of 15 AI texts, the Fluff Score was 100 out of 100. For the other 5, the Fluff Score was below 100 but still between 76 and 99, so you can't call that "detected."
Another annoying drawback:
The app is extremely slow. So slow that it's almost unusable.
3.13 GLTR (Out of Competition)

GLTR (short for Giant Language Model Test Room) is a joint project from the MIT-IBM Watson AI Lab and HarvardNLP.
Similar to the GPT-2 Output Detector, the tool has access to GPT-2 (in the 117M version) and analyzes text inputs to determine what GPT-2 would have predicted at each point.
Unlike other detectors, however, it does not output probabilities for whether a text was written by a human or an AI tool (which is why it is only partially comparable to other tools and is out of competition in the test).
Instead, it overlays a color mask on the text that shows how likely each word would be in a text generated by GPT-2:

Each word is highlighted according to its rank among GPT-2's predictions: green for the top 10, yellow for the top 100, red for the top 1,000, and purple for all remaining words.
That means the more green and fewer red and purple words a text contains, the more likely it was written by GPT-2.
You can use the tool as a demo for free.
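The color coding described above boils down to a simple rank-to-bucket mapping. Here is a minimal sketch of that mapping. It does not call GPT-2: the ranks in `example_ranks` are made up, and the function names `gltr_color` and `color_shares` are my own, not part of GLTR.

```python
from collections import Counter

COLORS = ("green", "yellow", "red", "purple")


def gltr_color(rank: int) -> str:
    """Map a word's 1-based rank among the model's predictions to a color."""
    if rank <= 10:
        return "green"
    if rank <= 100:
        return "yellow"
    if rank <= 1000:
        return "red"
    return "purple"


def color_shares(ranks: list[int]) -> dict[str, float]:
    """Fraction of words per color bucket for one text."""
    counts = Counter(gltr_color(rank) for rank in ranks)
    return {color: counts[color] / len(ranks) for color in COLORS}


# Hypothetical ranks for a ten-word text (not real GPT-2 output)
example_ranks = [1, 3, 2, 57, 1, 8, 1204, 4, 230, 2]
print(color_shares(example_ranks))
```

A high green share with almost no red or purple would point toward machine-generated text, at least for text produced by GPT-2 itself.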
4. Test Methodology
I created 15 English texts with AI, all between 80 and 250 words in length. For this, I used the following 3 tools:
- Jasper.ai (I entered the prompt into a document and used the continue writing function)
- GPT-3 via the OpenAI Playground (text-davinci-003 with Temperature 0.7 and Top P 1)
- ChatGPT (with GPT-3.5 Legacy)
The prompts I used were kept simple. They were as follows:
- How do I plant a maple tree?
- Feeding a cat
- Cryptocurrency is
- Water is a scarce resource
- The dangers of artificial intelligence
Additionally, I created 15 non-English texts (using German as a representative language) with Claude, Google Gemini, and ChatGPT (with GPT-4). For this, I used the following prompts:
- How do I feed a cat?
- How do I plant a maple tree?
- What are the dangers of artificial intelligence?
- Write me a romantic poem about love with 6 stanzas
- What is cryptocurrency?
The detection rate in the table is derived from the following 3 metrics, with the first having the most weight.
The second metric becomes the tiebreaker when two tools are equal on the first metric:
- Highly likely AI text (75% to 100%)
- Possibly AI text (25% to 74.9%)
- Highly likely not AI text (0% to 24.9%)
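The scoring and ranking logic can be sketched roughly as follows. The tool names and scores are hypothetical placeholders, the function names are my own, and I assume each detector reports an AI probability between 0 and 100 for every text.

```python
def bucket(ai_probability: float) -> str:
    """Assign a detector score (0 to 100) to one of the three metrics."""
    if ai_probability >= 75.0:
        return "high"      # highly likely AI text
    if ai_probability >= 25.0:
        return "possible"  # possibly AI text
    return "not"           # highly likely not AI text


def tally(scores: list[float]) -> tuple[int, int, int]:
    """Count how many texts fall into each bucket: (high, possible, not)."""
    buckets = [bucket(score) for score in scores]
    return buckets.count("high"), buckets.count("possible"), buckets.count("not")


def rank_tools(results: dict[str, list[float]]) -> list[str]:
    """Sort tools by 'high' count first, using 'possible' as the tiebreaker."""
    def sort_key(tool: str) -> tuple[int, int]:
        high, possible, _ = tally(results[tool])
        return (-high, -possible)
    return sorted(results, key=sort_key)


# Hypothetical scores for three made-up tools on three AI texts
results = {
    "Tool A": [90.0, 80.0, 10.0],
    "Tool B": [90.0, 80.0, 40.0],
    "Tool C": [95.0, 20.0, 5.0],
}
print(rank_tools(results))
```

Tool A and Tool B both have two texts in the first bucket, so the second metric decides the order between them.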
Finally, I also tested the detectors for false positives. For this, I used my own blog articles and excerpts from books (e.g., Kafka's "The Castle") to be certain that these texts are not AI-generated.
5. How Do AI Detectors Work?
Detectors use different approaches to detect AI text:
5.1 Simple Text Analysis
The simplest, but also the least reliable, way to detect AI text is to analyze its properties, e.g., in terms of readability and frequency:
- Word frequency (works especially with large amounts of text)
- Lemma frequency (the base form of words)
- TF-IDF (comparing the frequency of a word in a text with its frequency in a reference corpus)
- Repetition frequency of N-grams (a certain number of consecutive words)
- Flesch Index
- Gunning Fog Index
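A few of these properties can be computed with nothing but the Python standard library. The following is a minimal sketch, not how any of the tested detectors actually work: the tokenizer and the vowel-group syllable counter are crude assumptions of mine, and the Flesch Reading Ease formula used here is the standard English one.

```python
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens (very rough, English only)."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def type_token_ratio(tokens: list[str]) -> float:
    """Distinct words divided by total words: a simple vocabulary-variety measure."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def repeated_ngram_share(tokens: list[str], n: int = 2) -> float:
    """Share of n-gram occurrences that belong to an n-gram used more than once."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    counts = Counter(ngrams)
    repeated = sum(count for count in counts.values() if count > 1)
    return repeated / len(ngrams)


def count_syllables(word: str) -> int:
    """Crude heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: 206.835 - 1.015 * words/sentence - 84.6 * syllables/word."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    tokens = tokenize(text)
    if not sentences or not tokens:
        return 0.0
    syllables = sum(count_syllables(token) for token in tokens)
    return 206.835 - 1.015 * (len(tokens) / len(sentences)) - 84.6 * (syllables / len(tokens))


sample = "The cat sat on the mat. The cat sat on the chair."
tokens = tokenize(sample)
print(type_token_ratio(tokens), repeated_ngram_share(tokens), flesch_reading_ease(sample))
```

None of these numbers says "AI" or "human" by itself. They only become useful when compared against typical values for known human and AI texts, which is one reason why this approach is unreliable.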
5.2 Neural Language Models (NLMs)
The better way to detect AI text is to use neural language models similar to those used for text generation.
This is the approach that most of the AI detectors tested here probably use.
Currently, RoBERTa is the model most often used as the basis for such detectors. To fine-tune it for detection, output from GPT-2 can be used, for example.
According to Crothers et al., AI detectors often achieve better results when they are additionally trained for a specific niche or application area, e.g., for detecting fake news, Twitter bots, or Reddit.
6. Watermarking in AI Text
To prevent AI text generators and AI chatbots from being misused for criminal or ethically questionable purposes, AI text may be watermarked in the future.
This means integrating certain linguistic patterns into the text that are machine-detectable but cannot be removed by simple rewriting.
A promising approach is, for example, the Adversarial Watermarking Transformer (AWT), developed by researchers at CISPA (Helmholtz Center for Information Security).
OpenAI is also currently researching watermarking to detect AI text, as can be seen in a blog article by Scott Aaronson.






