Our testing methodology
We tested eight AI detectors on 500 text samples: 250 unmodified AI-generated texts from GPT-4, Claude 3, and Gemini 1.5, and 250 human-written texts drawn from academic papers, blog posts, and news articles.
Results: true positive rates
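On unmodified AI text, each detector correctly flagged the following share of samples as AI-generated, from highest to lowest: Text Humanica Pro 98.2%, Originality.ai 97.3%, GPTZero 96.1%, Turnitin 94.8%, Copyleaks 93.2%, Winston AI 91.7%, ZeroGPT 89.4%, Sapling 88.9%.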
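As a minimal sketch of how a true positive rate like those above can be computed, assume each sample carries a ground-truth label and a detector verdict. The function name and the toy data are hypothetical and are not taken from the test set.

```python
def true_positive_rate(labels, flagged):
    """Share of AI-generated samples that the detector flagged as AI.

    labels  -- list of booleans, True if the sample is AI-generated
    flagged -- list of booleans, True if the detector flagged the sample
    """
    if len(labels) != len(flagged):
        raise ValueError("labels and flagged must have the same length")
    # Keep only the verdicts for samples that really are AI-generated.
    ai_verdicts = [f for is_ai, f in zip(labels, flagged) if is_ai]
    if not ai_verdicts:
        raise ValueError("no AI-generated samples to score")
    return sum(ai_verdicts) / len(ai_verdicts)


# Toy data for illustration only (not from the test set):
labels = [True, True, True, True, False, False]
flagged = [True, True, True, False, False, True]
print(f"{true_positive_rate(labels, flagged):.1%}")  # 3 of 4 AI samples flagged -> 75.0%
```

Human-written samples are ignored here on purpose: they affect the false positive rate, not the true positive rate.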
The false positive problem
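A false positive occurs when a detector flags human-written text as AI-generated, which is a serious problem wherever a flag has consequences for the writer. On our human-written samples, the false positive rates were: ZeroGPT 12.4%, GPTZero 8.2%, Turnitin 6.1%, Originality.ai 4.8%, Text Humanica 3.1%.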
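To make these rates concrete, the sketch below estimates how many human-written texts each detector would wrongly flag in a batch. The batch size of 200 is a hypothetical figure chosen for illustration, and the helper name is our own; the rates are the ones reported above.

```python
# False positive rates reported above, expressed as fractions.
FALSE_POSITIVE_RATES = {
    "ZeroGPT": 0.124,
    "GPTZero": 0.082,
    "Turnitin": 0.061,
    "Originality.ai": 0.048,
    "Text Humanica": 0.031,
}

BATCH_SIZE = 200  # hypothetical number of human-written submissions


def expected_false_flags(rate, human_texts):
    """Expected number of human-written texts wrongly flagged as AI."""
    return rate * human_texts


for name, rate in FALSE_POSITIVE_RATES.items():
    count = expected_false_flags(rate, BATCH_SIZE)
    print(f"{name}: about {count:.1f} of {BATCH_SIZE} wrongly flagged")
```

Under that assumption, even the lowest rate in the list implies roughly six wrongly flagged texts per 200 human-written submissions.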
Accuracy on humanized text
After we ran the AI text through Text Humanica's Pro humanizer, detection rates fell sharply: GPTZero flagged 4.2% of samples, Turnitin 7.1%, Originality.ai 5.8%, and Copyleaks 9.3%.
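The size of that change can be worked out directly from the figures in this article. The sketch below subtracts each detector's rate on humanized text from its rate on unmodified text; the function and variable names are hypothetical, and only the four detectors with both figures are included.

```python
# Detection rates in percent, taken from the figures reported in this article.
UNMODIFIED = {"GPTZero": 96.1, "Turnitin": 94.8, "Originality.ai": 97.3, "Copyleaks": 93.2}
HUMANIZED = {"GPTZero": 4.2, "Turnitin": 7.1, "Originality.ai": 5.8, "Copyleaks": 9.3}


def detection_drop(before, after):
    """Percentage-point drop in detection rate for detectors present in both tables."""
    return {
        name: round(before[name] - after[name], 1)
        for name in after
        if name in before
    }


for name, drop in detection_drop(UNMODIFIED, HUMANIZED).items():
    print(f"{name}: {drop} percentage points lower on humanized text")
```

The drops come out at between roughly 84 and 92 percentage points, depending on the detector.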
Which detector should you use?
For the most accurate detection of unmodified AI text, Text Humanica Pro and Originality.ai are the top choices, at 98.2% and 97.3% respectively. For academic use, where false positives are a particular concern, Text Humanica's 3.1% false positive rate, the lowest of those we report, makes it the safest choice.