This benchmark evaluates a model's ability to distinguish human-written text from AI-generated text.