The (Graded) Essay Is Dead

And other developments this week

Jesse J Rogers


Photo by Scott Graham on Unsplash

[disclaimer: I do not speak on behalf of my employer. Views are my own, expressed with AI assistance]

As educators, we’ve already started to face an unprecedented challenge: the rise of AI-based academic dishonesty. Over the past year, the issue has moved from a distant sci-fi hypothetical to a disruptive reality infiltrating our classrooms and undermining our teaching methods.

Many of us have comforted ourselves with the belief that AI detection software would rise to the challenge of identifying artificially generated content, just as plagiarism software arose to address the challenges created by the internet.

Maybe someday. For now, we have no reliable defense against AI-based academic dishonesty. None.

As I learned in this video, OpenAI (the team behind ChatGPT and the current leader in the field) pulled its AI Classifier for being too error-prone. They’re frantically building and testing its replacement. But if you read between the lines about the progress so far, there’s little reason for optimism about the new model either.

“Our classifier is not fully reliable. In our evaluations on a ‘challenge set’ of English texts, our classifier correctly identifies 26% of AI-written text (true positives) as ‘likely AI-written,’ while incorrectly labeling human-written text as AI-written 9% of the time (false positives). Our classifier’s reliability typically improves as the length of the input text increases. Compared to our previously released classifier, this new classifier is significantly more reliable on text from more recent AI systems.” (OpenAI)

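A 26% true positive rate? A 9% false positive rate? Let’s be clear: we’re talking about a system that catches only about 1 in 4 instances of AI-written text while wrongly flagging roughly 1 in 11 pieces of honest, human-written work. How many of its flags turn out to be false accusations depends on how many students are actually cheating. If fewer than about a quarter of submissions are AI-written, the false accusations outnumber the real catches.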
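To make that arithmetic concrete, here is a rough sketch in Python. The 26% and 9% figures come from the OpenAI quote above; the batch of 100 essays and the shares of AI-written work are purely hypothetical numbers I picked for illustration, and the function name is my own.

```python
def flag_breakdown(n_essays, share_ai, tpr=0.26, fpr=0.09):
    """Expected results of running the classifier on a batch of essays.

    tpr and fpr are the rates OpenAI reported; n_essays and share_ai
    are hypothetical inputs chosen for illustration.
    """
    ai_written = n_essays * share_ai
    human_written = n_essays - ai_written

    caught = ai_written * tpr                # AI essays correctly flagged
    missed = ai_written - caught             # AI essays that slip through
    falsely_flagged = human_written * fpr    # honest essays flagged anyway

    flagged = caught + falsely_flagged
    # Share of flags that point at genuinely AI-written work
    precision = caught / flagged if flagged else 0.0

    return {
        "caught": caught,
        "missed": missed,
        "falsely_flagged": falsely_flagged,
        "precision": precision,
    }


if __name__ == "__main__":
    # Hypothetical scenarios: 10%, 25%, and 50% of essays are AI-written
    for share in (0.10, 0.25, 0.50):
        result = flag_breakdown(100, share)
        print(
            f"{share:.0%} AI-written: "
            f"caught {result['caught']:.1f}, "
            f"missed {result['missed']:.1f}, "
            f"falsely flagged {result['falsely_flagged']:.1f}, "
            f"share of flags that are correct {result['precision']:.0%}"
        )
```

In the 10% scenario, the expected result is about 2.6 real catches against 8.1 false accusations out of 100 essays, so roughly three out of every four flags would land on an honest student.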

If the top-tier geniuses at OpenAI can’t reliably distinguish their own model’s output from human writing, how well do you think the software from companies like Turnitin performs?

If you haven’t experimented with the paid version of ChatGPT or the comparably powerful Claude 2 (free open Beta, released…