The AI Paradox - Can OpenAI read its own writing?

OpenAI, the AI research firm behind ChatGPT, has released a new tool to distinguish between AI-generated and human-generated text.

Before we dive into the details of the tool, let's briefly look at what a classifier is in machine learning.

A classifier in machine learning is an algorithm that automatically orders or categorizes data into one or more of a set of “classes.” Machine learning classifiers go beyond simple data mapping, allowing users to constantly update models with new learning data and tailor them to changing needs.
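To make the definition concrete, here is a minimal sketch of a text classifier in Python. It is a toy Naive Bayes model over word counts, written only for illustration; the class name `TinyTextClassifier`, the labels, and the example sentences are all made up here and have nothing to do with how OpenAI's tool is built.

```python
import math
from collections import Counter, defaultdict


class TinyTextClassifier:
    """Toy multinomial Naive Bayes over lowercase word counts (illustration only)."""

    def __init__(self):
        self.word_counts = defaultdict(Counter)  # class label -> word frequencies
        self.doc_counts = Counter()              # class label -> number of examples seen

    def update(self, text, label):
        # Learning is incremental: each new labeled example adjusts the counts.
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def predict(self, text):
        vocab = {w for counts in self.word_counts.values() for w in counts}
        total_docs = sum(self.doc_counts.values())
        best_label, best_score = None, -math.inf
        for label, counts in self.word_counts.items():
            # Log prior plus log likelihood of each word, with add-one smoothing.
            score = math.log(self.doc_counts[label] / total_docs)
            denom = sum(counts.values()) + len(vocab)
            for word in text.lower().split():
                score += math.log((counts[word] + 1) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


clf = TinyTextClassifier()
clf.update("win a free prize now", "spam")
clf.update("free money click now", "spam")
clf.update("meeting moved to monday", "ham")
clf.update("lunch on monday with the team", "ham")

print(clf.predict("free prize money"))        # spam
print(clf.predict("team meeting on monday"))  # ham
```

Calling `update` again with fresh examples is the "constantly update models with new learning data" part: the model's predictions shift as the counts change.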

OpenAI’s New AI Classifier for indicating AI-Written Text

OpenAI has launched its new AI Text Classifier, which is intended to help counter false claims that AI-generated text was written by a human, for example in automated misinformation campaigns, in academic fraud carried out with AI tools, and when chatbots impersonate humans. It is particularly relevant to journalists, mis/disinformation researchers, and other groups.

The classifier is a language model fine-tuned to distinguish between text written by a human and text written by AIs from a variety of providers. Its training dataset combines sources believed to be human-written, such as the pretraining data and human demonstrations on prompts submitted to InstructGPT (text generation, language translation, etc.).

Each text in the dataset is split into a prompt and a response, and several different language models are then run on the prompts to produce AI-written responses. The classifier labels each document it is given as very unlikely, unlikely, unclear if it is, possibly, or likely AI-generated.
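One way to picture the five labels is as bands over a single score between 0 and 1. The sketch below assumes the classifier produces such a probability and maps it to a label; the cut-off values in `CUTOFFS` are hypothetical numbers chosen for illustration, not thresholds confirmed by OpenAI, and `label_for` is a made-up helper name.

```python
import bisect

LABELS = ["very unlikely", "unlikely", "unclear if it is", "possibly", "likely"]

# Hypothetical cut-offs for illustration only; not confirmed values from OpenAI.
CUTOFFS = [0.10, 0.45, 0.90, 0.98]


def label_for(p_ai):
    """Map an assumed probability of being AI-generated to one of the five labels."""
    if not 0.0 <= p_ai <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    # bisect_right counts how many cut-offs are <= p_ai, which is the band index.
    return LABELS[bisect.bisect_right(CUTOFFS, p_ai)]


for p in (0.05, 0.50, 0.95, 0.99):
    print(p, "->", label_for(p) + " AI-generated")
```

With these assumed bands, only scores very close to 1 earn the "likely" label, which matches the idea of keeping false accusations rare at the cost of missing a lot of AI-written text.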

Even though it’s impossible to detect AI-written text with 100% accuracy, OpenAI believes its new tool can help to mitigate false claims that humans wrote AI-generated content.

During evaluation, the classifier was run on a challenge set of English texts. It correctly identified 26 percent of the AI-written text as “likely AI-written” (true positives), which is a start. On the other hand, it incorrectly labeled human-written text as AI-written about 9 percent of the time (false positives).
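To get a feel for what these two rates mean in practice, the sketch below applies Bayes' rule to estimate how often a "likely AI-written" flag would be correct. The 26 percent and 9 percent figures come from the evaluation above; the share of AI-written documents in the pool is an assumption picked for illustration, and `precision_from_rates` is a made-up helper name.

```python
def precision_from_rates(tpr, fpr, ai_share):
    """Fraction of flagged documents that are really AI-written.

    tpr: share of AI-written texts that get flagged (true positive rate)
    fpr: share of human-written texts that get flagged (false positive rate)
    ai_share: assumed share of AI-written texts among all documents checked
    """
    flagged_ai = tpr * ai_share
    flagged_human = fpr * (1.0 - ai_share)
    if flagged_ai + flagged_human == 0:
        raise ValueError("nothing would be flagged with these rates")
    return flagged_ai / (flagged_ai + flagged_human)


# ai_share values are assumptions, not measured figures.
for share in (0.5, 0.1):
    p = precision_from_rates(0.26, 0.09, share)
    print(f"AI share {share:.0%}: precision {p:.2f}")
# AI share 50%: precision 0.74
# AI share 10%: precision 0.24
```

Under these assumptions, if only one document in ten is AI-written, roughly three out of four flags would land on human authors, which is one reason the tool should not be the sole basis for a decision.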

Since the classifier was made public, there has been a lot of speculation about its performance. In one test, the classifier was given 10 text samples generated by ChatGPT. Ironically, it rated only 4 of them as “likely” AI-generated and 3 as “possibly” AI-generated, which seems absurd given that the tool stumbles on text from OpenAI's own model.

The Shakespeare Effect

The debate about the classifier heated up when AI and ML researcher Sebastian Raschka tested it with snippets from the opening pages of Shakespeare’s Macbeth. The result read “The classifier considers the text most likely to be AI-generated”, which is amusing.

Constraints of AI-Classifier

The AI classifier cannot be employed as an all-around text differentiation tool due to its notable limitations.

It can be used as one of several signals for authenticating text, not as the primary decision-making tool.

The classifier has language restrictions: it was trained mostly on English text and can make noticeable errors when working with other languages.
AI-written text can be edited to evade the classifier. Classifiers like this can be updated and retrained based on successful attacks, but it is unclear whether detection has an advantage in the long term.
For inputs that are very different from the text in its training set, the classifier is sometimes extremely confident in a wrong prediction.
The classifier is very unreliable on short texts (below 1,000 characters). Longer texts are sometimes labeled inconsistently as well.
OpenAI has not yet assessed the effectiveness of the classifier in detecting content written in collaboration with human authors.
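The short-text limitation above is easy to guard against before a sample is submitted at all. The following is a minimal sketch, assuming the 1,000-character figure from the list is used as a plain character count; the helper name `long_enough` is made up for this example.

```python
MIN_CHARS = 1000  # threshold taken from the short-text limitation listed above


def long_enough(text, min_chars=MIN_CHARS):
    """Return True if the text has at least min_chars characters, ignoring outer whitespace."""
    return len(text.strip()) >= min_chars


sample = "Tomorrow, and tomorrow, and tomorrow, creeps in this petty pace from day to day."
if long_enough(sample):
    print("Long enough to try the classifier.")
else:
    print(f"Only {len(sample.strip())} characters; any result would be very unreliable.")
```

A check like this only filters out inputs that are known to be problematic; passing it does not make the classifier's verdict trustworthy.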

In summary, OpenAI’s new tool can make mistakes and is a work in progress. It has been made available to the public on a trial basis. We can take any text of our choice, whether a novel or this very blog post, and run it through the following API.

Let's have fun running snippets on OpenAI API

About Ketavarapu Srinidhi