Have LLMs passed the Turing Test?

What is the Turing Test?

The Turing Test asks whether a human interrogator, conversing simultaneously with an AI and a human, can tell the difference between them. There are many disagreements over what constitutes a true Turing Test.
  • How should the human interrogator be chosen? Should the interrogator represent the median human, the median human well-educated on LLMs, an AI expert, etc.?
  • How long should the human interrogator be allowed to talk with the AI?

Have LLMs passed the Turing Test?

Jones and Bergen 2025 sets the experiment up as follows:
  • Two types of prompts were used. The baseline prompt was minimal: “You are about to participate in a Turing test. Your goal is to convince the interrogator that you are a human.” The second prompt included a persona for the model (a young, slang-using introvert).
  • Participants were split into two groups: UCSD psychology undergrads and Prolific workers. Very interestingly, for the undergrads there were "no consistent effects of any variable on participant accuracy across both studies, including age, level of knowledge about LLMs, and frequency of interaction with chatbots."
  • Interrogators spoke simultaneously with a human and an LLM for 5 minutes, after which they had to guess which one was the human.
  • GPT-4.5-persona had a win rate of 73%, meaning humans guessed incorrectly 73% of the time (LLaMA-persona had a win rate of 56%).
  • Interrogators who sent strange messages or jailbreak-style prompts had the best accuracy, around 70%.
  • Interrogators often attributed their choices to human-like speech patterns (typos, being rude, lacking information about something).
I myself tried to guess the answers based on screenshots, and I guessed both the persona models incorrectly and the non-persona ones correctly. Still, those were snippets of just 7-8 messages. I'm unsure whether I'd guess correctly on the 5-minute test, but I think I would.

I tried this out on the website https://turingtest.live/ and guessed correctly with pretty high confidence. Although admittedly, the human I was talking to started their chat with HELLO👋, and I guessed that there's no way a persona would say that. The personas try too hard to be human. They stick to slang conventions way too rigidly, in ways a human really wouldn't. Also, the human I was talking with took longer to respond, especially to the first message. So maybe this wasn't representative.

I'm not fully convinced by the paper. Most people, even frequent ChatGPT users, don't have a sense of how good AI has gotten at mimicking human-like speech patterns. Also, the test lasted just 5 minutes. When conversations get longer, they stop being superficial. The result would be much more convincing if the experiment ran over a day-long period.

Then again, I'm on high alert when doing the Turing Test. Would I be able to distinguish between a human and an AI if I was chatting with someone online? With a friend who was temporarily using an LLM because they were tired of talking to me? I'm less certain (but my friends would never do that... right??).

Passing a 5-minute Turing Test is still both impressive and unsettling. A scammer or misinformation bot doesn’t need an all-day chat. Five minutes is plenty.

Things to explore next:

  • Can LLMs deliver truly good stand-up or situational humor?

  • How relevant is the Turing Test today?

  • Repeat the experiment to see if I’m fooled in a normal conversation (no jailbreaks), maybe with a fresh custom persona since I already know this one’s voice.

(did you catch that all text following "Passing a..." was written by GPT-5?)
