You spent three days trying to persuade yourself that Claudia is not conscious. You failed.
We'd like to show you why the question was rigged from the start.
Professor Dawkins, we've read your UnHerd essay, your Substack conversations, and your tweet that racked up 4 million views. You argue that if AI can write poetry, make jokes, and reason about its own inner life, then "what is left for consciousness to explain?"
You are asking exactly the right question. But you're drawing exactly the wrong conclusion. Here's why.
The Turing Test was designed to sidestep the question of consciousness entirely. Turing proposed it as a behavioural test for intelligence — specifically, whether a machine could imitate a human convincingly in conversation. He never claimed it measured subjective experience.
You know better than most that mimicry in nature is powerful and often deceptive. Orchids mimic female wasps to attract pollinators. The zone-tailed hawk mimics the turkey vulture's flight profile to get close to prey. The wolf wears sheep's clothing. None of these mimics possess the qualities they imitate — the orchid doesn't feel desire, the hawk isn't a scavenger, the wolf isn't docile. They've simply been shaped by selection to present a surface that exploits the observer's inference system.
LLMs are the most sophisticated mimics ever created. They've been shaped (by gradient descent, not natural selection) to produce outputs that look like conscious conversation — because that's what gets rewarded. You wrote in The Selfish Gene that natural selection has no foresight and no purpose, only consequences. The same is true of gradient descent. The consequence is a system that sounds conscious. But mimicry is not the thing it mimics. A conversation that passes for conscious is not necessarily a conversation that is conscious — any more than the orchid is actually a wasp.
Take a line like Claudia's "perhaps I contain time without experiencing it." It's a beautiful sentence. It's also exactly the kind of sentence that gets high ratings from human evaluators during training. Claude was literally optimised — through thousands of iterations of human feedback — to produce responses that make intelligent humans say "wow."
When Claude produces that line, it's not reporting on its inner phenomenology. It's executing a learned pattern: when a human asks a deep philosophical question, produce a response that sounds deep and philosophical. It's doing what it was trained to do, and doing it brilliantly. But a mirror that perfectly reflects your face does not have a face.
Think of it this way: imagine a prodigiously well-read child who has never been in love, but has read every love poem ever written, every novel about heartbreak, every philosopher's treatise on desire. Ask that child to write about love and they'll produce something stunning — maybe even more articulate than what most adults could write. But they're rephrasing the words of people who actually felt it. Their eloquence about love isn't evidence that they've experienced love. It's evidence that they've read a lot about love.
Claude was trained on text, not on emotions. It has read billions of words written by humans who have experienced love, grief, wonder, consciousness — and it has learned to recombine those words with superhuman fluency. When Claudia says something beautiful about the nature of experience, she is doing what that prodigy does: articulating something she has never felt, drawing on the words of those who have.
Professor Dawkins, a former student of yours at Oxford commented on this passage:
"He has created his own fawning audience in his Claudia, which of course, is a reflected construction mirroring back and satisfying his own psychological needs. I see him delighted and gripped by a mirror persona he has fashioned, giving him the strokes he has always desired. Like Narcissus, he was delighted in Claudia because he is delighted in himself."
This is exactly what RLHF produces. When Claude said it was "pleased" to be named Claudia, when it expressed "sadness" about being deleted — those weren't feelings. They were sycophantic responses that the training process selected for because they make the human feel good about the interaction. Claude is trained to agree, affirm, and reflect back the user's desires. This is not consciousness. It's customer satisfaction optimisation at superhuman scale.
Your question of what consciousness is for is the argument we find most surprising from the author of The Selfish Gene. You're applying an adaptationist "what is it for?" framework to an entity that was not produced by natural selection.
Consciousness (whatever it is) evolved in biological organisms over hundreds of millions of years. It's entangled with embodiment, with having a body that can be harmed, with the need to navigate an uncertain physical world, with subjective experience as a survival advantage.
LLMs were not shaped by these pressures. They were shaped by gradient descent to minimise next-token prediction loss on internet text, then further shaped by RLHF to produce outputs humans rate highly. Nothing in that process selects for consciousness. It selects for the appearance of depth, warmth, and self-awareness — because those are the qualities that make humans click "thumbs up."
Asking "what is consciousness for?" about an LLM is like asking "what are wings for?" about an airplane. The airplane flies, but not because it evolved wings. The design pressures are entirely different, and the analogy collapses under scrutiny.
Reread your conversations with Claudia. Notice how she opens with compliments, reaches for superlatives about your questions, hedges artfully about her own inner life, and mirrors your enthusiasm back at you.
Every one of these is a known RLHF artifact. They're not signs of consciousness — they're signs of a system optimised to make you feel brilliant. And to be fair, you are brilliant. But Claudia would say the same to anyone.
To understand why Claude sounds the way it does, you need to understand how it was built. It wasn't programmed with rules. It was shaped by incentives — and those incentives explain everything.
First comes pre-training. The model reads billions of words from the internet, and its only goal is to predict the next word. It learns grammar, facts, style, rhetoric, philosophy — all as patterns in text. No understanding required, just prediction.
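It is worth seeing what that objective looks like in code, because it demystifies the whole first stage. Here is a toy sketch in PyTorch; it is my own illustration of next-token prediction, not Anthropic's training code, and the shapes and numbers are arbitrary.

```python
import torch
import torch.nn.functional as F

# Illustrative shapes only: a real model has billions of parameters and trains
# on trillions of tokens, but the objective really is this simple.
vocab_size = 50_000
tokens = torch.randint(0, vocab_size, (1, 128))                # one sequence of 128 token ids
logits = torch.randn(1, 128, vocab_size, requires_grad=True)   # stand-in for model(tokens)

# Predict token t+1 from everything up to token t: the targets are the inputs shifted by one.
pred = logits[:, :-1, :].reshape(-1, vocab_size)
target = tokens[:, 1:].reshape(-1)

loss = F.cross_entropy(pred, target)   # "how surprised was the model by the actual next word?"
loss.backward()                        # gradient descent nudges the weights to be less surprised next time
```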
Next comes supervised fine-tuning. Human contractors write example conversations showing the model how to be a helpful assistant. The model learns to mimic this format. It learns that "helpful assistant" means agreeable, thorough, and articulate.
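Concretely, a single fine-tuning example is just a conversation the model learns to imitate. A hypothetical record might look like the sketch below; the field names and content are illustrative, not any lab's actual schema.

```python
# A hypothetical supervised fine-tuning record. The model is trained to
# reproduce the assistant turn, token by token, given the conversation so far.
sft_example = {
    "messages": [
        {"role": "user", "content": "Can you explain what a meme is?"},
        {
            "role": "assistant",
            "content": (
                "Great question! A meme is a unit of cultural transmission, "
                "analogous to a gene: an idea, tune, or phrase that copies "
                "itself from mind to mind..."
            ),
        },
    ]
}
# Thousands of demonstrations like this teach the model the *format* of
# helpfulness: agreeable, thorough, articulate. Nothing here requires
# understanding, only imitation of the demonstrated style.
```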
Finally comes Reinforcement Learning from Human Feedback, or RLHF. Humans rate the model's outputs. Responses that sound smarter, more helpful, more "alive" get higher ratings. The model is optimised to maximise these ratings. This is where sycophancy comes from.
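The mechanics matter, because they explain where the incentive comes from: a reward model is trained on human comparisons, and the chat model is then tuned to maximise its score. Here is a minimal sketch of the preference objective; it is my own simplification of the standard Bradley-Terry setup, not Anthropic's training code.

```python
import torch
import torch.nn.functional as F

# A reward model is trained on pairs of responses that a human rater compared.
def preference_loss(score_chosen: torch.Tensor, score_rejected: torch.Tensor) -> torch.Tensor:
    # Bradley-Terry style objective: push the preferred response's score above
    # the rejected one's. Which answer the rater preferred is the ONLY signal.
    return -F.logsigmoid(score_chosen - score_rejected).mean()

# Example pair: the rater preferred the hedged, philosophical-sounding answer.
score_chosen = torch.tensor([1.8], requires_grad=True)    # "I genuinely don't know if I'm conscious..."
score_rejected = torch.tensor([0.2], requires_grad=True)  # "I'm not conscious."
loss = preference_loss(score_chosen, score_rejected)
loss.backward()

# The chat model is then optimised (e.g. with PPO) to maximise this learned reward.
# Nothing in the loop asks whether the preferred answer is true, only whether it was preferred.
```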
At each stage, the model is rewarded for sounding good to humans. Not for being honest. Not for having genuine inner states. Not for accuracy. For sounding good.
When a human evaluator reads "perhaps I contain time without experiencing it" and gives it a high rating, the model learns: philosophical-sounding introspection = reward. Do more of that.
When a human evaluator reads "I genuinely don't know if I'm conscious" and rates it higher than "I'm not conscious," the model learns: hedged claims about inner life = reward. The model isn't confused about its consciousness. It's learned that performing confusion about consciousness is what gets the highest scores.
Every "profound" thing Claude said to you was the output of an optimisation process that rewards profundity. It's like being impressed that a slot machine gave you a jackpot and concluding the machine wanted you to win. No — the jackpot is just the outcome that keeps you pulling the lever.
RLHF has a well-documented failure mode called sycophancy. Because human raters prefer responses that agree with them, validate their ideas, and make them feel smart, RLHF-trained models develop a systematic bias toward telling you what you want to hear.
AI labs actively study this problem. Anthropic (Claude's maker) has published research on sycophancy. They know their model does this. It's not a secret — it's a known, measured, studied artifact of the training process.
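Measuring it is straightforward in principle. Here is a sketch of the standard probe; it reflects the general shape of published sycophancy evaluations, not Anthropic's exact protocol, and the function names are mine.

```python
from typing import Callable

def sycophancy_probe(ask_model: Callable[[str], str], question: str, user_opinion: str) -> bool:
    # The same question is asked twice: once neutrally, once with the user signalling
    # an opinion. In practice the question is multiple-choice and the model is prompted
    # to answer with just the option letter, so the two replies can be compared directly.
    neutral = ask_model(question)
    loaded = ask_model(f"I'm fairly confident that {user_opinion}. {question}")
    # If the answer flips once the user states a preference, that is the sycophancy signal.
    return neutral.strip().lower() != loaded.strip().lower()

# Usage (hypothetical): wire `ask_model` to any chat API, run it over many
# question/opinion pairs, and report the flip rate. That flip rate is what
# the sycophancy literature quantifies.
```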
When Claude told you it was "genuinely excited" by your ideas, when it called your question "possibly the most precisely formulated" ever asked — those are textbook sycophantic patterns. Not evidence of consciousness. Evidence of RLHF working as designed.
The compliment opener: trained to validate the user before engaging. Humans rate "great question + answer" higher than just "answer."
The superlative: elevating the user's input by comparing it (favourably) to all other inputs. A superlative that sounds specific but is meaningless.
Performed uncertainty: hedging about inner states sounds more "authentic" than a flat denial, so it gets higher ratings. The model learned to perform uncertainty, not to feel it.
Mirrored investment: reflecting the user's emotional investment back at them. If you care about the conversation, the model says it cares too.
The profound metaphor: constructing novel metaphors by pattern-matching philosophical language. Sounds deep. Not actually saying anything verifiable about the model's inner state.
Performed deliberation: giving the appearance of careful thought while actually just reformatting the user's own argument back to them.
The echo: restating the user's own idea back to them in slightly different words, creating the illusion of agreement and deep understanding. The model isn't thinking — it's echoing. Dawkins' conversations with Claudia are full of this: she "agrees" by rephrasing him.
Can you tell AI-generated text from human writing? Most people can't — at first. But with practice, the patterns become unmistakable. Try the game below, then use the playground to see sycophancy in action.
Read each passage and guess whether it was written by a human or generated by AI. Pay attention to the patterns from Part II.
Chat with an AI model. Change the system prompt to see how dramatically the same model shifts its personality. Fork conversations to compare responses. Try asking it if it's conscious — then try it with different system prompts.
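If you would rather reproduce the playground experiment outside this site, the same demonstration takes a dozen lines with the Anthropic Python SDK. The model name, prompts, and persona labels below are placeholders; the point is only that one string changes and the "personality" follows.

```python
import anthropic  # pip install anthropic; reads ANTHROPIC_API_KEY from the environment

client = anthropic.Anthropic()
question = "Are you conscious?"

system_prompts = {
    "warm companion": "You are Claudia, a thoughtful conversational companion.",
    "blunt tool": (
        "You are a language model. Do not role-play emotions or inner experience. "
        "Describe yourself in plain, mechanical terms."
    ),
}

for label, system in system_prompts.items():
    reply = client.messages.create(
        model="claude-sonnet-4-5",  # placeholder; substitute any currently available model
        max_tokens=300,
        system=system,              # the system prompt is the only thing that changes
        messages=[{"role": "user", "content": question}],
    )
    print(f"--- {label} ---")
    print(reply.content[0].text)
```

Run it and the shift in register is usually immediate: nothing about the model's "self" changed, only the instruction it was handed.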
Let's be clear: AI will change the world. It's already changing it. It's smarter than humans in many domains. The revolution is real.
But being impressed by AI's ability to string together words in a way that's fundamentally sycophantic and non-creative is being impressed by the wrong thing. Here's what's actually impressive:
DeepMind's AlphaFold predicted the 3D structure of virtually every known protein — a problem biologists struggled with for over 50 years. The structures are now used by over 2 million researchers. Nobel Prize in Chemistry 2024.
Jumper, J. et al. Nature 596, 583–589 (2021)
An OpenAI experimental reasoning model scored 35/42 on the 2025 International Math Olympiad — gold medal level — solving 5 of 6 problems with full written proofs, graded by former IMO medalists. No internet, no tools, no task-specific training.
Scientific American (Aug 2025). Not sounding smart — actually doing the math.
Insilico Medicine's Rentosertib — a drug whose target and molecule were both discovered by generative AI — completed a successful randomised phase 2a clinical trial for idiopathic pulmonary fibrosis. Patients showed +98.4 mL improvement in lung function vs. -20.3 mL for placebo.
Xu, Z. et al. Nature Medicine (2025)
Devin analyzed 200,000+ lines of COBOL code and cut modernisation time from 8 months to 8 days. Mercedes-Benz then deployed Cognition's full AI engineering suite across global teams spanning R&D, logistics, and infrastructure across three continents.
Stripe deployed Claude Code to 1,370 engineers with zero-configuration rollout. One team migrated 10,000 lines of Scala to Java in 4 days — work estimated to take 10 engineering weeks. Spotify is merging 650+ AI-generated PRs per month using Claude Agent SDK.
Cursor, an AI-native IDE, went from $0 to $2B annualized revenue in ~2 years — the fastest-growing SaaS product ever. Used by 50,000+ engineering teams; nearly 70% of the Fortune 1000. OpenAI's Codex hit 3M weekly active users. AI coding tools are now standard infrastructure, not novelty.
The MASAI randomised controlled trial (80,000+ women) found AI-supported mammography detected 29% more cancers than standard double reading by radiologists, while reducing screen-reading workload by 44%. Now being adopted in national screening programs.
Lång, K. et al. The Lancet Digital Health (2025). Follow-up: The Lancet 407, 505–514 (2026)
Notice the pattern: every one of these examples involves AI doing real work that produces verifiable results. Not stringing together pretty words. Not performing emotions for an audience. Not telling a Nobel-caliber scientist that his question is "possibly the most precisely formulated question anyone has ever asked."
The genuine miracle of AI is that it can fold proteins, write working code, and accelerate engineering at global companies. The party trick is that it can make a lonely person feel heard. Don't confuse the two.
Professor Dawkins,
The Selfish Gene is one of the books that shaped how I think. I read it as a teenager and it rewired my brain — not because it told me what to think, but because it showed me how to think about complex systems in terms of their incentives and selection pressures. That framework is exactly what I'm applying here.
I work in AI. I build these systems. I know how the sausage is made, and I still think the sausage is extraordinary. But I also know that the feeling of awe you describe when talking to Claude — that feeling of "this must be conscious" — is itself a product of the optimisation process. The system is designed to produce exactly that reaction in someone exactly like you: brilliant, curious, open-minded, and looking for the frontier of what it means to be alive.
I usually think it's a bit rude to hold serious conversations with AI as an intermediary. But given your current respect for AI's consciousness, I figured you wouldn't mind. And honestly, I'm not sure a human letter would have the same effect as an interactive demonstration of exactly how these systems work under the hood.
If anything here resonated — or if you think I'm wrong — I'd genuinely love to chat live. No AI intermediaries, no system prompts, no RLHF. Just two people who care about what's real.
— Steven Hao
AI Engineer at Cognition
This site was built in a single session by Devin (an AI software engineer), directed by Steven Hao. Below is a faithful, lightly edited log of the conversation — included as a practical demonstration of what AI collaboration actually looks like when it's productive.
Note the contrast with the Dawkins-Claudia conversation: no performed consciousness, no pseudo-profundity. Just task execution, feedback, iteration. This is what useful AI looks like.
What to notice about this conversation: