Notes on Dwarkesh-Karpathy

A summary of the Karpathy interview on the Dwarkesh podcast, with my thoughts in italics. I paraphrased what Dwarkesh and Karpathy said; these are not direct quotes.

Big insights:

  • LLMs are still cognitively lacking in many ways — no continual learning, not very multimodal, don’t have a reflection process, have a collapsed distribution of responses. These are hard problems to solve and will take time. Karpathy argues these won’t just be solved by scaling up models and doing RL on different types of tasks. These behaviors are not emergent, they require algorithmic breakthroughs.

  • Humans don’t learn what we think of as intelligent tasks by RL; they seem to learn by something different that is more reflective and deliberate. RL is terrible: a single outcome reward updates every action in the trajectory, even if intermediate steps were wrong. It’s also a slow, inefficient way of learning.

  • AI progress will come from better everything — data, hardware, kernels, and algorithmic breakthroughs. LLMs are bad at very novel stuff, but this is most of what AI research is. Karpathy is bearish on RSI causing fast takeoff.

  • Karpathy argues AI will only continue the current exponential growth trend (2% per year). LLMs will have many problems with reliability and will take time to diffuse. You will need humans in the loop for a long time. 

Overall, the interview has updated me towards slightly longer timelines. My timelines are still quite broad, but I think Karpathy has convinced me that there are hard algorithmic problems to solve and we should expect them to take time. Still haven't thought enough about whether RSI-driven fast takeoff is likely.


I think Karpathy’s view that growth will simply continue on its current ~2% exponential trend even after AGI is wrong. That said, a regime change from 2% to 20% is huge, and I’m not convinced of that either (unless we get fast takeoff). Still, AI that is as smart as humans seems likely to create new ideas and technologies that spur growth; automation is not the only source of growth.


Part 1: The Decade of Agents


Karpathy: It will take around a decade to reach agents which have the capability to fully do what a human colleague or assistant will be able to do. There are many small things they’re lacking. There are a number of bottlenecks that need to be solved for, including: computer use, multimodality, intelligence, continual learning, etc. I expect these breakthroughs to take 10 years, rather than 1 or 50, because of my intuition and experience with how fast AI progress in the past has gone — the problems are hard, but surmountable.


In the past: the first big breakthrough came when AlexNet was released and everyone started doing deep learning on neural networks. After that, AI labs spent a few years focusing on computer games. They were heavily reinforcement learning focused, believing that getting really good at games could help you generalize to real world tasks. I think this was a mistake because I didn’t see how games helped you learn to do real world tasks like accounting. Instead, I was working on getting AI to operate a webpage using a mouse, which I think was also too early: you would just be clicking random things, and the reward was too sparse. AI needed these base representations of the world to do anything useful, which came after LLM pretraining.


Some context: A computer use agent consists of a loop: observe screen, decide on action, execute action, repeat. In OpenAI’s Universe project, the agent would see raw pixels, output actions like clicking something, get a reward signal (e.g. based on whether it could book a flight) and be trained through RL. However, the agent had no idea what it was looking at: it didn’t know that “this is a button” or that “clicking Search lets you find flights”. Nowadays, we don’t start from scratch; we start from a pre-trained vision language model (an LLM trained on both text and images). At inference time, the agent receives a task description and a screenshot. These are fed into the context window (the screenshot is converted into tokens using a vision encoder). The model outputs text like click(x=3, y=2), which is parsed and executed on the OS. For training, you first use SFT on demonstrations: humans record themselves doing tasks, and the model is trained to predict their actions. Then you do RLVR, except now the LLM already has base representations of what things on the computer mean. This is still hard, especially for long-time-horizon tasks where the LLM has to get a whole sequence of steps right.
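
To make the observe / decide / execute loop concrete, here is a minimal sketch in Python. Everything in it is hypothetical and for illustration only: `ToyScreen` stands in for the OS, `scripted_policy` stands in for the vision language model, and the `click(x=..., y=...)` text format is just the example from the paragraph above, not any lab's actual action schema.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Click:
    x: int
    y: int


# Matches the text action format used in the notes, e.g. "click(x=3, y=2)".
ACTION_RE = re.compile(r"click\(x=(\d+),\s*y=(\d+)\)")


def parse_action(model_output: str) -> Optional[Click]:
    """Extract a click action from the model's text output, if there is one."""
    match = ACTION_RE.search(model_output)
    if match is None:
        return None
    return Click(int(match.group(1)), int(match.group(2)))


class ToyScreen:
    """Hypothetical environment: a screen with one button at a fixed cell."""

    def __init__(self, button=(3, 2)):
        self.button = button
        self.clicked = False

    def observe(self) -> str:
        # A real agent would get a screenshot; a string stands in for it here.
        return f"button at {self.button}"

    def execute(self, action: Click) -> None:
        self.clicked = (action.x, action.y) == self.button

    def done(self) -> bool:
        return self.clicked


def scripted_policy(task, screenshot, history) -> str:
    """Stand-in for the model: always emits the same text action."""
    return "I should press the button. click(x=3, y=2)"


def run_agent(task, policy, env, max_steps=10):
    """Observe, decide, execute, repeat until done or out of steps."""
    history = []
    for _ in range(max_steps):
        screenshot = env.observe()
        output = policy(task, screenshot, history)
        action = parse_action(output)
        if action is None:  # the model produced no parseable action
            break
        env.execute(action)
        history.append(action)
        if env.done():
            break
    return history
```

In this toy setup, `run_agent("press the button", scripted_policy, ToyScreen())` finishes after one click. The hard part described above is everything the stand-ins hide: a real policy has to get many such steps right in a row.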


Karpathy: And so pretraining added this significant jump in improvements that came from better representations of things in the world. If improvements continue at a similar pace, reaching the breakthroughs required to do everything a worker can do (continual learning, computer use, etc.) should take about a decade.


Dwarkesh: But humans and animals can learn a task super rapidly with no training in that specific domain, no labels — just by looking at sensory data and understanding what they need to do. Why shouldn’t that be the vision for AI? 


Karpathy: Animals maybe aren’t a great analogy — they came about by a very different optimization process. Zebras can walk as soon as they’re born, this is not reinforcement learning. Evolution has some way of encoding the weights of our neural nets in DNA. AIs aren’t really trained to be like animals — there isn’t the outer loop of evolution. They’re trained to mimic human internet training data. It’s a different type of intelligence. We could make them more animal-like, this would be great if we can get that to work, but it’s hard.

Karpathy: “If there were a single algorithm that you can just run on the Internet and it learns everything, that would be incredible. I’m not sure that it exists and that’s certainly not what animals do, because animals have this outer loop of evolution.”


From Sutton’s interview, it seems likely that current LLMs are much worse than animals at learning in some ways. They’re not sample efficient. RL is terrible, as Karpathy states later. Animals seem much much better in some respects — sample efficiency, generalization, continual learning, etc. Makes me bullish on the huge potential of algorithmic improvements. Could we get an algorithm that just continuously makes AI smarter and smarter through interactions with the world? Is that part of what a fast takeoff looks like? Claude points out that animals don’t unboundedly become smart, learning plateaus. For fast takeoff, you need an algorithm that doesn’t plateau, which seems quite hard maybe. Makes me wonder what the limits of algorithmic progress are — what could we do with the “ideal” learning algorithm? Questions to explore in more detail later!

Karpathy: What animals are doing I think isn’t that much reinforcement learning, especially the intelligence based tasks. Motor learning is more analogous to RL. Doesn’t mean we shouldn’t do those things for LLMs though. We should just do what works. Evolution might be really hard to replicate.


From Claude on RL in humans: dopamine neuron firing patterns look strikingly like a temporal difference (TD) prediction error — the signal at the heart of TD learning, which is one of the foundational algorithms in RL. When a reward is unexpected, dopamine spikes. When a reward is predicted, dopamine fires at the predictor rather than the reward itself. When an expected reward fails to arrive, dopamine dips below baseline. This is the canonical TD error signature.
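
The three dopamine signatures described above fall straight out of the TD error formula, delta = r + gamma * V(s') - V(s). Here is a small sketch; the value estimates (0 or 1) and gamma = 1 are made-up numbers chosen to keep the arithmetic obvious, not measurements.

```python
def td_error(reward, value_next, value_now, gamma=1.0):
    """TD prediction error: delta = r + gamma * V(s') - V(s)."""
    return reward + gamma * value_next - value_now


# Illustrative values: V = 1 means "a reward of 1 is expected from here",
# V = 0 means "nothing is expected".
cases = {
    # Nothing was expected, but a reward shows up: positive error (spike).
    "unexpected reward": td_error(reward=1.0, value_next=0.0, value_now=0.0),
    # A cue that predicts reward appears: the error fires at the cue.
    "predictive cue appears": td_error(reward=0.0, value_next=1.0, value_now=0.0),
    # The predicted reward arrives on schedule: no error at the reward.
    "predicted reward arrives": td_error(reward=1.0, value_next=0.0, value_now=1.0),
    # The predicted reward fails to arrive: negative error (dip).
    "expected reward omitted": td_error(reward=0.0, value_next=0.0, value_now=1.0),
}

for name, delta in cases.items():
    print(f"{name}: delta = {delta:+.1f}")
```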


This system may be the main driver for some behaviors (motor learning, habit formation), but not for others (abstract reasoning and concept formation). So what are humans using instead of RL? Maybe the topic of another post!


Dwarkesh: Evolution has to compress information into 3GB of DNA. The weights themselves are not encoded in sperm or egg, they also have to be grown. Information for every single synapse can’t exist in DNA. Evolution seems closer to finding the algorithm that does the lifetime learning than encoding weights. Maybe the algorithm is not RL, though.


A human brain has 100 trillion synapses. DNA has 750 MB of information. There is not enough information in DNA to encode the weights of these synapses. How is a zebra able to run within minutes of being born? Running is an incredibly complex task. Claude describes evolution as “meta-learning”. Evolution is not learning the “running policy” or something — it is learning how to build a zebra whose developmental program produces a brain pre-wired with the running policy (well, more like a stumbling policy). This is another topic that it’s easy to go down a rabbit hole on! Forget just intelligence, there is an incredible amount of biological information in an animal, about the organs and skeletal structure and so on that is just not possible to compress in 750 MB. DNA is bringing about the developmental process from which this biological structure and neural circuitry emerges. 
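
A quick back-of-the-envelope check of those numbers. The genome length of about 3 billion base pairs is my assumption here (it is the usual approximate figure for the human genome); the synapse count is the one used above.

```python
BASE_PAIRS = 3_000_000_000        # approximate human genome length (assumption)
BITS_PER_BASE = 2                 # four possible bases: A, C, G, T
SYNAPSES = 100_000_000_000_000    # 100 trillion, the figure used above

genome_bits = BASE_PAIRS * BITS_PER_BASE
genome_megabytes = genome_bits / 8 / 1_000_000
bits_per_synapse = genome_bits / SYNAPSES

print(f"genome size: {genome_megabytes:.0f} MB")
print(f"genome bits available per synapse: {bits_per_synapse:.5f}")
```

Under these assumptions the genome comes to 750 MB, and there are only about 0.00006 bits of DNA per synapse, so the genome cannot be a lookup table of synaptic weights.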


Karpathy: We don’t have to build animals though, we just need to do what’s practically useful. Pretraining is useful in getting us a starting point that we can do RL on. Pretraining is also getting us some “cognitive core”. In the process of next token prediction, LLMs do learn knowledge, but they also learn cognitive skills. I think the knowledge / memorization is actually holding us back somewhat. LLMs tend to rely too much on it and perform worse on tasks that are very out of distribution of the training data. It is better in some ways to remove the knowledge during pretraining, but keep the cognitive core.


The “cognitive core” Karpathy is talking about refers to circuits that do something like logic. For example, “induction heads” let the model find an earlier occurrence of the current token and attend to the token that came right after it, so it can continue a pattern it has already seen in the context (if A was followed by B before, predict B after A again). There is also a paper on how LLMs develop circuits to do modular arithmetic using Fourier transforms. As a side note, at first glance, it seems absolutely absurd that models learn to do this just through gradient descent. I have no intuition how an LLM learns to do this instead of just memorizing training data; definitely something to explore later.
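
The behavior an induction head produces can be written as a plain rule: look back for the last time the current token appeared, and copy whatever followed it. The sketch below is a hand-written version of that rule, not an actual attention head; a real model implements something like it with learned attention weights.

```python
def induction_predict(tokens):
    """Predict the next token by copying what followed the most recent
    earlier occurrence of the current (last) token.

    Returns None if the current token has not appeared earlier in the context.
    """
    if not tokens:
        return None
    current = tokens[-1]
    # Scan backwards over earlier positions, skipping the current one.
    for i in range(len(tokens) - 2, -1, -1):
        if tokens[i] == current:
            return tokens[i + 1]
    return None


context = ["Mr", "Dursley", "said", "Mr"]
print(induction_predict(context))
```

Here the last token "Mr" appeared earlier followed by "Dursley", so the rule predicts "Dursley". Nothing about the specific tokens is memorized; the pattern works on any sequence, which is why it counts as a skill rather than knowledge.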

Part 2: In-context Learning


Dwarkesh: In-context learning is what feels like true intelligence: you can see LLMs start on a path, then correct themselves. 


Karpathy: What is the mechanism by which this in-context learning happens? There was a paper that input a bunch of x, y pairs, then had an x and the model had to output a y. When researchers looked at the weights of the neural net to see how it was predicting y, it looked a lot like gradient descent. So in-context learning might actually be implementing some sorts of algorithms like gradient descent internally.
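
For intuition, this is what “gradient descent on the in-context examples” means when written out explicitly. It is only a sketch of the computation the claim refers to: a one-parameter linear model fit to the x, y pairs in the prompt. The function names, learning rate and step count are my own choices, and this says nothing about how a transformer would implement such a procedure internally.

```python
def fit_in_context(pairs, steps=200, lr=0.05):
    """Fit y ~ w * x to the in-context examples with plain gradient descent."""
    w = 0.0
    n = len(pairs)
    for _ in range(steps):
        # Gradient of the mean squared error (w * x - y)^2 with respect to w.
        grad = sum(2 * (w * x - y) * x for x, y in pairs) / n
        w -= lr * grad
    return w


def predict(pairs, x_query):
    """Answer the query using only what was learned from the context pairs."""
    return fit_in_context(pairs) * x_query


context = [(1.0, 3.0), (2.0, 6.0), (3.0, 9.0)]  # examples drawn from y = 3x
print(round(predict(context, 4.0), 3))
```

Nothing outside the function call is updated: the “learning” happens entirely within one pass over the context, which is the sense in which in-context learning could resemble an inner optimization loop.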


Incredibly interesting, though I’m confused how this expands to more complicated tasks the LLM has to do rather than just predict y. Like… here you have a bunch of labelled x, y pairs, but what about situations where you don’t have pairs? 


Dwarkesh & Karpathy: Why does in-context learning feel like continual learning, real intelligence, while pretraining doesn’t? There is a lot of compression in pretraining. All the knowledge from the tokens isn’t fully baked into the weights (15 trillion tokens into a couple trillion weights). However, the KV-cache has the capacity to store a lot more bits/token of information (100s of KB of information per token). This is why loading some knowledge into the “working memory” of an LLM can get it to perform much better, you don’t have this hazy recollection.
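
A rough way to see the gap between the two kinds of memory. The model shape below (80 layers, 8 key/value heads, head dimension 128, 16-bit values) is an assumption I picked as a plausible grouped-query-attention configuration for a 70B-class model; the token and weight counts are the ones quoted above, with “a couple trillion” taken as 2 trillion. The weights figure is a capacity upper bound, not a measurement of what is actually stored.

```python
def kv_cache_bytes_per_token(n_layers, n_kv_heads, head_dim, bytes_per_value=2):
    """Bytes of KV cache written for each token in the context."""
    # Factor of 2: one key vector and one value vector per layer and KV head.
    return 2 * n_layers * n_kv_heads * head_dim * bytes_per_value


def weight_bits_per_training_token(n_params, n_tokens, bits_per_param=16):
    """Upper bound on weight capacity per pretraining token."""
    return n_params * bits_per_param / n_tokens


# Assumed model shape, for illustration only.
kv_bytes = kv_cache_bytes_per_token(n_layers=80, n_kv_heads=8, head_dim=128)
weight_bits = weight_bits_per_training_token(n_params=2e12, n_tokens=15e12)

print(f"KV cache: {kv_bytes / 1024:.0f} KiB per context token")
print(f"weights: at most {weight_bits:.2f} bits per pretraining token")
```

With these assumptions the KV cache holds 320 KiB per token in context, while the weights have room for only about 2 bits per token seen in pretraining, a gap of roughly six orders of magnitude.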


Part 3: Flaws in LLMs and Future Progress


Dwarkesh: In-context learning was never explicitly trained. We incentivized it and it emerged. Maybe continual learning will emerge similarly: if we do RL on long-horizon tasks that span multiple sessions, maybe the model will spontaneously develop some mechanism to preserve information across sessions, e.g. it will fine-tune itself, write to an external memory, etc.


Karpathy: I’m a bit skeptical. The existing architecture supported in-context learning, but it doesn’t exactly support continual learning. Weights are fixed at inference. Model can’t modify itself between calls.


For humans, there’s some sort of distillation of the knowledge into our weights, we don’t just keep growing our context window. For LLMs, we can’t just keep growing the context window - we also need to somehow distill this information into weights. So we need some architectural innovation — sparse attention is one step to have longer context windows.

Dwarkesh: How much do you expect AI architectures / training methods to change?

Karpathy: Based on previous trends, I expect that while we’ll likely have big algorithmic changes, we’ll still broadly be training big neural nets through backpropagation. Lots of things will contribute to AI progress together (better data, better hardware, better algorithms) rather than just one thing.


Compute bottlenecks are quite interesting to think about too. In Dwarkesh’s episode with Dylan, Dylan was talking about how ASML, which is not AGI-pilled, is not projecting a significant increase in lithography machine capacity. Compute spent on inference, if there’s very high demand, can be a big bottleneck. The better models get, the higher the demand for them and the higher the marginal price of compute. It’s unclear how even superhumanly intelligent AI could significantly accelerate compute capacity. Douglas and Bricken argue, though, that models have also gotten significantly more efficient.


Part 4: AI and Coding


Karpathy (as of October 2025): I primarily use LLMs for autocomplete rather than as agents. In particular, they struggle at writing “code which has never been written” and integrating it into the style and previous functionalities of an existing codebase. This is one reason I’m slightly more skeptical of a fast takeoff driven by “rapid self improvement”.


What LLMs produce is currently slop. It’s not very impressive, and I find it surprising that people are finding it so impressive (I suspect it’s fundraising or something). I see AI as another step in the history of improvements we have made to productivity, like code editors, compilers, linters, etc. In fact, I don’t quite see a start and end to “AI”. Google’s PageRank was in many ways AI.


Part 5: Reinforcement Learning


Karpathy: Reinforcement learning is very flawed. If you only reward LLMs based on output, you also reward every single intermediate step that came before the output. However, those intermediate steps might be going completely in the wrong direction. Humans don’t do this. They have a lengthy review process, they think through which intermediate steps worked and didn’t work.


Process-based supervision is interesting, but has some problems. You need some way to assign partial credit to the progress. This is tricky to do. For one, how do you know how much credit to assign? Some labs have tried using LLM judges. However, the model being trained is often able to find adversarial examples that generate very high reward in the judges (like nonsense token sequences that the judge has never seen in training, so it’s purely generalizing). There are trillions of possible adversarial examples. Labs are trying to get these judges to work, but I still think we need other ideas.
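
The contrast between outcome-only reward and per-step credit can be sketched in a few lines. The trajectory and the step scores are invented for illustration; in practice the step scores would come from a judge or a human reviewer, and producing them reliably is exactly the hard part.

```python
def outcome_credit(steps, final_reward):
    """Outcome-only RL: every step inherits the same final reward."""
    return [(step, final_reward) for step in steps]


def process_credit(steps, step_scores):
    """Process supervision: each step gets its own (hypothetical) score."""
    if len(steps) != len(step_scores):
        raise ValueError("need exactly one score per step")
    return list(zip(steps, step_scores))


trajectory = ["set up equation", "wrong detour", "backtrack", "correct answer"]

# The final answer was right, so outcome-only credit reinforces everything,
# including the detour that went in the wrong direction.
print(outcome_credit(trajectory, final_reward=1.0))

# With per-step scores (made up here), the detour can be penalized.
print(process_credit(trajectory, step_scores=[1.0, -1.0, 0.5, 1.0]))
```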


Something I find interesting is a review-based approach, or training on synthetic examples of solving tasks efficiently. Some papers have tried getting this to work, but it’s hard to know what scales well for big models.


Dwarkesh: Humans daydream and reflect. Is there an analogy of that for LLMs? Are LLMs lacking in this way?


Karpathy: We’re missing some aspects. When LLMs read books, they look at each token sequentially and try to predict the next one. This is not how humans read books. They read and then reflect, think about whether parts of the book contradict other books, etc. Emulating this in LLMs is tricky. It would be interesting to have a reflection process in pretraining.


Dwarkesh: Why not do supervised fine-tuning on reflection text for a book?


Karpathy: I think this will make LLMs worse. LLMs are already quite low entropy. If you tell ChatGPT to tell a joke it will only tell you like three. If you ask an LLM to reflect on a book, it will look quite good at first, but then it just produces the same reflection over and over. Training on synthetic reflection text would just collapse the distribution more.


To solve this, we need to adjust the entropy of the model to a sweet spot. Not too high though, or else it starts using very uncommon words or generating nonsense. Not trivial to do.
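
The simplest entropy knob we have today is sampling temperature, and it shows the trade-off nicely. To be clear, this is my stand-in example and not what Karpathy is proposing: temperature only rescales the distribution the model already learned, so it cannot recover variety that the model has collapsed away. The logits below are made up.

```python
import math


def softmax(logits, temperature=1.0):
    """Turn logits into probabilities; higher temperature flattens them."""
    scaled = [logit / temperature for logit in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def entropy_bits(probs):
    """Shannon entropy of a distribution, in bits."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


logits = [4.0, 2.0, 1.0, 0.0, -1.0]  # made-up next-token scores

low = entropy_bits(softmax(logits, temperature=0.5))
mid = entropy_bits(softmax(logits, temperature=1.0))
high = entropy_bits(softmax(logits, temperature=3.0))

print(f"T=0.5: {low:.2f} bits, T=1.0: {mid:.2f} bits, T=3.0: {high:.2f} bits")
```

Low temperature concentrates almost all probability on one token (the same three jokes), while high temperature approaches a uniform distribution over all five tokens, which for a real vocabulary means rare words and nonsense.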


Dwarkesh: Kids are really fast learners, but they have terrible memory. Pretrained base models have excellent memory but are really slow to generalize. Adults are somewhere in the middle. Thoughts?


Karpathy: Humans are not good at memorization which forces us to find more general patterns. Memorization is bad for LLMs in many ways — they rely too much on it. We need some way to take out the knowledge and keep just the cognitive capabilities, then allow the model to look up the knowledge when it needs to.


Dwarkesh: Could this solve model collapse?


Karpathy: Not sure, I think it’s a separate axis. Frontier labs are possibly not incentivized to have higher entropy because consistency and reliability is good for many applications. But some applications need it.


Karpathy & Dwarkesh: The 20B GPT-OSS model is better than the trillion+ parameter GPT-4. Parameter counts in frontier models have plateaued (labs are focusing more on RL and inference compute). A lot of work the parameters are doing is memorizing information. I expect the cognitive core of AI to require only ~1B parameters (we still want to retain some knowledge). 


One interesting contrast to this perspective is Douglas and Bricken on the Dwarkesh Pod back in May 2025. They seemed to argue somewhat the opposite: the human brain is composed of 100s of trillions of synapses. It’s unclear if we can just analogize synapses to params, but this points to how bigger models might generalize better and learn from fewer examples. Large models have more sophisticated circuits in some respects, such as representing similar concepts in different languages in a similar way, while smaller models separate out these concepts. Karpathy says the opposite, that large models memorize too much.


Karpathy: Labs are being practical with compute. Pre-training is currently not the best marginal use of compute. It’s unclear whether models will grow or shrink or what happens.

 

Part 6: AI and the Economy


(Using the definition of AGI as “automation of all remote work”.)

Karpathy: I expect this automation slider — humans will move up layers of abstraction. There are simple jobs to automate like call center employees (very routine tasks). In this case, you’ll have a human supervising a team of AIs. Full automation of jobs is hard though.

AI is quite good at coding for a few reasons: (1) verifiable reward (2) purely text-based (3) there’s lots of infrastructure for handling code and text, like diffs. Other tasks are more vague and there isn’t infrastructure built to handle them.


Karpathy expects AI to simply allow us to maintain our current level of growth (~2% GDP per year). He argues that the intelligence explosion has been happening for hundreds of years. We’ve been slowly automating away parts of the economy for a long time. AGI is not a paradigm shift, it’s simply a continuation of the ongoing intelligence explosion. Computers and the internet barely showed up in the growth statistics, for example. We’ve also been having rapid self improvement for years, so he doesn’t see an incredibly fast takeoff as likely. Dwarkesh disagrees. He thinks AI will be a paradigm shift: similar to how the industrial revolution shifted the rate of growth from ~0.2% to 2% per year, AI will shift the growth rate from 2% to 20% or something. Moreover, he expects RSI to be qualitatively different from before.
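
To get a feel for how different these growth regimes are, it helps to convert each annual rate into the time it takes the economy to double. The three rates are the ones from the discussion above; the formula is just compound growth.

```python
import math


def doubling_time_years(annual_growth_rate):
    """Years for an economy to double at a constant compound growth rate."""
    return math.log(2) / math.log(1 + annual_growth_rate)


for rate in (0.002, 0.02, 0.20):
    years = doubling_time_years(rate)
    print(f"{rate:.1%} per year: doubles in about {years:.1f} years")
```

That works out to roughly 347 years at 0.2%, 35 years at 2%, and under 4 years at 20%. Each regime change compresses the previous era's doubling time by about a factor of ten, which is why the 2% versus 20% disagreement matters so much.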


I am confused by Karpathy’s perspective, and I don’t think he adequately explains why AI will not be qualitatively different than previous technologies. He points out that knowledge work only contributes 10-20% of today’s economy (would be useful to have an exact figure!). However, it wouldn’t surprise me if this 10-20% enables a ton of other sectors. For example, banking, online retail, and entrepreneurship are essential for so many sectors.


However, I feel like fast takeoff people push this point of “infinite copies of geniuses”, but this is not exactly accurate. Each copy has an inference cost. Also, you need these geniuses to be high entropy — they can’t all think alike or you won’t generate that many new ideas, the number of new ideas will slowly plateau. 


Dwarkesh brings up that there have been cities with 10-20% economic growth. How did they do it? Population growth is one factor. Another is that these were often less developed cities playing catch-up. They didn’t have to innovate much, just adopt existing technologies and capital. So the mechanisms of “catch-up growth” that worked for developing countries may not work for developed countries, which have to create new innovation.

Capital has diminishing marginal returns: the return is initially very high because of “low hanging fruit”, but then it plateaus. Labor productivity hugely increases as more capital is used. People gain skills and, critically, shift to high productivity sectors. And then at some point, diminishing returns kick in.


Is there a similar story with AI? You have diminishing returns kicking in at some point. Initially, there’s this big “labor overhang”: so many jobs that need to be done, no one to do them. But then, as AI does a lot of these jobs, the labor overhang shrinks, and the marginal value of deploying another AI agent diminishes. So maybe you see this initial surge of growth and then it slows down. On the other hand, if AI keeps qualitatively improving and can continuously do newer and more complex jobs, this labor overhang may not get filled. The rate of improvement of AI is maybe one way it differs from other technologies: computers stopped meaningfully improving, so they hit diminishing returns. Will eventually have to read Trammell and Korinek’s paper!


Karpathy: I am nervous about a gradual loss of control from society’s perspective — where we’ll have multiple, competing superintelligences controlled and aligned to particular actors, but not necessarily aligned to society.


Part 7: Evolution of Intelligence & Culture


Karpathy: How difficult / unlikely is intelligence evolutionarily? We took 2 billion years to get to eukaryotes. To get to human intelligence, you needed an environment that rewarded marginal increases in intelligence. Having hands and benefitting from tool use, making fire, probably incentivised this intelligence increase in humans. If raven brains grow, they find it harder to fly. Unclear the gains from increased dolphin intelligence. You also needed some cognitive skills that couldn’t just be hardcoded as evolutionary circuits — hence intelligence specifically would help.


Humans also arguably started dominating because of culture and collaboration, not purely intelligence. LLMs are not there with culture yet. LLMs are like these prodigious kids. They’re incredibly smart, can pass PhD level exams. However, they lack some cognitive skills that make them well-functioning adults.


Part 8: Tesla and Self-Driving


Karpathy: Self-driving took a long time to roll out, and it’s still not there yet. Still not widely deployed. Adding each “9” of reliability (90%, 99%, 99.9%…) takes roughly the same amount of time. When the cost of failure is so high, you need to guarantee reliability, even in edge cases. That’s why I’m always skeptical of demos. The demo to deployment gap is always large.
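
A small illustration of what each extra nine buys. The per-million framing is my own choice of unit; the point is only that every nine removes 90% of the remaining failures, so the hundredfold drop from 90% to 99.9% needs two rounds of that roughly constant effort.

```python
from fractions import Fraction


def failure_rate(nines):
    """Failure probability at a given number of nines (1 is 90%, 2 is 99%, ...)."""
    return Fraction(1, 10 ** nines)


def failures_per_million(nines):
    """Expected failures in one million attempts at that reliability level."""
    return int(failure_rate(nines) * 1_000_000)


for nines in range(1, 7):
    reliability = float(1 - failure_rate(nines))
    print(f"{nines} nines ({reliability:.4%}): "
          f"{failures_per_million(nines):,} failures per million attempts")
```

A demo only has to work once, which says almost nothing about where a system sits on this scale.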


Software engineering actually has a similar cost of failure — security vulnerabilities can be extremely costly. LLMs making a huge mistake once in a few years would have a very high cost. It’s not easy to automate SWE. 


Slightly skeptical of this claim, but maybe. We just need AI to make fewer mistakes than human software engineers, right? It doesn’t need to be error-free.


Karpathy also had a segment on education. I enjoyed listening to it, but I think it digresses from the main discussion, so I’ve chosen to drop it from my notes. One thing that was interesting was that Karpathy is creating this SOTA school to teach technical knowledge. One of his aims is to prevent gradual disempowerment. What Karpathy’s doing seems great; education will be meaningful even if all labour is automated. However, I don’t expect it to really guard against gradual disempowerment.

