Reinforcement learning (RL) often gets a bad rap. At first glance, it feels like the holy grail for teaching machines to learn from experience, but dig a little deeper and you’ll find it riddled with noise, inefficiency, and a disconnect from how humans actually learn. Yet, despite its flaws, it’s still better than what came before and a stepping stone to the future of AI.
I recently came across a Dwarkesh Patel podcast episode featuring Andrej Karpathy, who broke down why RL is terrible yet tractable, why the decade of AI agents won’t happen overnight, and why education might hold the key to harnessing AI’s full potential for humanity.
Why reinforcement learning isn’t the magic fix
Imagine trying to solve a complex math problem by randomly guessing hundreds of different answers and then only rewarding the sequences that ultimately get the right solution. That’s RL in a nutshell. It treats the entire trail leading to the answer as valuable, even if part of that trail consisted of mistakes or irrelevant steps. This leads to noisy updates and a very inefficient learning process.
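The credit-assignment problem described above can be sketched in a few lines. This is a toy, REINFORCE-style illustration, not any production RL library: the key point is that a single terminal reward is broadcast to every action in the trajectory, so helpful and harmful steps receive identical credit. All function names here are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_trajectory(policy_logits, n_steps=10):
    """Sample a sequence of actions from a softmax policy."""
    probs = np.exp(policy_logits) / np.exp(policy_logits).sum()
    return rng.choice(len(probs), size=n_steps, p=probs)

def reinforce_update(policy_logits, trajectory, reward, lr=0.1):
    """REINFORCE-style update: every action in the trajectory is
    reinforced by the same scalar reward, regardless of whether
    that particular step helped or hurt."""
    probs = np.exp(policy_logits) / np.exp(policy_logits).sum()
    grad = np.zeros_like(policy_logits)
    for action in trajectory:
        one_hot = np.zeros_like(policy_logits)
        one_hot[action] = 1.0
        grad += (one_hot - probs) * reward  # same reward for every step
    return policy_logits + lr * grad / len(trajectory)

logits = np.zeros(2)                       # two actions, uniform to start
traj = sample_trajectory(logits)
reward = 1.0 if traj.sum() > 5 else 0.0    # sparse, end-of-episode signal
logits = reinforce_update(logits, traj, reward)
```

Because the reward is attached to the whole trail rather than to individual steps, mistakes inside a winning trajectory get reinforced too, which is exactly the noise Karpathy is pointing at.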
“Basically, reinforcement learning sucks supervision through a straw – it tries to learn every little step from a single final reward signal. That’s crazy noisy and not how humans learn.”
Humans, on the other hand, reflect, review, and selectively reinforce learning, rather than blindly crediting all steps. There’s a complexity and deliberateness missing from AI’s current training loops. Plus, RL struggles with sparse rewards and massive compute costs when scaled.
But the silver lining is that RL allows models to discover solutions beyond human examples and improve over simple imitation. Still, it’s just one tool in a toolkit that’s far from complete.
Why it’s the decade, not the year, of AI agents
There’s a lot of hype around “the year of agents” — AI systems that autonomously perform tasks like interns or employees. But the reality is more measured. Early versions, like coding assistants and chatbots, are impressive but limited. They aren’t truly multimodal, they can’t continually learn, and they lack the cognitive complexity of even junior human workers.

The hardest challenges lie beneath the surface: continuous learning, memory retention beyond a session, integrating vision, language, and actions fluidly, and adapting to new environments without needing tons of retraining.
“We’re still building these digital ghosts – not animals. They mimic humans, but are born from a very different process than evolution.”
Andrej Karpathy
True general intelligence likely requires assembling numerous advances over years, not months. What we see now are promising stepping stones, but bridging the gap to reliable, autonomous agents operating at human-level versatility will probably take a decade or more.
Learning like humans: endless challenges and the path forward
One fascinating takeaway is that humans don’t heavily rely on RL for intelligence tasks. Instead, our learning involves rich processes like reflection, memory distillation during sleep, and cultural knowledge accumulation. These remain largely absent in current AI systems.
AI models today memorize vast amounts of data but struggle with rapid abstract learning and continual knowledge updates. Interestingly, attempts at enabling AI to self-reflect or dream — to synthesize and consolidate knowledge — often fail due to collapsed data distributions. Models get stuck in repetitive, low-entropy thought patterns, limiting creativity and adaptability.
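One simple way to make the “low-entropy” failure mode concrete is to measure the Shannon entropy of a model’s output: collapsed, repetitive text concentrates probability mass on a few tokens and scores far lower than diverse text. The sketch below is purely illustrative; the two sample strings are made up for demonstration.

```python
import math
from collections import Counter

def token_entropy(text):
    """Shannon entropy (bits per token) of a whitespace-tokenized string."""
    tokens = text.split()
    counts = Counter(tokens)
    total = len(tokens)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

diverse = "the cat sat near a window watching birds fly over the garden"
collapsed = "the the the cat cat the the cat the the the cat"

# A collapsed, repetitive sample has much lower entropy than a diverse one.
print(token_entropy(diverse), token_entropy(collapsed))
```

Monitoring a statistic like this on self-generated data is one hedge against a model quietly drifting into the repetitive patterns described above.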
The analogy with human learning is striking. Young children, with their limited memory, are masters of rapid and flexible learning, while adults rely more on memorization, which paradoxically can limit cognitive exploration. AI needs to figure out how to maintain a healthy balance—to maximize the “cognitive core” of intelligence while minimizing noisy memorization.
Education as the key to empowerment and AI’s harmonious future
Beyond algorithms and models, one of the most profound insights is the crucial role of education, both for humans and for the AI-human partnership.
Imagine an AI tutor that knows exactly what you understand, what you don’t, and can challenge you just right – not too hard, not too easy. Such a tutor accelerates learning by probing your world model and guiding you through the optimal path for growth. That level of personalized education is still beyond today’s AI, but it’s the direction many experts believe is fundamental.
Building this future requires not just better models but better structures for teaching technical and scientific knowledge. It means untangling complex ideas into simple ramps of understanding, much like physics teaches us to abstract and model phenomena by identifying key forces and ignoring noise at first.
“Education is the very hard technical process of building ramps to knowledge—every step depending on the previous, designed for steady progress without getting stuck.”
The hope isn’t just to build smarter machines, but to create environments where humans can unlock their full potential. With great AI tutors, anyone could master languages, technical fields, or creative arts with ease and joy, transforming education into something as natural and appealing as going to the gym.
Ultimately, the goal is to ensure that as AI progresses, humans remain empowered, intellectually vibrant, and ready to steer the future rather than be sidelined by it.
Key takeaways from the AI journey so far and ahead
- Reinforcement learning is noisy and inefficient, broadly broadcasting a single reward over a long action sequence — far from how humans learn.
- AI agents won’t master full autonomy quickly. Over the coming decade, agents will slowly gain memory, multimodal perception, and continual learning capabilities.
- Current AI models memorize too much and reflect too little. They lack mechanisms akin to human reflection, dreaming, and cultural knowledge accumulation.
- Education is a critical bridge to AI and human empowerment. Personalized tutoring systems matching human-level understanding may unlock unprecedented learning acceleration.
- Scaling AI is a multi-dimensional challenge. Progress depends simultaneously on better data, hardware, algorithms, and software systems.
This layered perspective reminds us that while AI is advancing at an incredible clip, the path to true, general intelligence is a marathon, not a sprint. The interplay of technology, cognition, and education will shape whether AI serves as a catalyst for human potential or becomes a distant ghost in the machine.
If you’re passionate about the real story behind AI’s future, it’s worth stepping past the hype to appreciate the nuances, challenges, and immense promise ahead.