The Metric AI Tutors Are Optimizing For
Khanmigo gets students to the right answer. So does every other AI tutor on the market right now. That's what they're designed to do — reduce confusion, resolve errors, move the student from stuck to unstuck. The product metric is: did the student finish the problem? Did they stop struggling?
That sounds like tutoring. It isn't.
A tutor who reliably produces correct answers isn't teaching students to think — they're teaching students to depend. The measure of good instruction isn't whether the student got the answer. It's whether, next time, they need less help to get it themselves. Those are different goals, and current AI tutors are almost entirely optimizing for the first one.
The result is a tool that feels helpful, gets used constantly, and leaves a student no better at reasoning than when they started.
What "Helping" Looks Like When It Isn't
Here's a scenario that plays out thousands of times a day in middle and high school classrooms.
A student is working through a history essay: "Was the New Deal a success?" They get stuck on their thesis. They open their AI tutor. The tutor asks a few scaffolding questions, then suggests: "You could argue that the New Deal succeeded in providing short-term economic relief even if it didn't fully resolve the Depression."
The student writes that down. It's a decent thesis. They move on.
What happened? The student was presented with a cognitive task — forming a defensible argument about a contested historical question — and an AI removed that task from their plate before they had to do it. The student experienced no productive struggle. They didn't have to weigh competing interpretations, test tentative claims, or discover that their first instinct was too simple. They got a thesis.
The student learned how to ask an AI for a thesis. That's the skill they practiced.
Now multiply that by every homework assignment for two years. Students who consistently use AI tutors to resolve cognitive difficulty are systematically avoiding the exact situations where thinking develops. They arrive at correct answers having done less and less of the reasoning required to get there.
The tutors aren't failing by producing wrong answers. They're failing by producing correct ones too efficiently.
The Scaffolding Trap
To be fair to Khanmigo and tools like it: they're not simply giving students answers. They scaffold. They ask questions. They prompt students to think before revealing the next step. The design philosophy is Socratic.
But there's a critical difference between Socratic questioning and scaffolded answer-delivery, and most AI tutors have landed on the wrong side of it.
Socratic questioning works by refusing to let the student off the hook. The teacher doesn't guide students toward an answer — they challenge every answer the student offers, including correct ones. "That's one way to look at it. What's the strongest argument against your position?" The student is never done. The goal is not arrival at a correct answer but deepening engagement with the question.
Scaffolded answer-delivery works differently. The AI breaks the task into manageable steps. It hints, it confirms, it encourages. It moves the student toward the target answer along a path the AI already knows. The student's job is to follow the path. They reach the answer, and the interaction ends.
One develops reasoning. The other delivers outcomes.
The practical difference shows up when students face novel problems — the kind not on any scaffold. Students who've been led to answers perform worse on those problems than students who've been required to defend their own reasoning at every step. The scaffolding taught them to follow. It didn't teach them to think.
Khanmigo Is Doing What It Was Built To Do
Khanmigo is a tutoring product. Tutoring has always been about helping students clear specific obstacles and get through specific material. It's a completion service. The benchmark is: did the student understand the concept well enough to move forward?
That's a legitimate goal. For students who are behind, for content that is genuinely prerequisite, for skills that need mechanical practice before application — a tool that helps students reach correct understanding efficiently is valuable.
The problem isn't that Khanmigo exists. The problem is treating a tutoring tool as a thinking tool. They serve different purposes. A tutor is optimized to resolve confusion. A thinking coach is optimized to create productive confusion — to put the student in situations where they have to reason harder, not less.
Those two goals are in direct tension. You can't simultaneously minimize the difficulty of reaching an answer and maximize the reasoning required to get there.
Right now, most schools using AI tutors are using completion tools as thinking development tools. That's the mismatch. And because the tool produces correct answers and happy students, the mismatch is invisible — until you look at whether students can reason without the tool.
What Teaching Thinking Actually Requires
Thinking develops under specific conditions. Not all cognitive activity produces it — there's a difference between retrieving a memorized answer, following a procedure, and constructing a reasoned argument from incomplete information.
The research on critical thinking development is fairly consistent: thinking improves when students are required to produce reasoning under conditions of genuine uncertainty, where their reasoning is challenged and they have to defend or revise it.
Three elements: uncertainty, production, challenge.
AI tutors undermine all three.
Uncertainty requires that the student doesn't know where they're headed. The moment an AI tutor starts scaffolding toward a known answer, uncertainty collapses. The student is following a path, not exploring a problem space.
Production requires that the student generates the reasoning, not just confirms it. An AI that offers a thesis for the student to evaluate and adopt has eliminated the production step. The student responds to reasoning they didn't construct.
Challenge requires that the student's reasoning face real opposition — questions they don't have ready answers to. AI tutors are designed to be affirming. They're not adversarial. They celebrate progress. That's good for motivation and disastrous for thinking development, because thinking that is never challenged doesn't strengthen.
This isn't an argument against using AI in education. It's an argument for understanding what each tool is actually doing to student cognition — and designing instruction accordingly.
The Difference in Practice
Here's what the gap looks like concretely.
AI tutor (answer-oriented):
Student: "I don't understand why the French Revolution happened."
AI: "Great question! The French Revolution had several key causes. First, financial crisis: France was nearly bankrupt after the Seven Years' War and supporting the American Revolution. Second, food shortages: poor harvests in the 1780s led to widespread hunger. Third, social inequality: the Third Estate bore most of the tax burden while nobles were exempt. Does that help clarify things?"
The student has been given a list. They didn't have to reason about anything. They can now write down three causes on a worksheet.
Thinking coach (reasoning-oriented):
Student: "I don't understand why the French Revolution happened."
AI: "What do you already know about how revolutions start? What conditions do you think have to exist before people decide a violent uprising is worth the risk?"
Student: "People have to be really angry?"
AI: "Okay — what makes people angry enough to risk dying? And why wouldn't they just leave, or petition, or protest peacefully instead?"
Now the student is constructing a theory of political violence from first principles. When they later encounter the actual causes of the French Revolution, they'll understand why each one mattered — because they've already reasoned about the conditions that make revolution possible. The content connects to a framework they built.
The first interaction produces correct notes. The second produces understanding.
For more on structuring Socratic questioning like this, see How to Run a Socratic Discussion That Doesn't Suck.
Why Teachers Are Noticing This
There's growing frustration among teachers who've watched AI tutors spread through their schools, and it's worth taking seriously.
The pattern they describe: students complete more homework, get better scores on practice exercises, arrive in class having "done" the reading or the problem set — and then can't answer basic questions about the material in discussion. They can't defend a thesis they wrote. They can't extend a concept to a new context. They completed the work. They didn't learn the material.
The proxy metric (homework completion, practice score) looks fine. The actual metric (can this student reason about this content independently?) is worse than before.
This is predictable. If you optimize for correct answers at homework time, you get correct answers at homework time. You don't get better reasoning. Those are different things, and treating them as the same is the core error.
Teachers who have seen this play out are starting to treat AI tutor use the same way they treat ghostwriting: it produces the artifact without producing the learning. The student who had an AI tutor write their thesis hasn't learned to write theses. The student who had an AI tutor explain every step of their math problem hasn't learned to think through problems.
See Your Students Are Already Using AI. Here's How to Use That Against Them for practical ways to respond to this in your classroom.
What an AI That Coaches Thinking Looks Like
The distinction isn't whether AI is involved — it's what the AI is doing to the student's cognition.
An AI that coaches thinking has a different design goal: not to resolve confusion, but to require the student to reason their way through it. The AI's job is to refuse to let the student off the hook — to keep asking harder questions, to challenge every claim, to require defense before acceptance.
That looks less like a tutor and more like a rigorous discussion partner. It's often uncomfortable. Students who are used to AI tutors giving them answers find it frustrating that this AI won't just tell them. That frustration is the point — it's the friction that makes thinking happen.
Specifically, this means:
- No answer-giving. If a student asks for the answer, the AI redirects to the reasoning process.
- Challenge by default. When a student gives an answer, the AI challenges it — even if it's correct — to force deeper justification.
- No rescue. When a student is stuck, the AI doesn't scaffold toward the answer. It asks them what they do know, what they're uncertain about, and what they'd try first.
- Session transcripts. The teacher can see the actual reasoning — not just the final answer, but how the student got there, where they faltered, what claims they couldn't defend.
That last point matters as much as the others. The goal isn't to produce better answers in isolation. It's to make thinking visible, so teachers can coach it.
ThinkingEngine is built specifically for this. Students work through structured Socratic dialogue — one conversation at a time — and you see the transcript. Not a score. Not a completion rate. The actual reasoning.
The Right Tool for the Right Goal
AI tutors have a place. For students who need content explained, for foundational skills that need drilling, for resolving specific confusion about a concept before class — tools like Khanmigo are useful.
But if your goal is developing students who can reason independently, construct arguments, and think through novel problems without collapsing to the nearest AI — tutoring tools work against you. You need something that requires more from students, not less.
The question to ask about any AI in your classroom: Is this requiring students to reason, or removing the requirement?
Answer that honestly, and the tool selection becomes clear.
What This Looks Like With ThinkingEngine
ThinkingEngine runs Socratic dialogue with your students — not scaffolded tutoring. Students work through a question, defend their claims, and face follow-up challenges they didn't anticipate. The AI never gives them the answer. It requires them to find it.
You get session transcripts for every student. You can see who reasoned carefully and who deflected. You can coach based on what actually happened in the conversation, not what appeared on a final product.
For IB, AP, and honors classes where reasoning quality is the point — this is the tool built for that goal. Try a free session — no account required.
Related Articles
- Teaching Critical Thinking with AI: A Practical Guide for Teachers — How to build reasoning skills at scale without outsourcing the thinking
- How to Design Assignments AI Can't Do For Your Students — Assignment design that requires genuine reasoning, not just correct outputs
- Your Students Are Already Using AI. Here's How to Use That Against Them — Turning the AI problem into a teaching tool
Ready to bring critical thinking into your classroom?
ThinkingEngine guides students through Socratic dialogue — questions that build reasoning, not recall. Free to start, no setup required.
Start Free →