How ThinkingEngine Scores Student Reasoning
A guide to understanding your students' reasoning scores — what they measure, how they break down, and how to use them in your classroom.
How the Scoring Works
Every time a student completes a dialogue session, ThinkingEngine generates a detailed reasoning score. The score isn't a grade — it's a map of where the student's thinking is strong and where it needs more support.
Here's the dashboard you see after students complete a session:
Each student's score is built from five dimensions of reasoning, with a composite score that represents the overall quality of their thinking.
The scoring process has four steps:
[Diagram: Socratic dialogue → analyzed for reasoning → scoring → shows scores to teacher]
The Five Scoring Dimensions
Every session is scored across five dimensions of reasoning (Socratic Dialogue sessions score a sixth, Argument Parsimony, described later in this guide). Within a session type, the dimensions stay the same from session to session. They were chosen because together they represent the core moves a skilled reasoner makes, regardless of the topic.
When you review a student's score profile, you're not looking for a single number — you're looking at the shape of the profile. A student who scores 8.8 in Evidence but 5.2 in Alternatives is a different thinker than one who scores 7 across all five dimensions. Both can grow, but the coaching is different.
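One way to make the "shape" of a profile concrete is to look at the gap between a student's strongest and weakest dimensions. The sketch below is illustrative only and is not ThinkingEngine's own code: the function name `profile_shape` is hypothetical, and the dimension names other than Evidence and Alternatives are placeholders, since this guide does not list all five by name.

```python
def profile_shape(scores: dict[str, float]) -> dict[str, object]:
    """Summarize a score profile by its strongest and weakest dimensions."""
    if not scores:
        raise ValueError("profile must contain at least one dimension")
    weakest = min(scores, key=scores.get)
    strongest = max(scores, key=scores.get)
    return {
        "strongest": strongest,
        "weakest": weakest,
        # Gap between the highest and lowest dimension, to one decimal place.
        "spread": round(scores[strongest] - scores[weakest], 1),
    }


# Placeholder names ("Dimension 3" etc.) stand in for dimensions not named here.
uneven = {
    "Evidence": 8.8,
    "Alternatives": 5.2,
    "Dimension 3": 7.0,
    "Dimension 4": 7.1,
    "Dimension 5": 6.9,
}
flat = {name: 7.0 for name in uneven}

print(profile_shape(uneven))
print(profile_shape(flat))
```

The uneven profile has a spread of 3.6 with Alternatives as the weakest dimension, while the flat profile has a spread of 0.0. Two students with the same overall average can call for quite different conversations.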
Understanding Your Scores: Three Tiers
Each dimension is scored on a 0–10 scale. The scores aren't letter grades — they're diagnostic markers that tell you what kind of support a student needs next. Here's how to read them:
Needs Support (0–3.9): The student is reasoning at a surface level. They might be jumping to conclusions, missing counter-arguments, or relying on intuition rather than evidence. This doesn't mean they're not capable — it means the session surfaced the gaps clearly.
Developing (4–6.9): The student is doing real reasoning but inconsistently. They might have good evidence but miss the alternatives, or see the assumptions but not follow the logic through. This is where most students live most of the time, and where the most growth happens.
Strong (7–10): The student is reasoning at a genuinely sophisticated level. They're integrating evidence, considering alternatives, and following the logic of their position. A high score doesn't mean they're done — it means they're ready for harder questions.
Composite score: The overall score shown on the dashboard is the simple average of the five dimension scores, rounded to one decimal place. It gives you a quick summary, but the dimension breakdown is where the real coaching happens.
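The tier boundaries and the composite calculation described above can be sketched in a few lines. This is a minimal illustration, not ThinkingEngine's implementation: the function names `tier` and `composite` are hypothetical, and two details are assumptions this guide does not specify, namely that a value falling between two listed bands (such as 3.95) belongs to the lower band, and that ties are rounded by Python's built-in `round`.

```python
from statistics import fmean


def tier(score: float) -> str:
    """Map a 0-10 dimension score to one of the three tiers."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score < 4:
        return "Needs Support"  # 0-3.9
    if score < 7:
        return "Developing"     # 4-6.9
    return "Strong"             # 7-10


def composite(dimension_scores: list[float]) -> float:
    """Simple average of the dimension scores, rounded to one decimal place."""
    if not dimension_scores:
        raise ValueError("at least one dimension score is required")
    return round(fmean(dimension_scores), 1)


# Made-up scores for one student across five dimensions.
scores = [8.8, 5.2, 6.1, 7.3, 6.9]
print(composite(scores))          # 6.9
print([tier(s) for s in scores])
```

In this example the composite of 6.9 lands in Developing, even though two of the five dimensions are already Strong. That is why the dimension breakdown is more useful for coaching than the summary number.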
Socratic Dialogue vs. Inquiry-Based Exploration
ThinkingEngine has two session types. They use different scoring dimensions because they're measuring different kinds of reasoning.
Socratic Dialogue scores six dimensions. The sixth — Argument Parsimony — measures whether a student uses the simplest reasoning necessary to support their claim, rather than piling on unnecessary complexity. In philosophy and ethics discussions especially, this matters: the student who makes the tightest argument is often the one who understands the problem most clearly.
Inquiry-Based Exploration uses five dimensions focused on scientific and investigative reasoning — how well students gather evidence, generate hypotheses, evaluate those hypotheses against data, and defend their conclusions.
Both session types score Assumption Identification. It's the one dimension that appears in both rubrics because the ability to name what's being taken for granted is foundational to reasoning in every domain.
What to Look For: A Response Ladder
To understand what scores actually look like in practice, it helps to see a single dimension across five levels of student reasoning. Here's Evidence Integration, in a discussion about whether the city should ban plastic bags:
The jump from level 1 to level 2 is about adding a qualifier ("bad for the environment" instead of just "bad"). The jump from level 3 to level 4 is about acknowledging the counter-evidence. The jump from level 4 to level 5 is about naming the specific values in tension (which environmental harm you're trying to prevent) and making the reasoning framework explicit.
What you don't see in this ladder: appeals to authority ("experts say...") without specifics, or evidence cited without connection to the claim. Those score lower even if the underlying point is right.
What to Do After You See the Scores
The score tells you where the gap is. What you say to close it is the teaching. Here are diagnostic questions mapped to the five dimensions — use them in one-on-one conferences, small groups, or whole-class discussions:
Use the dimension-specific questions in individual conferences. Use the whole-class question when you notice a pattern across several students — it turns a score into a lesson.
The scores are not for ranking students. They're for identifying where each student needs to go next. A score of 4.2 on Alternatives is not a failure — it's a precise target for your next conversation with that student.
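If you export or copy scores into a spreadsheet or script, spotting a class-wide pattern can be as simple as counting how many students fall below a cutoff on each dimension. The sketch below is one possible approach, not a ThinkingEngine feature. The function name `class_patterns`, the cutoff of 7.0 (below the Strong tier), and the threshold of three students for "several" are all assumptions chosen for illustration, and the student data is made up.

```python
from collections import Counter


def class_patterns(
    profiles: dict[str, dict[str, float]],
    cutoff: float = 7.0,     # assumption: "below Strong" marks a gap
    min_students: int = 3,   # assumption: three students counts as "several"
) -> list[str]:
    """Return dimensions where at least min_students score below cutoff."""
    counts = Counter(
        dimension
        for scores in profiles.values()
        for dimension, score in scores.items()
        if score < cutoff
    )
    # most_common() orders dimensions from most to fewest students affected.
    return [dim for dim, n in counts.most_common() if n >= min_students]


# Made-up profiles, limited to two dimensions for brevity.
profiles = {
    "Student A": {"Evidence": 8.8, "Alternatives": 5.2},
    "Student B": {"Evidence": 7.4, "Alternatives": 4.2},
    "Student C": {"Evidence": 6.1, "Alternatives": 6.0},
    "Student D": {"Evidence": 7.9, "Alternatives": 7.3},
}
print(class_patterns(profiles))  # ['Alternatives']
```

Here three of the four students are below the cutoff on Alternatives, so that dimension is a candidate for a whole-class question. The single low Evidence score is better handled in an individual conference. The output is a list of dimensions, not of students, in keeping with the point that scores are not for ranking.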
A Suggested Classroom Routine
Here's one way to integrate ThinkingEngine into your week without adding hours of prep. This is a starting point — adapt it to your content and pacing:
The second session on the same topic is where the real learning happens. Students who got a 5.2 on Alternatives the first week often get a 7.4 the second week — not because the topic was easier, but because they now know what to look for. The AI doesn't change its questions; the student changes their reasoning.
You're not building sessions. You're building a reasoning habit.