Accuracy You Can Trust

We believe in transparency. Rather than making vague claims, we publish real accuracy data — tested against official examiner marks — so you can decide for yourself.

Verified against official IGCSE & GCSE examiner results
Quadratic Weighted Kappa (the gold standard): 0.97
The metric exam boards use to measure marker agreement. A score above 0.80 is considered near-perfect; the accepted threshold for AI marking systems is 0.70. Graded Pro scores 0.97 across 387 questions spanning maths and English, and a Wilcoxon signed-rank test confirms no statistically significant difference from the examiner (p = 0.12).
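For anyone who wants to check the arithmetic, both figures can be reproduced with standard open-source tools. Below is a minimal sketch, assuming two parallel arrays of (examiner, AI) marks; the arrays shown are hypothetical stand-ins, not our benchmark data.

```python
# Sketch: computing QWK and the Wilcoxon signed-rank test from paired marks.
# The mark arrays are hypothetical examples, not the 387-question benchmark.
import numpy as np
from sklearn.metrics import cohen_kappa_score
from scipy.stats import wilcoxon

examiner = np.array([3, 5, 2, 6, 4, 1, 5, 3, 7, 0, 8, 4, 6, 2, 5, 9, 1, 4, 7, 3])
ai       = np.array([3, 4, 2, 6, 5, 1, 5, 4, 7, 0, 7, 4, 7, 3, 5, 8, 2, 5, 6, 3])

# Quadratic Weighted Kappa: chance-corrected agreement that penalises
# large disagreements quadratically; 1.0 means perfect agreement.
qwk = cohen_kappa_score(examiner, ai, weights="quadratic")

# Wilcoxon signed-rank test on the paired differences: a large p-value
# means no statistically significant difference between the two markers.
stat, p = wilcoxon(examiner, ai)

print(f"QWK = {qwk:.2f}, Wilcoxon p = {p:.2f}")
```

scikit-learn's cohen_kappa_score with quadratic weights is the standard QWK implementation, and scipy.stats.wilcoxon is the standard paired signed-rank test.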

How We Compare

Metric | Graded Pro | Human Markers*
Quadratic Weighted Kappa | 0.97 | Varies by subject
Correlation with examiner | 0.97 | ~0.70
Average error (structured questions) | 0.26 marks | Not published
Average error (essay questions) | ~2 marks | 5.6 marks
Statistically different from examiner? | No (p = 0.12) | Yes

*Human marker data from a Cambridge Assessment study: 200 English scripts marked by a chief examiner were independently re-marked by experienced markers. Graded Pro results are based on 387 questions across IGCSE Higher Maths (13 students) and GCSE English Language Paper 2, compared against official examiner marks. No mark schemes or student work were adjusted in any way.

Results by Subject

All results are from real examination papers, compared against the actual marks awarded by the official examiner. The only inputs were the students' work and the official mark scheme — nothing was adjusted or modified.

Mathematics
IGCSE Higher · 13 students · 312 questions
QWK: 0.97
Exact match: 79%
Within ±1 mark: 94%
Within ±2 marks: 99.7%
Average error: 0.27 marks
Correlation: 0.97

English Language
GCSE Paper 2 · 75 questions
QWK: 0.97
Exact match: 65%
Within ±1 mark: 83%
Within ±2 marks: 89%
Average error: 0.96 marks
Correlation: 0.98

Structured Questions

Our system excels on questions with defined correct answers — the kind that make up the majority of assessments. Across 356 structured questions in both maths and English:

  • Quadratic Weighted Kappa: 0.97 (near-perfect agreement)
  • Marks identical to the examiner: 80% (356 questions)
  • Within ±1 mark of the examiner: 95% (across subjects)
  • Average error per question: 0.26 marks (across subjects)

Whether it's a 1-mark calculation or an 11-mark multi-step problem, the AI consistently matches professional marking standards.
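These figures follow directly from the paired marks, so they are easy to audit. A minimal sketch of the underlying formulas, again using hypothetical mark arrays rather than our benchmark data:

```python
# Sketch: exact match, within-±1, average error, and correlation from
# paired (examiner, AI) marks. The arrays are hypothetical examples.
import numpy as np

examiner = np.array([2, 4, 1, 6, 3, 5, 0, 7, 4, 2])
ai       = np.array([2, 4, 1, 5, 3, 5, 1, 7, 4, 3])

diff = np.abs(ai - examiner)

exact_match = np.mean(diff == 0)               # share of identical marks
within_one  = np.mean(diff <= 1)               # share within ±1 mark
mae         = diff.mean()                      # average error per question
corr        = np.corrcoef(examiner, ai)[0, 1]  # correlation with examiner

print(f"Exact: {exact_match:.0%}  ±1: {within_one:.0%}  "
      f"MAE: {mae:.2f}  r: {corr:.2f}")
```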

Extended Writing & Essays

Levelled questions, where markers use band descriptors to assess quality, are harder for any marker, human or AI. Our system uses a structured levelling process modelled on how trained markers work: identify the best-fit level, then position the mark within it (sketched in code after the results below).

  • Average error of around 2 marks on levelled questions
  • QWK of 0.95 on levelled questions alone
  • 97% correlation with professional markers across all question types
  • Significantly outperforms the average experienced human marker (mean absolute error of 2.1 marks vs 5.6)
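To make that two-step process concrete, here is a rough sketch of best-fit levelling. The band boundaries, fit scores, and function names are illustrative placeholders, not our actual marking pipeline.

```python
# Rough sketch of two-step best-fit levelling (hypothetical bands and scores):
# step 1 picks the band whose descriptors fit the response best,
# step 2 positions the mark within that band.
from dataclasses import dataclass

@dataclass
class Level:
    name: str
    lo: int  # lowest mark in the band
    hi: int  # highest mark in the band

LEVELS = [
    Level("Level 1", 1, 4),
    Level("Level 2", 5, 8),
    Level("Level 3", 9, 12),
]

def award_mark(fit_scores: dict[str, float], position: float) -> int:
    """fit_scores: how well each level's descriptors fit the response.
    position: where the response sits within the band (0.0 bottom, 1.0 top)."""
    best = max(LEVELS, key=lambda lvl: fit_scores[lvl.name])
    return best.lo + round(position * (best.hi - best.lo))

# Example: Level 2 descriptors fit best; the response sits high in the band.
print(award_mark({"Level 1": 0.2, "Level 2": 0.9, "Level 3": 0.4}, 0.7))  # 7
```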

What This Means For You

AI marking is not a replacement for your professional judgement — it's a tool that handles the heavy lifting so you can focus on what matters.

Where the AI is strongest

Short-answer questions, calculations, retrieval tasks, and structured responses across all subjects. On these question types, the AI marking is highly reliable and ready to use as-is.

Where you should review

Extended writing and essay-style responses at the very top of the mark range. The AI occasionally under-marks the strongest responses by a few marks. A quick review of your highest-performing students' work is good practice.

What we recommend

Use AI marking to get a fast, accurate first pass across a full class set. Moderate a sample — just as you would with any marking — and adjust where needed. Teachers who use this approach typically report saving 50–70% of their marking time.

Works With Any Mark Scheme

Our system isn't locked to specific subjects or curricula. Upload any mark scheme or rubric and the AI adapts — whether you're marking a maths paper, a history essay, a science practical write-up, or a language analysis.

Not Just Exams

Our accuracy benchmarks are based on formal examination papers, but Graded Pro is built for everyday marking across all types of student work. The same AI that matches chief examiner standards on exam scripts delivers consistent, rubric-linked feedback on:

  • Homework — weekly assignments marked and returned the same day, with actionable next steps
  • Classwork and in-class tasks — quick, consistent feedback while the learning is still fresh
  • Termly tests and mock exams — full cohort marking with detailed breakdowns by question
  • Coursework drafts — formative feedback that helps students improve before final submission
  • Past paper practice — students get instant, exam-standard feedback on every attempt

Wherever there's a rubric or mark scheme, Graded Pro delivers accurate, detailed feedback — whether the stakes are high or the goal is simply helping students learn from their work.

Our Commitment

We continuously test and improve our marking accuracy. We don't claim perfection — no marker, human or AI, achieves that. What we do promise is transparency about where the system performs well and where it has limitations, so you can use it with confidence.

See For Yourself

Sign up for a free trial with 150 credits and test it on your own papers.

Start Free Trial

No credit card required