Marking Consistency

A few weeks before Christmas, Ofqual published important research relating to marking consistency and online standardisation, as well as a survey of examiners. The research, entitled “Marking Consistency Metrics – An Update”, is a damning indictment of the current public examination system. Whilst I welcome Ofqual’s openness in publishing such data, I am sure parents and pupils alike will understand my concerns at this evidence that the current examination system is broken.

The key data shows the probability of a young person receiving the “definitive” grade, defined as the grade that the Chief Examiner would award. The research looks at a range of popular subjects across the 2017 GCSEs, AS Levels and A Levels. It is important to note that this is qualification-level data, although only results from samples of components or modules within a qualification (e.g. paper 1 in a two-paper exam) were investigated.

There is an extremely wide variation in reliability of grading between subjects and, in particular, an unacceptably low percentage of reliable grades in humanities subjects (disproportionately taken by girls). This includes core subjects. For example, the probability of receiving the definitive/correct grade is reported as:

  • History: approx 56%

  • English Literature: approx 58%

On average, approximately a quarter of grades are unreliable according to Ofqual’s methodology; one in four for girls and one in five for boys. This amounts to millions of grades over time. In my opinion these findings are nothing short of scandalous.
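As a rough illustration of what these percentages mean for a cohort, the short Python sketch below converts the approximate probabilities quoted above into the expected number of candidates per 100 whose grade would differ from the definitive one. The function name and the cohort size of 100 are my own illustrative assumptions and are not part of Ofqual’s methodology.

```python
# Approximate probabilities of receiving the definitive grade, as quoted above.
definitive_probability = {
    "History": 0.56,
    "English Literature": 0.58,
}


def expected_non_definitive(probability, cohort_size=100):
    """Expected number of candidates whose grade differs from the definitive grade.

    Hypothetical helper for illustration only; cohort_size=100 is an assumption.
    """
    return round((1 - probability) * cohort_size)


for subject, probability in definitive_probability.items():
    print(f"{subject}: about {expected_non_definitive(probability)} in 100")
```

On these figures, roughly 44 in every 100 History candidates and 42 in every 100 English Literature candidates would be expected to receive a grade other than the definitive one.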

Interpreting the graph:

  • The vertical axis represents a mix of A Level, AS and GCSE subjects from Summer 2017. These are therefore a mix of linear and modular qualifications;

  • The horizontal axis (0.0 to 1.0) represents probability, which can be read as a proportion of 100 children, or as a percentage up to 100%;

  • The key information lies in the dark blue boxes, which represent the actual qualification level results obtained in each subject;

  • Follow the vertical line within the dark blue box to the horizontal bar below to discover the probability of a grade awarded in that subject being the same as the grade awarded by the Chief Examiner.


Figure 1: Marking Consistency Metrics – An Update (p21, fig 12)


Teachers know that there can never be a definitive right mark for any answer, especially in Humanities subjects. A range of possible marks might be correct; Ofqual therefore introduced tolerances. Externally, this has been referred to as “fuzzy marking within hard grade boundaries”.

Headmasters have consistently maintained that the large-scale allocation of unreliable grades among candidates is random, arbitrary and unfair, and have reminded Ofqual of its regulatory duty to report accurately the achievement of each candidate and introduce positive regulatory reform. Given the high stakes for individuals reliant on grades to progress to A Level or university study, there is, at least, a need to educate schools, Higher Education and employers about their reliability.

In recent years Ofqual has introduced a number of rule changes designed to discourage pupils from appealing their results. This strategy has been very successful but cannot hide the fact that exam marking is inaccurate. Research suggests that in the summer of 2018, 5.5% of all GCSE grades were challenged and 1.1% of grades awarded were changed. This represents a staggering total of 67,195 increases by one grade and 1,660 increases by two or more grades.
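The relationship between these figures can be checked with simple arithmetic. The sketch below uses only the numbers quoted above to work out what share of challenged grades ended up being changed, and how many upward changes were reported in total. The variable names are illustrative, and the calculation assumes that both percentages are measured against the same base (all GCSE grades awarded).

```python
challenged_share = 0.055  # 5.5% of all GCSE grades were challenged
changed_share = 0.011     # 1.1% of all GCSE grades awarded were changed

# Assumption: both percentages share the same base (all GCSE grades awarded).
changed_per_challenge = changed_share / challenged_share

one_grade_increases = 67_195
two_plus_grade_increases = 1_660
total_increases = one_grade_increases + two_plus_grade_increases

print(f"Share of challenged grades that were changed: {changed_per_challenge:.0%}")
print(f"Total upward grade changes: {total_increases:,}")
```

Under that assumption, roughly one in five challenged grades was changed.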