ALGORITHMIC GRADING

Scope of Use (estimation)
  • Algorithmic grading replaced the International Baccalaureate (IB) final examinations in 2020
  • For IB, over 200,000 students from over 3,000 schools worldwide were graded using an algorithm
  • Algorithmic grading replaced A-level examinations in the United Kingdom in 2020
  • For A-levels, over 800,000 students were graded using an algorithm
  • Full scope of use at national levels or for other curricula is difficult to establish
Technological robustness and efficacy
  • Predicting grades accurately and justly for every single student is by definition counterfactual and therefore a nearly impossible task
  • The algorithmic models are set up to predict the likely distribution of grades, not to assess the performance of a student on his/her own merits
  • In the test phase, where the results of the algorithm were compared with actual grades, the accuracy of the model used for the “A-levels algorithm” was low, in the range of 50–60%
Impact on citizens and society
  • Unjust or incorrect grading has detrimental effects on the life opportunities of students, as grades determine their future education and career
  • The algorithm considered the past success of the school as a variable, which favoured better (and usually private) schools over less successful (usually public) schools and risked deepening inequalities
  • Grading was generalised by taking into account variables beyond the student’s own merits
Governance and Accountability
  • Submission to algorithmic grading in the domain of education is usually not voluntary
  • For the evaluated case, the deployment of algorithmic grading took place with the involvement of key stakeholders such as teachers, students and students’ parents
  • For the evaluated case, a vast governance structure was set up
Acceptable trade-offs in times of crisis

Despite the inability to hold on-premises examinations due to the Corona crisis, and the obvious wish to have as many students as possible complete their exams, other options could have been examined and used, for example looking at students’ past work, teacher evaluation, or the introduction of alternative assessment methods (take-home exams). Algorithmic grading understandably creates a sense of arbitrariness and injustice in grading. We see no acceptable trade-offs, in particular because of the low accuracy of the system and its impact on legal principles and several human rights.

The Ofqual Grading Algorithm

In the spring of 2020, due to the ongoing pandemic and consequent school closures, A-level students in the UK faced the novel situation of being awarded grades generated by an algorithm administered by the Office of Qualifications and Examinations Regulation (Ofqual). The inputs of the model included the past work of the students, the teacher’s predicted grade, the past success of the school and the ranking of the students within the school. After the results were published, the use of the algorithmic grading system caused a popular uproar as the method was deemed unjust. Almost 40% of students received grades lower than predicted, sparking public outcry and legal action. Ultimately, Ofqual made a U-turn and grades were re-issued based solely on teacher judgment.
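
To make concrete why leaning on a school's past results and a within-school ranking can override individual merit, here is a minimal sketch. It is not Ofqual's actual model: the function `assign_grades_by_rank`, the student names and the grade shares are all hypothetical, and the only idea illustrated is that grades are handed out by rank position so that the cohort reproduces the school's historical distribution.

```python
def assign_grades_by_rank(ranked_students, historical_shares):
    """Assign grades by rank so the cohort matches the school's past distribution.

    ranked_students: list of names, best first (hypothetical teacher ranking).
    historical_shares: list of (grade, share) pairs, best grade first;
                       the shares are assumed to sum to 1.
    """
    n = len(ranked_students)
    grades = {}
    cumulative = 0.0
    position = 0
    for grade, share in historical_shares:
        cumulative += share
        # Number of students allowed at this grade or better.
        cutoff = round(cumulative * n)
        while position < cutoff and position < n:
            grades[ranked_students[position]] = grade
            position += 1
    # Any student left over by rounding receives the lowest grade.
    lowest = historical_shares[-1][0]
    for name in ranked_students[position:]:
        grades[name] = lowest
    return grades


# Illustrative data only: a school that historically awarded no A grades.
ranking = ["Ada", "Ben", "Cleo", "Dev"]
past_distribution = [("A", 0.0), ("B", 0.5), ("C", 0.5)]
result = assign_grades_by_rank(ranking, past_distribution)
print(result)
```

In this toy example the top-ranked student cannot receive an A, however strong their own work, because the school has no history of A grades.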

We have tested the technology against the 24 requirements of the “Framework for Responsible AI in times of Corona”, divided into 3 categories: (i) Society; (ii) Technology; and (iii) Governance.

Each requirement is given a score from ‘0’ to ‘3’:

0 = unknown

1 = no compliance

2 = partial compliance

3 = full compliance
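
As an illustration of how this scale could be recorded and summarised, the sketch below encodes the four levels and counts how many requirements fall at each level. The names `Compliance` and `summarise`, and the example scores, are assumptions made for illustration; they are not part of the Framework and do not reproduce the scores of this assessment.

```python
from collections import Counter
from enum import IntEnum


class Compliance(IntEnum):
    """The four score levels of the scale above."""
    UNKNOWN = 0
    NONE = 1
    PARTIAL = 2
    FULL = 3


def summarise(scores):
    """Count how many requirements fall at each compliance level."""
    counts = Counter(Compliance(s) for s in scores)
    return {level.name: counts.get(level, 0) for level in Compliance}


# Hypothetical scores for one category; not the scores from this assessment.
example_scores = [1, 2, 2, 3, 0]
summary = summarise(example_scores)
print(summary)
```

The sketch counts scores per level instead of averaging them, because on this scale ‘0’ means that compliance is unknown, not that it is worse than ‘1’.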

USE CASE

Society

  • Algorithmic grading sets a dangerous precedent of how algorithms can be applied to decision-making that might have detrimental effects on a person’s life prospects and opportunities
  • Incorrect or discriminatory grading impacts the human right to education 
  • Considering the past success of a school to grade individual students has been shown to be discriminatory towards students from disadvantaged areas, i.e. discrimination based on socio-economic status
  • Algorithmic grading runs against the legal principle of individual autonomy by including variables that are based on performance of other students in the past
  • Changing the criteria for assessment at the last moment runs against the legal principle of the prohibition of retrospective, ex post facto, law-making
  • The algorithmic grading system did not seem to allow for human judgement or discretion
  • According to Ofqual, information is available that explains how grades were calculated
  • Counterfactual prediction of a student’s academic knowledge, skills and understanding is by definition unfair, as it does not reflect the student’s actual performance
  • Basing grades on a school’s past performance advantaged private and generally well-performing schools
  • By placing emphasis only on past assignments, students who have progressively developed during their studies had no opportunity to demonstrate their knowledge
  • Accessibility was explicitly dealt with in the preparatory phases
  • The appeal procedures (if any) usually (at least in the UK and IB examples) involved a fee, making access to the AI system unequal for students from different economic backgrounds

Technology

  • The problem definition: grading when real life examination is not possible 
  • Broadly speaking the decision was made between the ‘statistical’ grading approach and the ‘personal’ grading approach:
    • Ofqual: “Therefore, we believe that an approach placing more weight on statistical expectations is appropriate and most fair to students, particularly in light of Ofqual’s statutory objective to maintain standards over time.”
  • Unknown whether alternative solutions were researched or investigated
  • The algorithmic grading model used for the A-levels was highly inaccurate: it predicted the correct grade in around 60% of cases, meaning that around 40% of students were graded incorrectly
  • Adverse effects (e.g. bias, exclusion, unfair outcomes) were identified
  • Level of resilience to (cyber)attacks: unknown
  • Ultimately, the outcomes were an inaccurate reflection of a student’s actual performance
  • This was primarily due to the chosen inputs/features
  • Grading of an individual’s performance is inherently personal so the level of generalisability is limited
  • While it was easy to intervene in the system, it is unknown whether a specific ‘human oversight’ structure was in place
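
The accuracy figure cited in the list above is a simple ratio: the share of students whose predicted grade matches the grade they actually obtained. The following sketch shows that arithmetic on made-up data; the function name `accuracy` and the ten grade pairs are assumptions for illustration and are not taken from the A-level test phase.

```python
def accuracy(predicted, actual):
    """Share of students whose predicted grade equals the actual grade."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("no grades to compare")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Made-up grades for ten students, chosen so that six predictions match.
predicted = ["A", "B", "B", "C", "A", "C", "B", "D", "C", "B"]
actual = ["A", "B", "C", "C", "B", "C", "B", "C", "C", "A"]
acc = accuracy(predicted, actual)
print(f"accuracy: {acc:.0%}, misgraded: {1 - acc:.0%}")
```

With six matches out of ten, the sketch reports 60% accuracy, which leaves four in ten students with a grade that differs from the one they would have obtained.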

Governance

  • Legal basis for instructing Ofqual to arrange for calculated grades was found in the Apprenticeships, Skills, Children and Learning Act 2009
  • Ofqual put in place regulatory structures/amendments to facilitate the process of algorithmic grading
  • Because of school closures and the cancellation of sit-in exams, a different form of grading was necessary
  • The chosen form of algorithmic grading might not have been a proportionate measure
  • The domain and purpose of the system were well defined and limited
  • The procedure towards the process of algorithmic grading was transparent
  • Consultations of teachers, students and parents and carers took place
  • Students were informed through letters
  • No voluntary submission to the application other than avoiding the metro station altogether
  • “Shaking head” is insufficient to establish non-consent (CNIL)
  • Unknown if a specific sunset clause was set
  • Teachers, students, parents and carers were consulted
  • No public information on whether the specifics and workings of the system were documented for accountability purposes