ALGORITHMIC GRADING

Scope of Use (estimation)

Algorithmic grading replaced International Baccalaureate (IB) final examinations in 2020
For the IB, over 200,000 students from over 3,000 schools worldwide were graded using an algorithm
Algorithmic grading replaced A-level examinations in the United Kingdom in 2020
For A-levels, over 800,000 students were graded using an algorithm
The full scope of use at national levels or for other curricula is difficult to establish

Technological robustness and efficacy

Predicting grades accurately and justly for every single student is by definition counterfactual and therefore a nearly impossible task
The algorithmic models are set up to predict the likely distribution of grades, not to assess the performance of a student on his or her own merits
In the test phase, where the results of the algorithm were compared with actual grades, the accuracy of the model used for the “A-levels algorithm” was low, in the range of 50 to 60%
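
As a rough illustration of what an accuracy figure in this range means, the sketch below compares predicted grades with grades actually achieved and reports the share of exact matches. The function name and the ten-student data set are hypothetical and chosen only for illustration; they are not taken from Ofqual's test data.

```python
def exact_match_accuracy(predicted, actual):
    """Return the share of students whose predicted grade equals the achieved grade."""
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")
    if not predicted:
        raise ValueError("at least one student is required")
    matches = sum(p == a for p, a in zip(predicted, actual))
    return matches / len(predicted)


# Hypothetical grades for ten students (illustrative only)
predicted = ["A", "B", "B", "C", "A", "D", "C", "B", "A", "C"]
actual = ["A", "B", "C", "C", "B", "D", "B", "B", "A", "D"]

accuracy = exact_match_accuracy(predicted, actual)
print(f"Exact-match accuracy: {accuracy:.0%}")
```

In this toy data set six of ten predictions match, so four of ten students would receive a grade that differs from what they achieved.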

Impact on citizens and society

Unjust or incorrect grading has detrimental effects on the life opportunities of students, as grades determine their future education and career
The algorithm considered the past success of the school as a variable, which favoured better (and usually private) schools over less successful (usually public) schools, risking deepening inequalities
Grading was generalised by taking into account variables beyond the student’s own merits

Governance and Accountability

Submission to algorithmic grading in the domain of education is usually not voluntary
For the evaluated case, the deployment of algorithmic grading took place with the involvement of key stakeholders such as teachers, students and students’ parents
For the evaluated case, a vast governance structure was set up

Acceptable trade-offs in times of crisis

Despite the inability to hold on-premises examinations due to the Corona crisis and the understandable wish to let as many students as possible complete their exams, other options could have been examined and used, for example looking at students’ past work, teacher evaluation, or introducing alternative assessment methods (such as take-home exams). Algorithmic grading gives rise to a justified sense of arbitrariness and injustice in grading. We see no acceptable trade-offs, in particular because of the low accuracy of the system and its impact on legal principles and several human rights.

The Ofqual Grading Algorithm

In spring 2020, due to the ongoing pandemic and consequent school closures, A-level students in the UK faced the novel situation of being awarded grades generated by an algorithm administered by the Office of Qualifications and Examinations Regulation (Ofqual). The inputs of the model included the students’ past work, teachers’ predicted grades, the past success of the school and the ranking of the students within the school. After the results were published, the use of the algorithmic grading system caused a popular uproar, as the method was deemed unjust. Almost 40% of students received grades lower than predicted, sparking public outcry and legal action. Ultimately, Ofqual made a U-turn and grades were re-issued, based solely on teacher judgment.
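
To make concrete why combining a school’s past results with a within-school ranking can disadvantage individual students, the sketch below assigns grades to a ranked cohort so that the cohort matches the school’s historical grade shares. This is a deliberately simplified toy model and not Ofqual’s actual algorithm; the function name, the school histories and the student identifiers are all hypothetical.

```python
def allocate_grades(ranked_students, historical_shares):
    """Assign grades to students ranked best-first so that the cohort
    reproduces the school's historical grade shares.

    historical_shares is a list of (grade, share) pairs, best grade first.
    """
    n = len(ranked_students)
    grades = {}
    cumulative = 0.0
    position = 0
    for grade, share in historical_shares:
        cumulative += share
        cutoff = min(round(cumulative * n), n)
        while position < cutoff:
            grades[ranked_students[position]] = grade
            position += 1
    # Students left over because of rounding receive the lowest grade.
    lowest_grade = historical_shares[-1][0]
    for student in ranked_students[position:]:
        grades[student] = lowest_grade
    return grades


# The same five ranked students, placed in two hypothetical schools
ranked = ["s1", "s2", "s3", "s4", "s5"]
strong_history = [("A", 0.4), ("B", 0.4), ("C", 0.2)]
weak_history = [("A", 0.0), ("B", 0.4), ("C", 0.6)]

print(allocate_grades(ranked, strong_history))
print(allocate_grades(ranked, weak_history))
```

In this toy model the top-ranked student receives an A at the school with a strong history but can receive at most a B at the school where no A was awarded in the past, regardless of his or her own work.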

We have tested the technology against the 24 requirements of the “Framework for Responsible AI in times of Corona”, divided into 3 categories: (i) Society; (ii) Technology; and (iii) Governance.

Each requirement is given a score from ‘0’ to ‘3’:

0 = unknown

1 = no compliance

2 = partial compliance

3 = full compliance


USE CASE

Society

1. Impact on Society (1)

Algorithmic grading sets a dangerous precedent for how algorithms can be applied to decision-making that might have detrimental effects on a person’s life prospects and opportunities

2. Human Rights (1)

Incorrect or discriminatory grading impacts the human right to education
Considering the past success of a school to grade individual students has been shown to be discriminatory towards students from disadvantaged areas (discrimination based on socio-economic status)

3. Democracy (1)

Algorithmic grading runs against the legal principle of individual autonomy by including variables that are based on the performance of other students in the past
Changing the criteria for assessment at the last moment runs against the legal principle of the prohibition of retrospective (ex post facto) law-making

4. Legal Compliance (1)

Breach of the Apprenticeships, Skills, Children and Learning Act 2009, which demands a reliable indication of the knowledge, skills and understanding of the student
Breach of a range of anti-discrimination legislation
Potential breach of Art. 22(2)(c) of the GDPR

5. Human Agency (1)

The algorithmic grading system did not seem to allow for human judgement or discretion

6. Explicability (3)

According to Ofqual, information is available that explains how grades were calculated

7. Fairness (1)

Counterfactual prediction of a student’s academic knowledge, skills and understanding is by definition unfair, as it does not reflect the student’s actual performance
Assessing grades based on a school’s past performance advantaged private and generally well-performing schools
Because emphasis was placed only on past assignments, students who had developed progressively during their studies had no opportunity to demonstrate their knowledge

8. Accessibility (2)

Accessibility was explicitly dealt with in the preparatory phases
The appeal procedures, where available, usually included a fee (at least in the UK and IB examples), making access to the system unequal for students from different economic backgrounds

Technology

9. Problem Definition (3)

The problem definition: grading when real-life examination is not possible

10. Solution Optimization (2)

Broadly speaking, the choice was between the ‘statistical’ grading approach and the ‘personal’ grading approach:
- Ofqual: “Therefore, we believe that an approach placing more weight on statistical expectations is appropriate and most fair to students, particularly in light of Ofqual’s statutory objective to maintain standards over time.”
It is unknown whether alternative solutions were researched or investigated

11. Effectiveness (1)

The algorithmic grading model used for the A-levels was highly inaccurate. In the test phase it predicted the correct grade in at most around 60% of cases, meaning that at least around 40% of students were graded incorrectly.

12. Adverse Effects (3)

Adverse effects (e.g. bias, exclusion, unfair outcomes) were identified

13. Security (0)

Level of resilience to (cyber)attacks: unknown

14. Accuracy (1)

Ultimately, the outcomes were an inaccurate reflection of a student’s actual performance
This was primarily due to the chosen inputs/features

15. Generalisability (1)

Grading of an individual’s performance is inherently personal, so the level of generalisability is limited

16. Human Oversight (0)

While the system could easily be intervened with, it is unknown whether a specific ‘human oversight’ structure was in place

Governance

17. Legal Basis & Policy (3)

The legal basis for instructing Ofqual to arrange for calculated grades was found in the Apprenticeships, Skills, Children and Learning Act 2009
Ofqual put in place regulatory structures and amendments to facilitate the process of algorithmic grading

18. Necessity & Proportionality (2)

Because of school closures and the cancellation of in-person exams, a different form of grading was necessary
The chosen form of algorithmic grading might not have been a proportionate measure

19. Domain & Purpose Limitation (3)

The domain and purpose of the system were well defined and limited

20. Transparency (3)

The procedure leading up to algorithmic grading was transparent
Consultations of teachers, students and parents and carers took place
Students were informed through letters

21. Voluntariness (1)

No voluntary submission to the application: submission to algorithmic grading in the domain of education is usually not voluntary

22. Sunset Clause (0)

Unknown if a specific sunset clause was set

23. Stakeholder Involvement (3)

Teachers, students, parents and carers were consulted

24. Accountability (0)

No public information on whether the specifics and workings of the system were documented for accountability purposes
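
The 24 scores given in this assessment can be tallied per category as sketched below. The Framework as described here does not prescribe any aggregation, so the tally is only an illustrative summary; scores are counted rather than averaged because ‘0’ means unknown, not worse than ‘1’. The variable and function names are hypothetical.

```python
from collections import Counter

LABELS = {0: "unknown", 1: "no compliance", 2: "partial compliance", 3: "full compliance"}

# Scores as listed in this assessment, in requirement order (1-8, 9-16, 17-24)
SCORES = {
    "Society": [1, 1, 1, 1, 1, 3, 1, 2],
    "Technology": [3, 2, 1, 3, 0, 1, 1, 0],
    "Governance": [3, 2, 3, 3, 1, 0, 3, 0],
}


def tally_by_category(scores):
    """Count how often each score occurs within each category."""
    return {category: Counter(values) for category, values in scores.items()}


def tally_overall(scores):
    """Count how often each score occurs across all categories."""
    total = Counter()
    for values in scores.values():
        total.update(values)
    return total


for category, counts in tally_by_category(SCORES).items():
    summary = ", ".join(f"{LABELS[s]}: {counts.get(s, 0)}" for s in sorted(LABELS))
    print(f"{category}: {summary}")
```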

Assessment preparation: Christofer Talvitie

