In July 2020, the UK government withdrew a system that used Artificial Intelligence (AI) to screen visa applications, after claims it was automatically rejecting applicants of certain nationalities. This was the latest high-profile example of apparently “racist” AI, such as the Microsoft chatbot that made anti-Semitic remarks, or the Google online photo service that labelled African-American faces as gorillas.
To understand why such failures happen, we first need to understand how AI makes decisions such as rejecting applications or composing responses to questions. This doesn’t need technical knowledge, only familiarity with AI’s underlying concepts. In the case of inadvertent racism, three concepts are key: data, algorithms and machine learning.
The Importance of Data
AI works by finding patterns in data, making inferences relating to its purpose, and applying them to new data. For example, recruitment AI analyses data about employees, identifying characteristics of successful ones. It then searches for these characteristics in candidates.
This process of finding patterns in data is fundamental to AI. It relates to inadvertent racism because if the data containing these patterns is biased, an AI system built by analysing it could contain the same biases. For example, if most of the firm’s offices are in India, most employees will be Indian. Therefore a list of the most successful employees may contain mostly Indians - but of course it doesn’t follow that being Indian is a sign of a successful employee.
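For readers comfortable with a little code, here is a toy sketch of how a skewed sample produces a spurious pattern. All the records and numbers are invented for illustration: because the firm’s offices skew Indian, a naive “pattern finder” concludes Indian nationality is the hallmark of success.

```python
from collections import Counter

# Hypothetical employee records: most offices are in India, so most
# employees - and therefore most "successful" employees - are Indian.
employees = (
    [{"nationality": "Indian", "successful": True}] * 70
    + [{"nationality": "Indian", "successful": False}] * 10
    + [{"nationality": "British", "successful": True}] * 15
    + [{"nationality": "British", "successful": False}] * 5
)

# A naive "pattern finder": count nationalities among successful employees.
successful = Counter(e["nationality"] for e in employees if e["successful"])
top = successful.most_common(1)[0]  # ("Indian", 70) - an artefact of office locations
```

The “pattern” is real in the data but meaningless as a hiring signal, which is exactly the trap described above.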
This level of bias is straightforward to remove, as long as the AI team is aware of it - for example, by ensuring equal representation of nationalities. However, other biases in data may be less straightforward to identify and easier to overlook. Spotting these requires a human as well as a statistical understanding of data, and lies in the realm of Data Science.
A critical role of Data Scientists is investigating, understanding and preparing data, so that the possibility of data bias is removed.
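One simple preparation technique a data scientist might apply is rebalancing: downsampling every group to the size of the smallest, so no nationality dominates the analysis. The sketch below uses invented figures purely to show the mechanics.

```python
import random
from collections import Counter, defaultdict

random.seed(0)  # reproducible downsampling

# Hypothetical successful-employee records, skewed by office location.
records = [{"nationality": "Indian"}] * 70 + [{"nationality": "British"}] * 15

# Group records by nationality, then downsample every group to the size
# of the smallest one, so no nationality dominates the analysis.
groups = defaultdict(list)
for r in records:
    groups[r["nationality"]].append(r)

smallest = min(len(g) for g in groups.values())
balanced = [r for g in groups.values() for r in random.sample(g, smallest)]

counts = Counter(r["nationality"] for r in balanced)  # 15 of each
```

Rebalancing is only one option - a data scientist might instead reweight records or stratify the analysis - but the goal is the same: stop a quirk of the sample masquerading as a pattern.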
What Algorithms Do
“Algorithm” refers to the logic used in AI systems to make decisions and draw conclusions from data. For example, AI surveillance uses at least two algorithms: spotting human faces in images; and matching those faces against a database, such as employees authorised to enter a building.
Algorithms work by performing complex mathematical and statistical operations on data to:
• detect patterns in the data;
• attribute meaning to those patterns by comparison with other patterns; and
• make decisions or draw conclusions to help solve a problem or perform a function.
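The three steps above can be sketched in miniature. In this hypothetical recruitment example, the “algorithm” is just a weighted sum of candidate characteristics followed by a threshold decision; the characteristic names and weights are all invented for illustration.

```python
# Hypothetical weights an algorithm might assign to each characteristic.
WEIGHTS = {"years_experience": 0.5, "degree": 1.0, "referrals": 0.8}

def score(candidate: dict) -> float:
    """Steps 1-2: turn a candidate's characteristics into a single fit score."""
    return sum(WEIGHTS[k] * candidate.get(k, 0) for k in WEIGHTS)

def shortlist(candidates, threshold=2.0):
    """Step 3: the decision - keep candidates whose score clears the threshold."""
    return [c for c in candidates if score(c) >= threshold]

strong = {"years_experience": 4, "degree": 1, "referrals": 0}  # score 3.0
weak = {"years_experience": 1, "degree": 0, "referrals": 1}    # score 1.3
chosen = shortlist([strong, weak])  # only the strong candidate survives
```

Real systems use far more factors and far more sophisticated mathematics, but the shape - score, compare, decide - is the same.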
For example, an AI recruitment system might use algorithms to extract relevant information from candidates’ applications, highlighting those with characteristics that indicate a strong fit. The algorithms will contain mathematical and statistical representations of those characteristics.
The relationship to possible racism lies in potentially biased decision-making by an algorithm, even when using unbiased data. Algorithms typically generate a result based on many factors and steps, so an undetected bias in one of these may only create a small bias in the overall AI system. This may not even be apparent in the overall results initially.
For example, a hypothetical Indian IT firm’s AI recruitment system might include “number of languages spoken” as a factor in finding good candidates, because the firm considers language proficiency a potential indicator of programming skill. When first used, it may be that the algorithm finds good candidates effectively, so is considered a success. However, over time such a system could turn out to have an inadvertent bias against British applicants, albeit a small one.
The reason this might happen lies in the fact that Indian applicants may be statistically more likely to speak two or three languages (English, Hindi and a State language), whereas British applicants might be more likely to speak only one. Thus, using language proficiency as a factor in the algorithm may inadvertently create a small pro-Indian hiring bias that’s not initially apparent.
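The language-count example can be made concrete with a toy scoring function. Everything here is hypothetical - the weight of 0.1 and the skill values are invented - but it shows how two equally skilled applicants end up with systematically different scores.

```python
def fit_score(candidate: dict) -> float:
    # Hypothetical scoring: languages spoken contributes a small weight,
    # on the assumption that language proficiency indicates programming skill.
    return 1.0 * candidate["skill"] + 0.1 * candidate["languages_spoken"]

# Two equally skilled applicants; typical language counts differ by country.
indian_applicant = {"skill": 5, "languages_spoken": 3}
british_applicant = {"skill": 5, "languages_spoken": 1}

gap = fit_score(indian_applicant) - fit_score(british_applicant)
# gap is 0.2: a small, systematic pro-Indian edge unrelated to skill
```

A 0.2-point gap is invisible in any single hiring decision, but applied across thousands of applications it shifts who gets shortlisted - which is why such biases only surface over time.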
So it’s not just data that must be unbiased, but also the algorithms using it. Such biases may only become apparent during use, especially small ones that only arise over time and use. Again, the role of data scientists is to prevent such issues from the outset.
How Machine Learning “Improves” AI Results Over Time
Machine Learning is an AI technique that improves its own results over time. It does this by trying out algorithm changes, incorporating those that improve the quality of its results. Those changes are made automatically, continually optimising the overall results of the AI system against a specific measure of success. An example could be a system that uses AI to decide loan applications and uses machine learning to continually reduce the loan default rate by tweaking the algorithm it uses to approve or reject applications.
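Stripped to its bare bones, that optimisation loop looks like this: try candidate versions of the algorithm (here, different approval thresholds on a credit score) and keep the one that scores best on the success measure. All the loan records are invented for illustration.

```python
# Toy loan history: (credit_score, defaulted) pairs - invented figures.
history = [
    (300, True), (450, True), (500, False), (620, False),
    (640, True), (700, False), (750, False), (800, False),
]

def default_rate(threshold: int) -> float:
    """Success measure: default rate among loans this threshold would approve."""
    approved = [defaulted for score, defaulted in history if score >= threshold]
    return sum(approved) / len(approved) if approved else 1.0

# The "learning" step: try thresholds, keep the one minimising defaults.
best = min(range(300, 801, 50), key=default_rate)
```

Real machine learning adjusts many parameters at once rather than one threshold, but the principle is identical: the system changes itself to optimise a single number, with no judgement about what else those changes affect.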
The link to inadvertent racism is that if a machine learning system makes a change that improves results but happens to be racist, the racism won’t become apparent until it’s reflected and spotted in the overall results. In the loan application system above, data scientists may well have ensured race isn’t an explicit loan approval factor by removing any data about applicant nationality.
However, if over time the system finds that applicants from certain pin (or zip or post) codes default more often than others, it may adjust its algorithm to approve fewer applications from those areas. What a human would realise but an AI system may miss, is that those applications come from more socially deprived areas, with higher proportions of BAME (Black, Asian and Minority Ethnic) residents.
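A small sketch shows how a postcode can act as a proxy even after nationality data is removed. The postcodes and default figures below are invented: area “A” stands for a more deprived area with a higher historical default rate.

```python
# Toy loan history with nationality removed - only postcode remains.
loans = [
    {"postcode": "A", "defaulted": True},  {"postcode": "A", "defaulted": True},
    {"postcode": "A", "defaulted": False}, {"postcode": "B", "defaulted": False},
    {"postcode": "B", "defaulted": False}, {"postcode": "B", "defaulted": True},
]

def area_default_rate(code: str) -> float:
    area = [loan["defaulted"] for loan in loans if loan["postcode"] == code]
    return sum(area) / len(area)

# A rule "learned" purely from default rates - blind to who lives where.
def approve(application: dict) -> bool:
    return area_default_rate(application["postcode"]) < 0.5

# Every applicant from area "A" is now rejected, whatever their merits.
```

Nothing in this code mentions race, yet if area “A” has a higher proportion of BAME residents, the rule discriminates against them wholesale - which is precisely the indirect bias described above.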
Obviously, this adds complexity as well as delicacy to the situation, and requires much more consideration than simply optimising loan approvals to minimise default rates. But machine learning isn’t yet designed - or even able - to apply the judgement needed for such situations.
The examples in this article have been deliberately simplified, to ensure clarity of the underlying concepts. In practice, the issues described would be picked up and prevented by any experienced data scientist or AI team. However, less obvious versions of the same kind of problem do get missed, which is ultimately why some AI systems are inadvertently racist.
AI is all around us, often without us realising. With its significant benefits come difficulties and dilemmas, such as those above. You can learn about AI in everyday life, how it works, and what we can do about its challenges, in my new book, AI and Machine Learning.