
Artificial Intelligence: six biases to avoid in the next algorithm

As ChatGPT, DALL-E, and many other AI algorithms have crept into our lives, awareness of their potential consequences has grown. First in practical terms (e.g., can an algorithm take my job?), but also in ethical ones. In this sense, more and more researchers are asking whether it is possible to develop algorithms capable of avoiding the biases that we human beings carry. In other words: can we create an AI that is fair, charitable and empathetic… or are we doomed to repeat our mistakes over and over again?

Two MIT researchers have started from this premise. In their paper “A Framework for Understanding Sources of Harm throughout the Machine Learning Life Cycle”, they identify six biases that are usually found in the data with which machine learning models are trained, and that can lead those models to repeat undesirable behavior patterns such as racism, underrepresentation of diversity, exclusion for socio-economic reasons or cultural uniformity. Harini Suresh and John Guttag, the authors of the study, identify the following biases that data scientists and AI experts should avoid.

Historical bias

An artificial intelligence model is considered to produce historical bias when, even though it is trained on measurable and verified data sets, it produces incorrect results because it interprets the present in terms that belong to the past.

This type of bias is what causes some AI models to reproduce stereotypes that are harmful to part of the population and that do not conform to reality. The MIT paper, for example, explains how a financial institution that uses an AI algorithm to predict the payment capacity of its customers could “learn” that it is better not to grant loans to African Americans, based solely on the data (and stereotypes) with which the algorithm has been trained.
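A minimal sketch, with entirely synthetic data, of how this inheritance happens: if the training labels are past lending decisions, a model that fits them “accurately” also reproduces whatever discrimination shaped those decisions. Nothing here reflects the paper’s actual experiments; the variables and numbers are illustrative only.

```python
# Hypothetical sketch: a model trained on historical loan decisions
# inherits the penalty that was historically applied to one group.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
n = 5_000

income = rng.normal(50, 15, n)            # ability to pay (same distribution for both groups)
group = rng.integers(0, 2, n)             # 0 = majority, 1 = minority (synthetic attribute)

# Historical approvals: driven by income, but penalized when group == 1.
past_approval = (income - 20 * group + rng.normal(0, 5, n)) > 40

X = np.column_stack([income, group])
model = LogisticRegression(max_iter=1000).fit(X, past_approval)

# Same income, different group: the model has learned the historical penalty.
print(model.predict_proba([[50, 0]])[0, 1])   # approval probability, majority group
print(model.predict_proba([[50, 1]])[0, 1])   # approval probability, minority group
```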

Representation bias

Representation bias occurs when the data with which the model has been trained underrepresents part of the population and, as a consequence, the model is not capable of offering a broad and diverse picture of society.

An example of the above is ImageNet, a huge database containing millions of tagged images that is commonly used to train AI algorithms to recognize objects. As the MIT researchers point out, approximately half of these images were taken in the United States and other Western countries, while only between 1% and 2% of the images represent Asian and African cultures, among others.

As a consequence, if we were to ask a model trained on ImageNet to recognize a wedding dress, it would perfectly recognize that of a “Western bride” (white), but it would have enormous difficulty recognizing the ceremonial dress worn at weddings in South Korea or Nigeria.
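This is the kind of gap a simple dataset audit can surface before training. The sketch below counts where training images come from, assuming each record carries a country-of-origin tag; the field names and values are hypothetical, not ImageNet’s real metadata schema.

```python
# Hypothetical audit: how is the training data distributed geographically?
from collections import Counter

dataset = [
    {"image_id": 1, "country": "US"},
    {"image_id": 2, "country": "US"},
    {"image_id": 3, "country": "GB"},
    {"image_id": 4, "country": "NG"},
    {"image_id": 5, "country": "KR"},
]

counts = Counter(record["country"] for record in dataset)
total = sum(counts.values())
for country, count in counts.most_common():
    print(f"{country}: {count / total:.1%} of images")
```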

Measurement bias

Measurement bias occurs when choosing, collecting, or calculating the features and labels to be used in a prediction problem. It normally appears when we want to analyze a characteristic or idea that is not directly observable.

For example, a person’s “creditworthiness” is an abstract concept that is often operationalized with a measurable proxy, such as a credit score. In this case, measurement bias appears when the algorithm relies on indirect indicators that do not reflect the different groups being analyzed equally well.
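A hypothetical illustration of this point, with synthetic numbers chosen only for the sketch: the construct we care about is identical across two groups, but the proxy used to measure it is shifted for one of them, so the same decision rule makes different mistakes for each group.

```python
# Hypothetical sketch of measurement bias via a proxy variable.
import numpy as np

rng = np.random.default_rng(1)
n = 10_000

true_worthiness = rng.normal(0, 1, n)      # latent construct, identical for both groups
group = rng.integers(0, 2, n)

# Assumption for the sketch: the proxy (credit score) is systematically
# shifted for group 1, e.g. because of thinner credit files.
credit_score = true_worthiness - 0.5 * group + rng.normal(0, 0.3, n)

approved = credit_score > 0                # decision rule based on the proxy
deserving = true_worthiness > 0            # what we actually wanted to measure

for g in (0, 1):
    mask = (group == g) & deserving
    print(f"group {g}: {approved[mask].mean():.1%} of creditworthy applicants approved")
```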

The paper considers the use of the controversial COMPAS algorithm by the US judicial system a paradigmatic case of this bias. The algorithm supposedly predicts the probability that a defendant will commit another crime, and its output is used by judges and justice officials to decide, for example, whether an arrested person should remain in pretrial detention (and for how long) or whether they can later be granted probation. However, taking into account that in the United States the prison population is mostly African American and Latino, the algorithm estimates not so much the likelihood of a future crime as that of a future arrest, introducing higher risk scores associated with race.

Aggregation bias

Aggregation bias arises when a single model is used for data in which there are underlying groups or types of data that should be considered differently. It rests on the assumption that the mapping between inputs and labels is consistent across all subsets of the data when, in reality, it often is not.

This type of bias leads to a model that ends up not being optimal for any of the groups it claims to represent or, in many cases, one that ends up representing only the dominant group. It often occurs when data from social networks is taken as a whole (for example, a Twitter hashtag) without taking into account that these are spaces in which very different cultures and socio-demographic groups “coexist”.
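A minimal sketch of the effect, under the assumption (made up for this example) that the input-to-label relationship points in opposite directions in two subgroups: a single pooled model fits neither group well, while per-group models fit each one closely.

```python
# Hypothetical sketch of aggregation bias: one pooled model vs. per-group models.
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(2)
x = rng.uniform(0, 10, 1000).reshape(-1, 1)
group = rng.integers(0, 2, 1000)

# Synthetic data: the trend is positive in group 0 and negative in group 1.
y = np.where(group == 0, 2 * x[:, 0], -2 * x[:, 0]) + rng.normal(0, 1, 1000)

pooled = LinearRegression().fit(x, y)
print("pooled R^2:", pooled.score(x, y))                      # poor for everyone

for g in (0, 1):
    m = group == g
    per_group = LinearRegression().fit(x[m], y[m])
    print(f"group {g} R^2:", per_group.score(x[m], y[m]))     # much better per group
```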

Evaluation bias

This bias usually arises when the benchmark data used to evaluate a model for a specific task does not adequately represent the population on which the model will actually be used.

One of the most paradigmatic cases of evaluation bias is found in facial recognition algorithms. As the MIT researchers point out, some of these commercially deployed algorithms have no problem identifying white men, but struggle far more when analyzing images of dark-skinned women, frequently misclassifying them. Why? Often because of the underrepresentation of these people in the data used to train and benchmark them.
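The standard remedy is disaggregated evaluation: report performance per subgroup, not just in aggregate. The sketch below uses simulated predictions and labels (not any real benchmark) to show how an overall accuracy figure can hide much worse performance on an underrepresented group.

```python
# Hypothetical sketch of disaggregated evaluation on a skewed benchmark.
import numpy as np

rng = np.random.default_rng(3)
n = 2_000

# 90% of the benchmark belongs to group "A", 10% to group "B".
group = np.where(rng.random(n) < 0.9, "A", "B")
labels = rng.integers(0, 2, n)

# Simulated model: ~95% accurate on group A, only ~70% accurate on group B.
correct = np.where(group == "A", rng.random(n) < 0.95, rng.random(n) < 0.70)
predictions = np.where(correct, labels, 1 - labels)

print("overall accuracy:", (predictions == labels).mean())
for g in ("A", "B"):
    m = group == g
    print(f"group {g} accuracy:", (predictions[m] == labels[m]).mean())
```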

Deployment bias

Deployment or implementation bias occurs when there is a mismatch between the problem that a model is intended to solve and the way in which it is used in practice.

This often occurs when a system is built and evaluated as if it were fully autonomous, when in reality it operates within a complicated socio-technical system moderated by institutional structures and human decision-makers. Its results must first be interpreted by those humans before they lead to decisions. Despite performing well in isolation, such systems can end up causing harmful consequences through phenomena such as automation bias or confirmation bias.

In short, if we trust AI to promote the development of a more just and egalitarian society, we must start at the base, identifying the biases and preconceptions we may be starting from before implementing a training model.
