
Machine Learning Needs Bias Training to Overcome Stereotypes

Humans have ingrained unconscious biases. So do the algorithms they create.

By Richard Sharp


Opinions expressed by Entrepreneur contributors are their own.


It's no secret that there is a wide gender gap in the tech industry. According to the Center for the Study of the Workplace, women represent around 20 percent of engineering graduates, but just 11 percent of practicing software engineers. Unconscious bias is one of the primary drivers of this disparity, which has led many of Silicon Valley's leading tech companies to introduce unconscious bias training for their employees. However, it's fair to say that their machine learning algorithms need it more.

What is unconscious bias?

In humans, unconscious biases are ingrained assumptions about particular personal attributes (including race or gender) that can influence decision making without the decision maker being explicitly aware. These biases are universal because they are the result of "mental shortcuts" people make based on social norms and stereotypes: this group is like X, that group does Y.

Numerous studies, including one from the UNC Kenan-Flagler Business School, have shown that unconscious biases significantly influence important decisions such as hiring and promotions. In an effort to improve diversity and create a more welcoming work environment, companies are working hard to train their employees about unconscious bias, its implications and how to counteract it.

For example, Google put a majority of its employees through workshops on how to understand and stop unconscious bias, and Facebook developed an internal training course called Managing Unconscious Bias, which it released to the public.

Can an algorithm be biased?

While these companies are taking admirable and necessary measures to actively educate their tech employees about unconscious biases, the systems those employees are building still seem vulnerable.

Consider personalized online advertising.

Researchers at Carnegie Mellon University ran experiments on Google's ad system and found that significantly fewer women than men were shown online ads promising to help them get jobs paying more than $200,000. According to CMU, this raised questions about the fairness of online ad targeting, as a gender bias was clear.


Racial bias is also an issue. Latanya Sweeney, the former chief technologist at the Federal Trade Commission, uncovered racial bias in the ads served alongside Google searches, according to a report in The Nation. Sweeney found that searches for black-identifying names yielded a higher incidence of ads associated with "arrest" than searches for white-identifying names.

In neither of these cases did a programmer sit down and write an explicitly sexist or racist algorithm. Instead, these biases are the work of machine learning algorithms that learn patterns automatically from the large data sets they are presented with, just as humans do. And just like humans, machine learning algorithms are susceptible to developing biases that, if not explicitly checked for and corrected, lead to discriminatory behavior.

How can an algorithm be biased?

There are many potential reasons why machine learning systems can learn discriminatory biases.

One is selection bias in the training data. If a model is trained on a dataset that is not representative of the population, it will make poor inferences about the groups it saw too little of. For example, the miscategorization of a black man by Google Photos in 2015 led many to question whether the algorithm's training data had predominantly comprised images of white people.
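To make the effect concrete, here is a minimal, hypothetical sketch in Python, using entirely synthetic data and scikit-learn rather than any real production system: a classifier trained on a sample dominated by one group learns that group's decision boundary and performs noticeably worse on the under-represented group.

```python
# Hypothetical sketch of selection bias: synthetic data only, not any
# real system. Group B is under-represented in training, so the model
# learns group A's decision boundary and misclassifies group B.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)

def make_group(n, shift):
    # Each group's features are centered at `shift`, and its true decision
    # boundary is x0 + x1 = 2 * shift, so the two groups genuinely differ.
    X = rng.normal(loc=shift, scale=1.0, size=(n, 2))
    y = (X[:, 0] + X[:, 1] + rng.normal(0, 0.5, size=n) > 2 * shift).astype(int)
    return X, y

# Training set: 5,000 examples from group A, only 100 from group B.
X_a, y_a = make_group(5000, shift=0.0)
X_b, y_b = make_group(100, shift=1.5)
model = LogisticRegression().fit(np.vstack([X_a, X_b]), np.hstack([y_a, y_b]))

# Balanced test sets reveal the gap the skewed training set created.
X_at, y_at = make_group(2000, shift=0.0)
X_bt, y_bt = make_group(2000, shift=1.5)
print("accuracy, group A:", model.score(X_at, y_at))  # high
print("accuracy, group B:", model.score(X_bt, y_bt))  # much lower
```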

Hidden variables are another factor. It might seem possible to avoid biased machine learning algorithms by making sure you never feed in data that could cause such problems in the first place: if you remove race and gender from the equation, how can the bias prevail?


Earlier this year, a study of Amazon Prime showed that predominantly black ZIP code areas were conspicuously denied same-day delivery. Amazon won't reveal the details of how it determines eligibility for same-day delivery, but the company almost certainly does not feed race into its models explicitly. The problem is most likely that race was a hidden variable behind the model: there were other reasons why Amazon's model excluded those ZIP codes, and race was highly correlated with those reasons.
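The mechanism is easy to reproduce on toy data. The sketch below is purely illustrative, with invented numbers rather than anything from Amazon's model: the protected attribute is never given to the classifier, yet its decisions still split sharply along group lines because a "neutral" feature correlates with it.

```python
# Illustrative sketch of a hidden-variable bias: synthetic data, not
# Amazon's model. `race` is never a model input, but `area_income`
# (a made-up ZIP-code-level feature) is strongly correlated with it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 10_000

race = rng.integers(0, 2, size=n)                     # protected attribute
area_income = rng.normal(50 + 20 * race, 10, size=n)  # correlated proxy
eligible = (area_income + rng.normal(0, 5, size=n) > 60).astype(int)

# The model sees only the "neutral" feature.
X = area_income.reshape(-1, 1)
model = LogisticRegression().fit(X, eligible)
pred = model.predict(X)

for g in (0, 1):
    print(f"group {g}: predicted eligibility rate {pred[race == g].mean():.2f}")
# The rates differ sharply even though race was never an input.
```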

Third, machine learning systems can discriminate by perpetuating existing social biases. Biases run rampant in our society: women are heavily under-represented in the boardroom, and there are significant racial wealth gaps. If you train a machine learning algorithm on real data from the world we live in, it will pick up on these biases. To make matters worse, such algorithms have the potential to perpetuate or even exacerbate those biases once deployed.
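A toy sketch makes the point, again with hypothetical, invented data: if historical hiring decisions penalized one group, a model trained on those decisions reproduces the penalty even for candidates with identical qualifications.

```python
# Hypothetical sketch: the historical labels themselves encode a bias
# (an arbitrary penalty applied to group 1), and the trained model
# faithfully reproduces it.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(2)
n = 20_000

skill = rng.normal(0, 1, size=n)
group = rng.integers(0, 2, size=n)
# Past decisions: same skill, but group 1 was systematically penalized.
hired = (skill - 0.8 * group + rng.normal(0, 0.5, size=n) > 0).astype(int)

model = LogisticRegression().fit(np.column_stack([skill, group]), hired)

# Probe two candidates with identical skill, differing only in group.
probe = np.array([[0.0, 0.0], [0.0, 1.0]])
print(model.predict_proba(probe)[:, 1])
# Predicted hire probability is markedly lower for group 1 at equal skill.
```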

What can be done?

As machine learning expands into sensitive areas such as credit scoring, hiring and even criminal sentencing, it is imperative that we remain careful and vigilant about keeping the algorithms fair.

Accomplishing this goal requires raising awareness of social biases in machine learning and the serious negative consequences they can have. Just as tech employees are educated about the negative implications of their own unconscious biases, so should they be educated about biases in the models they are building.


It also requires companies to explicitly test machine learning models for discriminatory biases and publish their results. Useful methods and datasets for performing such tests should be shared and reviewed publicly to make the process easier and more effective.
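One widely cited check, sketched below with made-up numbers, is the "four-fifths rule" for disparate impact: compare positive-outcome rates across groups and flag ratios below 0.8. Real audits combine several such metrics with significance testing, but even a few lines of code can surface a problem.

```python
# A minimal disparate-impact check (the "four-fifths rule"); the
# predictions and group labels here are made-up illustrative data.
import numpy as np

def disparate_impact(preds: np.ndarray, group: np.ndarray) -> float:
    """Ratio of the lower group's positive-outcome rate to the higher's."""
    rates = [preds[group == g].mean() for g in np.unique(group)]
    return min(rates) / max(rates)

preds = np.array([1, 0, 1, 1, 0, 1, 0, 0])  # model decisions
group = np.array([0, 0, 0, 0, 1, 1, 1, 1])  # protected attribute
ratio = disparate_impact(preds, group)
print(f"disparate impact ratio: {ratio:.2f} (below 0.80 warrants review)")
```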

As an industry, we need more research into how machine learning algorithms can be trained to avoid undesirable social biases. The issue is a relatively new phenomenon, and the examples we have seen so far are just the tip of the iceberg. Further work is needed to better understand the problem and to determine what technical solutions can minimize the risk of unconscious bias creeping into machine learning systems.

It's time for the risks of social bias to be embedded deeply in data science codes of ethics and education.

Richard Sharp

CTO, Yieldify

Richard Sharp is the CTO of Yieldify. He joined Yieldify from Google, where he led the global product team responsible for banking comparison products. Before Google, Richard worked as a senior research scientist at Intel and then joined XenSource, a startup formed from the University of Cambridge Computer Lab that was acquired by Citrix.

He has a PhD from the University of Cambridge and currently also works as Director of Studies for Computer Science and Fellow at the University of Cambridge.
