Fighting AI Bias: Explainability, Holistic Data Sets and Other Best Practices
Most companies and consumers are familiar with AI bias and how it can manifest. Whether it’s AI-powered facial recognition technology that fails to accurately identify minorities and women, or automated loan review that continually rejects applications from historically less wealthy zip codes, biased AI unfairly harms a wide swath of the population.
Significant technological advances have been made to address the problem, but to ensure continued progress, more emphasis needs to be placed on how AI bias occurs in the first place. Consider the following analogy: a human being becomes biased by gathering data about their surroundings over an extended period and then forming assumptions. Humans aren't born biased; they become biased.
The same goes for AI. Bias is introduced over time as AI models are trained. It's therefore imperative that there's no bias in training data sets or in training techniques. Furthermore, it's important to recognize that enforcing hard-and-fast rules to prevent bias, such as disallowing any sensitive data attributes (e.g., race or gender) when training the models, isn't the answer. AI models will simply create proxies for the excluded attributes (e.g., zip code or income level in lieu of race or gender), so a more nuanced approach is required.
3 Best Practices for Preventing Training Bias
Below are three tactical strategies organizations and their data engineering teams should incorporate to ensure their AI models are being trained without bias:
- Flip data attributes. To build truly unbiased models and determine whether a sensitive data attribute is influencing the results, try flipping the attribute in question. By running a model with the attribute set to "female" and then running the same model again with the attribute set to "male," for instance, one can identify whether the results remain the same, and adjust the model accordingly.
- Leverage holistic data sets. When companies rely on historical data to train their models and judge model accuracy against historical decisions they've made, instances of bias are perpetuated. It's therefore crucial that training data sets include data collected beyond a company's own domain and/or data that hasn't previously been used. For example, a retail organization located in Seattle shouldn't rely solely on facial recognition data from the Seattle area. To prevent bias, the retailer needs to expand its data set beyond Seattle and generalize that broader data to train more effective models.
- Incorporate explainability. One of the best ways to prevent bias in AI is by incorporating explainability, i.e., understanding exactly why a model is predicting something or recommending a particular outcome by being able to clearly view every factor that went into the model's decision-making process. This is a technological advancement that's gained healthy traction in recent years, although it's not always tied back to the sensitive-attribute and proxy issues mentioned earlier. To ensure comprehensive explainability, make sure it's implemented at the outset, from the data acquisition stage and throughout testing.
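The first practice above, flipping a sensitive attribute, amounts to a counterfactual test: change only that attribute and see whether the prediction moves. The sketch below uses a hypothetical `score` function as a stand-in for a trained model; the attribute names and the deliberately biased rule inside it are invented for illustration.

```python
# Minimal sketch of the attribute-flip test, assuming a hypothetical
# scoring function `score(applicant)` standing in for a trained model.

def score(applicant):
    # Deliberately biased stand-in rule: it (wrongly) rewards one gender.
    base = applicant["income"] / 1000
    return base + (5 if applicant["gender"] == "male" else 0)

applicant = {"income": 50_000, "gender": "female"}
flipped = dict(applicant, gender="male")  # flip ONLY the sensitive attribute

original_score = score(applicant)
flipped_score = score(flipped)

if original_score != flipped_score:
    print("Bias detected: prediction changed when only gender was flipped")
```

If the two scores differ, the model is using the sensitive attribute (directly or through a proxy), and the training data or features need adjusting.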
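The second practice, leveraging holistic data sets, can start with a simple coverage audit before any training happens. The sketch below uses hypothetical group labels and population shares: it compares the demographic mix of a local training set against a broader reference population and flags groups that are badly underrepresented.

```python
# Minimal sketch (hypothetical counts and shares): flag demographic groups
# that are underrepresented in a local training set relative to a broader
# reference population, before training begins.
from collections import Counter

local_train = ["group_a"] * 90 + ["group_b"] * 10  # local data only
reference_share = {"group_a": 0.6, "group_b": 0.4}  # broader population

counts = Counter(local_train)
total = sum(counts.values())
underrepresented = [
    group for group, share in reference_share.items()
    if counts.get(group, 0) / total < 0.5 * share  # under half the expected share
]
print("underrepresented groups:", underrepresented)
```

The 0.5 threshold is an arbitrary illustration; the point is that the gap is measured and acted on (by acquiring broader data) before the model ever sees the training set.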
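The third practice, explainability, is easiest to see with a model whose per-feature contributions can be read off directly. The sketch below assumes a simple linear scoring model with hypothetical feature names and weights; real systems would use dedicated explainability tooling, but the output, a ranked list of what drove the decision, is the same idea.

```python
# Minimal sketch of explainability for a hypothetical linear model:
# each feature's contribution to the score is weight * value.
weights = {"income": 0.4, "credit_history": 0.5, "zip_code_rank": 0.1}
applicant = {"income": 0.8, "credit_history": 0.6, "zip_code_rank": 0.9}

contributions = {f: weights[f] * applicant[f] for f in weights}
prediction = sum(contributions.values())

# Rank the factors behind the decision, largest contribution first
for feature, c in sorted(contributions.items(), key=lambda kv: -kv[1]):
    print(f"{feature:15s} contributed {c:+.2f} ({c / prediction:.0%} of score)")
# If a proxy feature such as zip_code_rank dominates, the model and its
# training data need auditing -- tying explainability back to the proxy issue.
```

This is also where the earlier proxy concern resurfaces: an explanation that shows a zip-code feature dominating a loan decision is a direct signal to audit for proxy bias.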
Responsibility Ultimately Lies with Data Owners
The most concerning aspect of AI bias is how it impacts human beings, particularly minorities, women, and residents of certain zip codes. However, it's worth noting that bias in AI can also be a massive impediment to AI-driven automation. Any company that values its customers without prejudice and wishes to remain competitive in an increasingly automated world must prioritize overcoming this widespread problem. Rather than attempting to detect bias in the final results of models, the onus needs to fall on data scientists and engineers to embed checks and balances throughout the data ingestion, auditing, training and retraining process. In this way, data owners and their organizations can more reliably and scalably confirm fairness at every stage in even the most complex AI models.
Sudhir Jha is Mastercard senior vice president and head of Brighterion, provider of an AI and machine learning platform for real-time, mission-critical intelligence.