Machine learning (ML) and AI have the potential to drive incredible innovation and efficiency, but stakeholders also have concerns about the impact if ML models don’t work as intended. These fears range from violating AI regulations to damaging their organization’s reputation and negatively affecting human lives. In a 2021 survey on the consequences of AI bias, respondents said they:
fear they may lose customer trust.
are concerned their brand reputation may suffer, resulting in media and social media backlash.
worry about increased regulatory scrutiny.
have concerns regarding a loss of employee trust.
are concerned that ML bias will conflict with personal ethics.
fear the possibility of lawsuits.
worry about impacts on profits and shareholder value.
To avoid unintended consequences and catch issues early on, ML teams should establish a model monitoring framework from the start. In this article, we’ll cover what model monitoring in machine learning is, why it matters, and how a model monitoring framework can set your ML solutions up for success.
A subset of AI model monitoring, ML model quality monitoring is a set of efforts that share the goal of making models more accurate and effective. It’s essential at every step of the process, from the early stages of development and training through deployment.
Proactive model monitoring techniques can reduce downtime, ensure model performance, and improve model effectiveness. Machine learning models are susceptible to data drift, bias propagation, and performance degradation. And since models are constantly ingesting new data, model monitoring methods have to keep up.
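As a concrete illustration of what drift monitoring can look like, here is a minimal Python sketch of the population stability index (PSI), one common drift statistic. The synthetic data, function name, and thresholds are illustrative assumptions, not part of any particular platform:

```python
import numpy as np

def population_stability_index(reference, live, bins=10):
    """PSI between a reference sample (e.g., training data)
    and a live production sample of the same feature."""
    # Bin edges come from the reference distribution
    edges = np.histogram_bin_edges(reference, bins=bins)
    # Clip live values into the reference range so nothing falls outside the bins
    live = np.clip(live, edges[0], edges[-1])
    ref_pct = np.histogram(reference, bins=edges)[0] / len(reference)
    live_pct = np.histogram(live, bins=edges)[0] / len(live)
    # Floor the proportions to avoid log(0)
    ref_pct = np.clip(ref_pct, 1e-6, None)
    live_pct = np.clip(live_pct, 1e-6, None)
    return float(np.sum((live_pct - ref_pct) * np.log(live_pct / ref_pct)))

rng = np.random.default_rng(0)
training = rng.normal(0.0, 1.0, 10_000)    # stand-in for a training feature
production = rng.normal(0.3, 1.1, 10_000)  # stand-in for that feature in production

psi = population_stability_index(training, production)
# Common rule of thumb: < 0.1 stable, 0.1-0.25 moderate shift, > 0.25 significant drift
print(f"PSI = {psi:.3f}")
```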
Model monitoring isn’t a set-it-and-forget-it endeavor, but it no longer needs to feel like an impossible task. Models are often opaque and complex, making it difficult to understand the “how” and “why” behind their predictions. With the right monitoring methods and tools in place, ML teams can take a look inside to understand what is causing issues and how to fix them.
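One way teams take that look inside is feature attribution. As a minimal sketch, the example below uses scikit-learn’s permutation importance on toy data, standing in for whatever explainability method your stack provides, to surface which features a model leans on most:

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

# Toy data standing in for a production model's inputs
X, y = make_classification(n_samples=2000, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)

# Shuffle each feature and measure how much performance drops:
# features with large drops are the ones the model relies on most.
result = permutation_importance(model, X_test, y_test, n_repeats=10, random_state=0)
for i in np.argsort(result.importances_mean)[::-1][:3]:
    print(f"feature_{i}: importance {result.importances_mean[i]:.3f}")
```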
One of the challenges that businesses face is that data teams often work in silos. A well-designed model monitoring framework breaks down these silos and enables better communication and collaboration when problems first arise — before they grow into something even harder to deal with.
One of the leading model monitoring best practices is to develop a framework that creates a feedback loop involving all members of machine learning operations (MLOps) teams, such as data scientists, ML engineers, and business operations. Using an AI Observability platform, each stakeholder can access model alerts and insights to help perform root cause analysis on any model issues.
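To make that concrete, here is a minimal, hypothetical sketch of threshold-based alerting that routes each breach to the stakeholder who owns the metric. The metric names, thresholds, and owners are invented for the example:

```python
from dataclasses import dataclass

@dataclass
class Alert:
    metric: str
    value: float
    threshold: float
    owner: str  # which part of the MLOps team should triage it

def check_metrics(metrics: dict[str, float],
                  rules: dict[str, tuple[float, str]]) -> list[Alert]:
    """Compare live metric values against per-metric thresholds and
    route each breach to the stakeholder who owns that metric."""
    alerts = []
    for name, (threshold, owner) in rules.items():
        value = metrics.get(name)
        if value is not None and value > threshold:
            alerts.append(Alert(name, value, threshold, owner))
    return alerts

# Hypothetical routing: drift goes to data science, latency to ML engineering
rules = {
    "feature_psi": (0.25, "data-science"),
    "p95_latency_ms": (300.0, "ml-engineering"),
}
live = {"feature_psi": 0.31, "p95_latency_ms": 120.0}
for alert in check_metrics(live, rules):
    print(f"[{alert.owner}] {alert.metric}={alert.value} exceeds {alert.threshold}")
```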
Left to their own devices, machine learning models can make incorrect inferences from patterns and develop biases that may harm your reputation and end users. In other words, models are not immune to developing human prejudices. How does an ML model work, and what makes it susceptible to developing model bias? Models can become biased for several reasons.
For businesses that rely on machine learning models for day-to-day operations as well as innovation, inaccuracies can have disastrous consequences. In one survey, 36% of businesses reported being negatively impacted by machine learning bias. Among the affected businesses, respondents said they:
lost revenue
lost customers
lost employees
incurred legal fees from lawsuits
lost customer trust
Even enterprise-scale organizations working in the most prominent industries are at risk of suffering from machine learning biases. For example, the U.S. healthcare system, Facebook, and Amazon had to correct their ML models to account for AI fairness:
In 2019, a study found that a healthcare risk-prediction algorithm — which was used to evaluate over 200 million people in the US — demonstrated racial bias. The root of the issue was that the proxy data set contained patterns that reflected disparate care between white and black Americans.
In particular, the algorithm focused on how much patients spent on healthcare in the past to determine their current risk for chronic conditions. Using this spending data, the algorithm determined that since white patients spent more on healthcare, they were more likely to be at risk for chronic illnesses. In reality, black patients spent less on healthcare due to a variety of factors that were unrelated to their actual symptoms — like their income levels and confidence in the healthcare system.
This bias made it more challenging for black patients to receive care for chronic conditions, even though they had a high level of need. This not only harmed patients, but also weakened confidence in the fairness of the healthcare system.
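To see how a proxy label can encode this kind of bias, consider the toy simulation below. The groups, numbers, and effect size are hypothetical, not the study’s actual data; the point is only that ranking by a skewed proxy (spending) under-serves a group even when true need is identical:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
group = rng.integers(0, 2, n)      # 0 = group A, 1 = group B (hypothetical)
need = rng.gamma(2.0, 1.0, n)      # true health need, same distribution for both

# Proxy label: spending tracks need, but group B spends less at equal need
spending = need * np.where(group == 1, 0.7, 1.0) + rng.normal(0.0, 0.1, n)

# Rank patients by the proxy and "enroll" the top 10%, as a risk model might
cutoff = np.quantile(spending, 0.9)
enrolled = spending >= cutoff
for g in (0, 1):
    print(f"group {g}: enrollment rate {enrolled[group == g].mean():.1%}")
# Despite identical need, group B is enrolled far less often:
# the model faithfully learned the proxy, not the outcome we care about.
```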
In 2019, Facebook neglected to enforce legal requirements that prevent advertisers from directly targeting audiences based on gender, race, religion, and other protected classes. During this period, Facebook’s algorithm learned that advertisers were targeting these protected classes for different products and services, such as real estate. As a result, Facebook’s ad algorithm reflected the bias of advertisers and prioritized showing real estate ads to white audiences over members of minority groups. This limited housing opportunities for groups who have historically had fewer chances to own property.
This learned bias violated the Fair Housing Act, and as a result, the U.S. Department of Housing and Urban Development filed a lawsuit against the social media company.
In 2015, Amazon realized that its new automated job candidate review system had a noticeable gender bias. The issue began with the model’s training: after analyzing application patterns from a 10-year period of Amazon’s hiring history, the model recognized that most past hires were men.
Acting on this pattern, Amazon’s machine learning model flagged and rejected resumes that signaled the applicant was a woman. Graduates of women’s colleges and members of gender-specific extracurricular activities, such as a women’s soccer team, were affected by this bias. These errors screened out candidates who could have had valuable experience and portrayed Amazon, and the tech industry as a whole, in a negative light.
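One check that monitoring can automate for cases like this is comparing selection rates across groups. The sketch below applies the “four-fifths” rule of thumb for disparate impact; the data and function name are hypothetical:

```python
import numpy as np

def selection_rate_ratio(selected, group):
    """Disparate impact check: ratio of the lowest to the highest
    per-group selection rate. Values below ~0.8 are a common red flag
    (the 'four-fifths rule')."""
    selected = np.asarray(selected, dtype=bool)
    group = np.asarray(group)
    rates = {g: selected[group == g].mean() for g in np.unique(group)}
    return min(rates.values()) / max(rates.values()), rates

# Hypothetical screening outcomes from a resume model
selected = [1, 0, 1, 1, 0, 0, 0, 1, 0, 0]
group    = ["m", "m", "m", "m", "m", "f", "f", "f", "f", "f"]
ratio, rates = selection_rate_ratio(selected, group)
print(rates, f"ratio={ratio:.2f}")  # 0.33 here, well below the 0.8 threshold
```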
An AI Observability platform makes it possible for each member of your MLOps team to identify and resolve model issues efficiently and at scale. From a unified dashboard, team members can uncover and share insights, and perform root cause analysis to understand how and why models make the predictions they do. Fiddler’s AI Observability platform features best-in-class machine learning model monitoring tools.
Having a dedicated AI Observability platform reduces your “time-to” factors: your time to market, your time to value, and your time to resolution.
Model monitoring metrics make it clear whether or not your model is performing properly. These metrics fall into five key categories.
Depending on your specific project, these metric groups will have varying priorities. For example, if you are running a forecasting model, statistical accuracy is imperative.
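For instance, a minimal sketch of tracking statistical accuracy for a forecasting model might compute metrics like MAE, RMSE, and MAPE; the numbers below are illustrative:

```python
import numpy as np

def forecast_accuracy(y_true, y_pred):
    """A few common statistical-accuracy metrics for a forecasting model."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    err = y_pred - y_true
    return {
        "MAE": np.mean(np.abs(err)),
        "RMSE": np.sqrt(np.mean(err ** 2)),
        "MAPE": np.mean(np.abs(err / y_true)) * 100,  # assumes no zero actuals
    }

# Toy actuals vs. forecasts
print(forecast_accuracy([100, 120, 130], [95, 125, 128]))
```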
To monitor your machine learning model, you should develop a framework that includes all relevant stakeholders. A comprehensive model monitoring framework has three key elements: your models, your teams, and an AI Observability platform that gives everyone the access they need to stay informed and connected.