“On artificial intelligence, trust is a must, not a nice to have. With these landmark rules, the EU is spearheading the development of new global norms to make sure A.I. can be trusted.”
- Margrethe Vestager, European Commission
Fiddler was formed with the mission to bring trust to AI. Given the frequency of news stories covering cases of algorithmic bias, the need for trustworthy AI has never been more pressing. The EU has led technology regulation globally since the historic introduction of GDPR, which paved the way for modern privacy regulation. With AI’s adoption reaching critical mass and its risks becoming apparent, the EU has now proposed what would be the first comprehensive regulation to govern AI. The proposal offers insight into how the EU intends to regulate the use and development of AI to earn the trust of EU citizens. This post summarizes the key points of the regulation and focuses on how teams should think about its applicability to their machine learning (ML) practices.
(A full copy of the proposal can be found here.)
With the introduction of this “GDPR of AI,” the EU is looking to encourage and enforce the development of human-centric, inclusive, and trustworthy AI. The proposal recommends a risk-based approach that classifies AI applications into four tiers:
- Unacceptable risk applications like behavior manipulation are banned
- High risk applications like autonomous driving and credit scoring will have new oversight
- Limited risk applications like chatbots have transparency obligations
- Minimal risk applications like spam filters, which form a majority of AI applications, have no proposed interventions
Let’s focus on the proposed guidelines for high risk applications. The regulation classifies an AI system as high risk if it could endanger people (specifically EU citizens), their opportunities or property, their access to essential services, their fundamental rights, or society at large. Applications that fall under this classification include recruitment, creditworthiness, self-driving cars, and remote surgery, among others.
For these systems, the regulation stipulates requirements around high-quality data sets, system documentation, transparency, and human oversight, along with operational visibility into robustness, accuracy, and security. The EU also encourages the same stipulations for lower risk AI systems.
Let’s dig deeper into three critical topics within the guidelines for high risk AI systems.
1) Transparency
“Users should be able to understand and control how the AI system outputs are produced.”
With the transparency mandate, the EU is looking to address AI’s “black box” problem. In other words, most models are not currently transparent, due to two traits of ML:
- Unlike other algorithmic and statistical models that are designed by humans, ML models are trained on data automatically by algorithms.
- As a result of this automated generation, ML models can absorb complex nonlinear interactions from the data that humans cannot otherwise discern.
This complexity obscures how a model converts inputs into outputs, creating a trust and transparency problem. The problem is compounded in modern deep learning models, which are even harder to explain and reason about.
In addition, an ML model can absorb, add, or amplify bias from the data it was trained on. Without a good understanding of model behavior, practitioners cannot ensure the model is fair, especially in high risk applications that impact people’s opportunities.
2) Monitoring
“High risk AI systems should perform consistently throughout their lifecycle and meet a high level of accuracy, robustness, and security. Also in light of the probabilistic nature of certain AI systems’ outputs, the level of accuracy should be appropriate to the system’s intended purpose and the AI system should indicate to users when the declared level of accuracy is not met so that appropriate measures can be taken by the latter.”
ML models are unique software entities compared to traditional code in that they are probabilistic in nature. They are trained for high performance on repeatable tasks using historical examples. As a result, their performance can fluctuate and degrade over time as the model’s inputs change after deployment. Depending on the impact of a high risk AI application, a shift in predictive power can have significant consequences. For example, an ML model for recruiting that was trained largely on employed candidates will degrade if real-world data starts to contain a high percentage of unemployed candidates, say in the aftermath of a recession. Such a shift can also lead the model to make biased decisions.
Monitoring these systems enables continuous operational visibility to ensure their behavior does not drift from the intention of the model developers and cause unintended consequences.
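To make this concrete, below is a minimal sketch of how a team might flag input drift on a single feature using a two-sample statistical test. The feature, distributions, and alert threshold are illustrative assumptions; a production monitoring system would track many features, time windows, and metrics.

```python
# A minimal drift-check sketch; the feature, distributions, and threshold are illustrative.
import numpy as np
from scipy.stats import ks_2samp

def check_feature_drift(train_values, live_values, alpha=0.01):
    """Flag drift between training and production values of one feature
    using a two-sample Kolmogorov-Smirnov test."""
    result = ks_2samp(train_values, live_values)
    return {"ks_statistic": result.statistic, "p_value": result.pvalue,
            "drifted": result.pvalue < alpha}

# Hypothetical example: years of employment shifts after a recession.
rng = np.random.default_rng(0)
training = rng.normal(loc=6.0, scale=2.0, size=10_000)   # training-time distribution
production = rng.normal(loc=3.5, scale=2.5, size=2_000)  # recent production traffic

report = check_feature_drift(training, production)
if report["drifted"]:
    print(f"ALERT: input drift detected (KS statistic = {report['ks_statistic']:.3f})")
```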
3) Record-keeping
“High risk AI systems shall be designed and developed with capabilities enabling the automatic recording of events (‘logs’) while the high risk AI system is operating. The logging capabilities shall ensure a level of traceability of the AI system’s functioning throughout its lifecycle that is appropriate to the intended purpose of the system.”
Since the ML models and data behind AI systems are constantly changing, any operational use case will now require model behavior to be continuously recorded. In practice, this means logging model inferences so they can be replayed and explained at a later time for auditing and remediation.
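As a simple illustration, the sketch below logs each inference as a JSON line with an identifier, timestamp, model version, inputs, and output so it can be replayed later. The field names, file destination, and sampling option are illustrative assumptions, not a prescribed schema.

```python
# A minimal inference-logging sketch; the schema and destination are illustrative.
import json
import random
import time
import uuid

def log_inference(log_path, model_version, features, prediction, sample_rate=1.0):
    """Append one model inference as a JSON line, optionally sampled."""
    if random.random() > sample_rate:
        return None  # skipped by sampling
    record = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,       # model inputs as seen at serving time
        "prediction": prediction,   # model output, available for later replay
    }
    with open(log_path, "a") as log_file:
        log_file.write(json.dumps(record) + "\n")
    return record["event_id"]

# Hypothetical usage inside a serving endpoint:
log_inference("inferences.jsonl", "credit_model_v3",
              {"income": 52000, "employment_years": 4}, {"approve_probability": 0.31})
```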
Impact on MLOps
This regulation is still in the proposal stage, and, just like GDPR, it will likely pass through several iterations on its way to approval by the EU. Once approved, a regulation typically has a defined period before it goes into effect, giving companies time to adopt the new rules. For example, GDPR was adopted in April 2016 but did not become enforceable until May 2018.
Enterprise teams accountable for ML Operations, or MLOps, will therefore need to prepare for process and tooling updates as they incorporate these guidelines into their development. While the EU provides oversight only for high risk AI applications, it encourages the same guidelines for lower risk applications as well. This could actually simplify ML development, since all models can internally follow the same guidelines.
Although the enforcement details will become clearer once the law passes, we can infer from today’s best practices what changes will be needed in model understanding, monitoring, and auditing. The good news is that adopting these guidelines will not only help in building AI responsibly but also keep ML models operating at high performance.
To ensure compliance with the model understanding guidelines of the proposal, teams should adopt tooling that provides insights into model behavior to all the stakeholders of AI, not just the model developers. These tools need to work for internal collaborators with different levels of technical understanding and also allow for model explanations to be surfaced correctly to the end user.
- Explanations are a critical tool in achieving trust with AI. Context is key in understanding and explaining models, and these explanations need to adapt to work across model development teams, internal business stakeholders, regulators and auditors, and end users. For example, while an explanation given to a technical model developer would be far more detailed than one given to a compliance officer, an explanation given to the end user should be simple and actionable. In the case of a loan denial, the applicant should be able to understand how the decision was made and view suggestions on actions they can take to increase their chances of approval.
- Model developers additionally need to assess whether the model will behave correctly when confronted with real-world data. This requires deeper model analysis tools that probe the model for complex interactions and surface any risks.
- Bias detection needs its own process and tooling to provide visibility into model discrimination on protected attributes, which are typically domain dependent (a minimal example of such a check follows this list).
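To illustrate the kind of check such tooling performs, here is a minimal sketch of a disparate impact calculation on model outputs. The groups, data, and four-fifths cutoff are illustrative assumptions; real bias audits use multiple metrics and domain-specific protected attributes.

```python
# A minimal group-fairness sketch; groups, data, and cutoff are illustrative.
import numpy as np

def disparate_impact(predictions, protected, privileged_value):
    """Ratio of positive-outcome rates between the unprivileged and privileged groups."""
    predictions = np.asarray(predictions)
    protected = np.asarray(protected)
    privileged_rate = predictions[protected == privileged_value].mean()
    unprivileged_rate = predictions[protected != privileged_value].mean()
    return unprivileged_rate / privileged_rate

# Hypothetical example: binary loan approvals split by a protected group.
approvals = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
group     = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
ratio = disparate_impact(approvals, group, privileged_value="A")
if ratio < 0.8:  # common "four-fifths" rule of thumb
    print(f"Potential disparate impact: ratio = {ratio:.2f}")
```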
Explainable AI, a recent research advancement, is the technology that unlocks the AI black box so humans can understand what’s going on inside AI models and ensure AI-driven decisions are transparent, accountable, and trustworthy. This explainability powers a deep understanding of model behavior. Enterprises need Explainable AI solutions in place so their teams can debug models and provide transparency across a wide range of them.
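As a simplified illustration of per-prediction explanations, the sketch below scores each feature by how much replacing it with its training mean changes the model’s output. It stands in for proper attribution methods such as Shapley-value-based explainers; the model, features, and data are illustrative assumptions.

```python
# A simplified per-prediction attribution sketch; model and features are illustrative.
import numpy as np
from sklearn.ensemble import RandomForestClassifier

feature_names = ["income", "debt_ratio", "employment_years"]
rng = np.random.default_rng(0)
X_train = rng.normal(size=(500, 3))
y_train = (X_train[:, 0] - X_train[:, 1] > 0).astype(int)  # synthetic labels

model = RandomForestClassifier(random_state=0).fit(X_train, y_train)
baseline = X_train.mean(axis=0)  # stand-in for a "typical" applicant

def attribute(x):
    """Change in approval probability when each feature is reset to its baseline."""
    base_score = model.predict_proba([x])[0, 1]
    contributions = {}
    for i, name in enumerate(feature_names):
        x_masked = x.copy()
        x_masked[i] = baseline[i]
        contributions[name] = base_score - model.predict_proba([x_masked])[0, 1]
    return contributions

print(attribute(X_train[0]))  # per-feature contribution to this prediction
```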
To address the visibility, degradation, and bias challenges for deployed models, teams need to ensure they have monitoring in place for their ML models. Monitoring systems allow deployment teams to verify that model behavior is not drifting and causing unintended consequences, and they typically provide alerting options to react to immediate operational issues. Since ML is an iterative process that produces several improved versions of a model, monitoring systems should also provide comparative capabilities so developers can swap models with full behavioral visibility.
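A minimal sketch of that comparative step might evaluate the current and candidate model versions on the same recent, labeled slice before swapping. The models, metric, and promotion rule below are illustrative assumptions.

```python
# A minimal champion/challenger comparison sketch; models and metric are illustrative.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(1)
X = rng.normal(size=(2_000, 5))
y = (X[:, 0] + 0.5 * X[:, 1] + rng.normal(scale=0.5, size=2_000) > 0).astype(int)
X_train, y_train = X[:1_500], y[:1_500]      # historical training data
X_recent, y_recent = X[1_500:], y[1_500:]    # recent labeled production slice

current = LogisticRegression().fit(X_train, y_train)
candidate = GradientBoostingClassifier(random_state=0).fit(X_train, y_train)

scores = {
    "current": roc_auc_score(y_recent, current.predict_proba(X_recent)[:, 1]),
    "candidate": roc_auc_score(y_recent, candidate.predict_proba(X_recent)[:, 1]),
}
print(scores)
if scores["candidate"] > scores["current"]:
    print("Candidate outperforms the current model; record both scores for the audit trail.")
```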
For the record-keeping aspects, teams need to ensure they record all, or at least a sample of, the model’s predictions so they can be replayed later for troubleshooting.
Broader impact
The proposed regulation applies to all providers deploying AI systems in the EU, whether they are based in the EU or in a third country. As teams scale their ML development, their processes will need to provide a robust collection of validation and monitoring tools to better equip model developers and IT with operational visibility.
It is clear that MLOps teams need to bolster their ML development with updated processes and tools that bring transparency to model understanding, robustness, and fairness so they are better prepared for the upcoming guidelines. If you would like to get started quickly, read more about our Model Performance Management framework, which explains how to put structured observability into your MLOps.