Two weeks ago, I had the opportunity to spend a day with 40 researchers, technologists, regulators, policy-makers, lawyers, and social scientists at the Partnership on AI (PAI) workshop on Explainability in ML. PAI is a nonprofit that collaborates with over 100 organizations across the academic, governmental, industrial, and nonprofit sectors to evaluate the social impact of artificial intelligence and to establish best practices around its use.
Define "explainability"
We warmed up in small groups by reflecting on what "explainability" means. My favorite ideas both captured our humanistic goals and were specific enough to serve as practical objectives. In the context of machine learning, an "explanation" is:
- An expression that satisfies a human by describing a model’s decision process in human-accessible language and representations
- A succinct demonstration that a model’s behavior is consistent with its purpose
Over the course of the day, we discussed a number of case studies relating to data collection and modeling; considered approaches for balancing fairness, explainability, and efficiency; and brainstormed about stakeholders and their needs within social contexts such as healthcare and the media.
Challenges in Explainability
Some parts of the conversation reflected consternation at just how challenging the problem space is. A few prominent difficulties are:
- Widely varying needs of human stakeholders
- Urgency for AI governance in the face of an uncertain regulatory landscape
- Unavoidable tradeoffs between ML efficiency and individual and group fairness
- Hypothetical black-box models so complex as to be hopelessly beyond interpretation
Still, several novel concepts emerged and recurred over the course of the day, consistent with the definitions above, that offer either promising approaches or opportunities for greater honesty about specific pitfalls.
Important ideas in Explainability
Context is Crucial – There’s an enormous range of model purpose and complexity, and a similar range of stakeholder need and expertise. ML-generated driving directions warrant a different kind of explanation than an ML-proposed medical treatment plan. In either case, the explanations appropriate for different stakeholders vary. In the medical case, the roles include, at the very least, model developers, physicians, patients, medical treatment companies, insurers, and clinic managers. There’s certainly no single solution for all of these scenarios, but a crucial question is: is there a common toolbox from which appropriate explanations can be constructed for most cases?
Concern around Overtrust – While a primary function of explainability is to establish human trust in algorithms, significant concern was expressed about unjustified or inappropriate trust. A few examples:
- Standard or commoditized explainability techniques could lure stakeholders into carelessly assuming "we've got explainability, so we don’t have to be concerned."
- Inappropriately applied techniques can surface incorrect or unsatisfying details to end-users.
- Techniques may tell a partial story, leaving stakeholders with a false sense of understanding and security.
Again, there’s no single solution here, but developing responsible, context-appropriate explanations with technical transparency and disclosure of limitations is a starting point.
Human-in-the-Loop – We started the day discussing explainability techniques in terms of feature and sample attributions and counterfactual/recourse methods, which are typically static analyses. From there, ideas arose for extending these techniques with human guidance around domain specifics:
- A crucial piece of human-to-human explainability is the ability for one person to inquire and dig deeper into the reasoning of the other. This led to a variety of ideas about giving stakeholders a way to interrogate explainability systems where static techniques might not be completely satisfying: summarized feature-importance measures that can be drilled into for additional detail, interfaces that let a human propose counterfactual scenarios against which to compare a prediction, or systems that let users explore options for recourse based on human-guided causal constraints (a minimal sketch of such a what-if probe follows this list). I’ve subsequently been introduced to this paper, which lays out human explanations beautifully (see Section 4).
- In the context of optimizing model recommendations, where the unavoidable tradeoffs between explainability, fairness metrics, and efficiency are poorly constrained for a typical algorithm, there was a great deal of optimism around interactive interfaces that let domain experts balance these objectives by reviewing the results of a variety of options (a sketch of surfacing such options also follows below).
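To make the interrogation idea concrete, here is a minimal sketch of a what-if probe: a stakeholder proposes a counterfactual edit to a single instance and compares the model’s prediction before and after. The model, feature names, and edit below are hypothetical stand-ins for illustration, not the API of any particular tool.

```python
# Minimal what-if probe: compare a model's score on an instance against a
# user-proposed counterfactual edit. Everything here (model, features, data)
# is a hypothetical stand-in for illustration.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
feature_names = ["income", "debt_ratio", "account_age"]  # hypothetical features
X = rng.normal(size=(500, 3))
y = (X[:, 0] - X[:, 1] + rng.normal(scale=0.5, size=500) > 0).astype(int)
model = LogisticRegression().fit(X, y)

def what_if(model, instance, edits):
    """Return (original, counterfactual) scores for user-proposed feature edits."""
    counterfactual = instance.copy()
    for name, value in edits.items():
        counterfactual[feature_names.index(name)] = value
    original = model.predict_proba(instance.reshape(1, -1))[0, 1]
    revised = model.predict_proba(counterfactual.reshape(1, -1))[0, 1]
    return original, revised

before, after = what_if(model, X[0], {"debt_ratio": -1.0})  # "what if debt were lower?"
print(f"score before: {before:.2f}, score after: {after:.2f}")
```

An interactive interface would wrap a loop like this in a UI, letting the stakeholder iterate on scenarios until the explanation is satisfying.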
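In the same spirit, the tradeoff-balancing interface can be sketched as simply as enumerating candidate operating points and reporting an efficiency metric alongside a fairness metric for each, leaving the final choice to a domain expert. The threshold sweep, protected attribute, and demographic-parity gap below are illustrative assumptions, not a prescription for which metrics to use.

```python
# Sketch of surfacing tradeoff options: sweep decision thresholds and report
# accuracy next to a simple group-fairness gap, so a domain expert can choose.
# The data, protected attribute, and metrics are hypothetical.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 4))
group = (rng.random(1000) > 0.5).astype(int)  # hypothetical protected attribute
y = (X[:, 0] + 0.3 * group + rng.normal(scale=0.7, size=1000) > 0).astype(int)
scores = LogisticRegression().fit(X, y).predict_proba(X)[:, 1]

print("threshold  accuracy  demographic_parity_gap")
for threshold in np.linspace(0.3, 0.7, 5):
    pred = (scores >= threshold).astype(int)
    accuracy = (pred == y).mean()
    # Gap in positive-prediction rates between the two groups.
    gap = abs(pred[group == 1].mean() - pred[group == 0].mean())
    print(f"{threshold:9.2f}  {accuracy:8.3f}  {gap:22.3f}")
```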
Explainable AI: Going forward
So, while challenges exist, there was significant optimism that these rapidly evolving tools are starting to look like a foundation for explainability. And I’m delighted to add that the kinds of tools discussed look a lot like those either available today on the Fiddler platform or sitting high on our near-term product roadmap.
Learn more about PAI here.