Fiddler's Image Explainability for Computer Vision Models


Watch our demo to see how Fiddler’s advanced image explainability helps you gain deep insights into the predictions and behaviors of computer vision models. 

What you’ll learn in this demo: 

  • Identify anomalies and patterns in image data to understand how to improve predictions
  • Visualize qualitative insights by identifying data patterns and trends on the 3D UMAP
  • Incorporate human-in-the-loop to enhance decision-making in business-critical ML applications
Video transcript

[00:00:00] I'm Josh, and I'm here to tell you a little bit about what Fiddler can provide in terms of our computer vision workflow. We've had a lot of developments over the last six months or so, and one thing that's cool is that some of the underlying technology for things you'll see here, like tracking unstructured drift,

[00:00:19] has the same bones as the large language model work. We thought about how to deal with unstructured data a long time ago, and I think it's served the platform really well. So this will be a combination of monitoring and explainability for images.

[00:00:34] We now have mechanisms for providing sophisticated image explainability for both image classification models and object detection models. This helps you do things like identify anomalies and patterns that might be throwing your model off, and it can help humans who work in high-stakes environments and make decisions based on a model's recommendations understand why the model made those decisions.

[00:01:00] So here, let's look at the image on the right: you see an image with the output of an object detection model overlaid, so you can see some bounding boxes. There's a tennis court with a couple of boxes around some humans, and there's a tennis racket; that's all model output.

[00:01:22] What you'll notice is that there's a set of colored pixels overlaid on top of the image. That's Fiddler's analysis from an explainer that we call SIG, or Segmentation Integrated Gradients. I'll get to that a little more further on, but it's telling you which parts of the image the model is paying attention to when it makes that particular assessment for that bounding box at that particular confidence level.

[00:01:48] One of the things I like to call out in this example is that the model doesn't just look at the racket, the contents of the bounding box, to figure out what it is. It cares about the rest of the context of the scene. So in this case, you see that the arms of the human are, from the model's perspective, also a really important part of reinforcing the hypothesis that that thing is indeed a tennis racket.

[00:02:12] We see a similar thing with snow skis and ski poles, where the model says these are skis, but it's also paying attention to the fact that there are poles in the hands of the human. So you could imagine a situation where there's some sort of spurious association the model has learned that's not appropriate.

[00:02:30] This could give a human the ability to veto a bad decision based on human contextual understanding, and it can help a model developer improve the model.
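Fiddler's SIG explainer is proprietary, but as a rough illustration of the general idea, here is a minimal sketch that combines pixel-level Integrated Gradients (via the open-source Captum library) with SLIC superpixels from scikit-image to get region-level attributions for a single detection's confidence score. The `score_for_box` wrapper and the per-segment averaging are assumptions made for this sketch, not Fiddler's actual algorithm.

```python
# Illustrative sketch only (not Fiddler's SIG): pixel-level Integrated Gradients
# aggregated over SLIC superpixels to get region-level attributions for one
# detection's confidence score. `score_for_box` is a hypothetical wrapper around
# the user's detector that returns the confidence of the box being explained.
import numpy as np
import torch
from captum.attr import IntegratedGradients
from skimage.segmentation import slic

def segment_attributions(score_for_box, image_chw: torch.Tensor, n_segments: int = 200) -> np.ndarray:
    # Pixel-level Integrated Gradients against a black baseline.
    ig = IntegratedGradients(score_for_box)
    attr = ig.attribute(image_chw.unsqueeze(0), baselines=image_chw.unsqueeze(0) * 0, n_steps=32)
    pixel_attr = attr.squeeze(0).sum(dim=0).detach().cpu().numpy()  # (H, W)

    # SLIC superpixels computed on the image in (H, W, C) layout.
    segments = slic(image_chw.permute(1, 2, 0).cpu().numpy(), n_segments=n_segments, start_label=0)

    # Average the attribution inside each segment to build a region-level heatmap.
    heatmap = np.zeros_like(pixel_attr)
    for seg_id in np.unique(segments):
        mask = segments == seg_id
        heatmap[mask] = pixel_attr[mask].mean()
    return heatmap  # overlay on the image to see which regions drive the detection
```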

[00:02:40] And then, just like Will showed you for the natural language, large language model case, we also take advantage of this UMAP manifold representation to help users understand how the data that's coming in at production time might be different from baseline data.

[00:03:03] So on the left side we have blue and orange dots. Blue represents points that are baseline model image inputs; orange is coming in at production time. And you see there's a little bit of a discrepancy between the blue band that the model may have been trained on and the orange band from production.

[00:03:24] That's telling you something about what's changing in the world, and it helps you diagnose whether your model may be operating in a regime where it's not an expert. The other thing this is really helpful for, as Will showed you for the language use case, is that you can overlay additional information.

[00:03:44] So if you have various kinds of human feedback or metadata, you can layer that on top of this. In a world of unstructured data, it's a little bit hard to draw a box around, say, loan applicants with an income above $50,000 on the East Coast.

[00:04:02] You could do that with structured data. In the unstructured data world, you need tools that can help you localize failure modes of your model, and that's what this UMAP manifold representation provides for you. It helps you isolate problem modes as well.
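As a hedged sketch of what a view like this is built on, the snippet below uses the open-source umap-learn library to project baseline and production embeddings into a shared 3D space, returning coordinates plus a source label you can color the points by (or swap for any metadata or human-feedback column). The array inputs are assumptions for illustration; this is not Fiddler's internal implementation.

```python
# Illustrative sketch (not Fiddler's implementation): project baseline and
# production image embeddings into a shared 3D UMAP space so they can be
# plotted together and colored by source, metadata, or human feedback.
import numpy as np
import umap  # pip install umap-learn

def project_to_3d(baseline_emb: np.ndarray, production_emb: np.ndarray, random_state: int = 42):
    all_emb = np.vstack([baseline_emb, production_emb])
    reducer = umap.UMAP(n_components=3, random_state=random_state)
    coords = reducer.fit_transform(all_emb)  # shape: (n_baseline + n_production, 3)
    source = np.array(["baseline"] * len(baseline_emb) + ["production"] * len(production_emb))
    return coords, source  # scatter-plot `coords`, colored by `source` or any other column
```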

[00:04:17] Here's an image classification use case. So instead of bounding boxes, in this case we're talking about a model that's looking at sonar images of wrecks on the seafloor. There are three different classes: planes, ships, and just empty seabed. In this case, if you look at the image on the left, the model thought it was a ship.

[00:04:41] And so the explainability capability in Fiddler, using the Integrated Gradients explainer, is telling you what the model is looking at. If you look at the blue pixels overlaid on the right side, what you can see is that this particular model pays a whole lot of attention to the acoustic shadow cast by the object on the seabed.

[00:05:01] And if you squint your eyes, you might convince yourself that that looks a little bit like a ship in this scenario. So identifying spurious, inappropriate things the model may have learned from your dataset, whether from an oversight in the dataset or from characteristics that are spuriously correlated,

[00:05:24] can be really important to understanding failure modes of your model. In this case, what we see is that this particular sonar model has vulnerabilities to uneven seafloor, like when there are a lot of rocks or cracks. That's a case where the acoustic shadow is thrown in a particularly funny way.

[00:05:44] So for a model developer, this is a really helpful tool for identifying vulnerabilities. I mentioned two explainability algorithms: one is Integrated Gradients for image classification, and one is SIG for object detection.
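Unlike SIG, plain Integrated Gradients is a published algorithm with open-source implementations. The sketch below shows what attributions for a classifier like the sonar model could look like with Captum, assuming a hypothetical PyTorch model that returns logits over the three classes; Fiddler's own implementation details may differ.

```python
# Illustrative sketch of Integrated Gradients for an image classifier
# (e.g. three sonar classes: plane, ship, empty seabed).
# `model` is a hypothetical torch.nn.Module that returns class logits.
import torch
from captum.attr import IntegratedGradients

def explain_prediction(model: torch.nn.Module, image_chw: torch.Tensor, target_class: int) -> torch.Tensor:
    model.eval()
    ig = IntegratedGradients(model)
    # Attribute the target class logit to input pixels, against a black baseline.
    attributions = ig.attribute(
        image_chw.unsqueeze(0),
        baselines=torch.zeros_like(image_chw).unsqueeze(0),
        target=target_class,
        n_steps=64,
    )
    # Collapse channels to a single (H, W) heatmap to overlay on the sonar image.
    return attributions.squeeze(0).sum(dim=0)
```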

[00:06:02] Awesome. And now here's how we monitor drift. This, again, is the common blood between natural language and computer vision: we're dealing with unstructured data. We've developed and patented an algorithm for tracking drift in vector distributions. You can represent model inputs, whether they are images or natural language, as embeddings and publish them.

[00:06:27] Then you can look at how those embedding vectors, those distributions, shift around in that high-dimensional space. By measuring that, you can determine whether there's some change in the world, or wherever your model is being operated, that could be affecting its behavior and its performance.
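Fiddler's drift algorithm for vector distributions is patented and not spelled out here, but as a rough sketch of the general idea, one common approach is to bin embeddings by clusters fit on the baseline and compare the resulting histograms with a divergence measure. The k-means and Jensen-Shannon choices below are assumptions for illustration, not the patented method.

```python
# Rough sketch of one way to quantify drift between two embedding distributions
# (not Fiddler's patented algorithm): bin embeddings by k-means clusters fit on
# the baseline, then compare the cluster-frequency histograms.
import numpy as np
from sklearn.cluster import KMeans
from scipy.spatial.distance import jensenshannon

def embedding_drift(baseline_emb: np.ndarray, production_emb: np.ndarray, n_clusters: int = 10) -> float:
    km = KMeans(n_clusters=n_clusters, n_init=10, random_state=0).fit(baseline_emb)
    base_hist = np.bincount(km.predict(baseline_emb), minlength=n_clusters).astype(float)
    prod_hist = np.bincount(km.predict(production_emb), minlength=n_clusters).astype(float)
    base_hist /= base_hist.sum()
    prod_hist /= prod_hist.sum()
    return float(jensenshannon(base_hist, prod_hist))  # 0 = identical, larger = more drift
```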

[00:06:46] What you'll see in this graph is three different images overlaid, and that's just for the slide. What we've done is taken a set of images and blurred them in three different groups. As they get successively blurred, the model gets less comfortable with them.

[00:07:05] We take the embeddings from the model, and what we can see is that the model perceives them differently. The way it's distributing these embeddings is different, and that's trackable here. So you can use this to throw alerts, just like Will showed you, when your model is operating in a regime it's unfamiliar with.

[00:07:23] And in the majority of cases, this provides a really nice proxy for model performance. In lots of scenarios we don't have ground truth labels, so it's not straightforward to measure model performance directly, or the labels may not be available immediately. But tracking these shifts in the vector distributions of the input images, or input language in Will's case, often provides a really good proxy, and it's a great signal for model developers to take a look and see if there might be a problem.

[00:07:53] Great, and then this is just jumping back into using that UMAP representation for root cause analysis. It allows you to identify data patterns where your production data differs from your reference data. It lets you pinpoint high-density clusters that may have emerged, new kinds of patterns,

[00:08:19] data points that may not be reflected in your training data. It can help improve that human-in-the-loop decision making. And again, all of this is accessible via the query interface for producing reports or lists to drill down deeper, if you need to do further investigation outside of the platform.

[00:08:39] Super. Okay, so now we're going to cut to a little demo of that computer vision workflow, end to end, which I think is pretty cool.

[00:08:46] Awesome. Okay, so what we're looking at here is a dashboard that's been composed for this computer vision model, and you can see there's a variety of different metrics: how much traffic is going to the model, and the model accuracy on the upper left. Vector drift is the one I was just showing you, where we're looking at the change in the distribution of embeddings of the input images.

[00:09:11] What you see here when we zoom in on this plot is a kind of stairstep. For this example, we've injected a different distribution of images with some different characteristics in that second time period. So when that drift number goes up, it's telling you that something is changing in your inbound data

[00:09:30] with respect to your reference data. In order to root cause what's going on here, we'll click into one of these windows, and Fiddler is going to pull up the data from that particular time window. Then we can click the embeddings button, and it takes us to this representation where we can overlay, in this dimensionality-reduced representation, baseline data and production data.
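As an illustration of that stairstep pattern and the alerting it feeds, here is a small sketch (an assumption for this write-up, not Fiddler's alerting code) that computes a drift score per time window against a fixed baseline, using a drift function like the `embedding_drift` sketch above, and flags the windows worth root-causing in the UMAP view.

```python
# Illustrative sketch (not Fiddler's alerting logic): score each time window's
# embeddings against a fixed baseline and flag windows above a drift threshold.
from typing import Callable, Dict
import numpy as np

def flag_drifting_windows(baseline_emb: np.ndarray,
                          windows: Dict[str, np.ndarray],
                          drift_fn: Callable[[np.ndarray, np.ndarray], float],
                          threshold: float = 0.2) -> Dict[str, float]:
    alerts = {}
    for window_name, prod_emb in windows.items():
        score = drift_fn(baseline_emb, prod_emb)
        if score > threshold:
            alerts[window_name] = score  # candidate windows for root-cause analysis
    return alerts
```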

[00:09:57] And what you can see is that there are certain areas where the production data differs from the baseline. That's what represents the change in the world that's potentially causing, I don't know if you saw it on the upper left, but there was a drift in the accuracy as well.

[00:10:15] So we can drill into this. We can root cause specifically the orange points that are not close to the blue points, and that's going to show us examples from this data that are dissimilar from our reference data. In this particular case, there are some distorted images.

[00:10:32] There are some broken ship hulls and wrecks that are particularly distorted, broken into little pieces. So this distributional drift is real, and we're detecting it. And then we can overlay the explainability analysis on top of this, this kind of interactive overlay of XAI, to try to understand what's driving the model in these predictions.

[00:11:01] So here, again, you'll see the same thing about the model over-indexing on acoustic shadows. And it's all in one workflow, so I think that's pretty sweet.

[00:11:11] Awesome, so hopefully you can all see this other window, the landing page of the platform. I want to jump into an object detection example. We'll just pull some reference images from our reference set. So I've retrieved a bunch of data from our reference set, and I'm going to click this lightbulb to get the explain interface. In real time, in about ten seconds, we're running this Segmentation Integrated Gradients algorithm.

[00:11:50] This is a proprietary algorithm that is quite performant. That calculation was just performed on the CPU in six seconds; it usually takes about ten. It seems to provide really good results, so we're looking forward to writing a paper on this, because the alternative algorithms for object detection are very computationally intensive.

[00:12:12] So here you can see an image, just a scene from a kitchen, and the model, the user's model, your model, has proposed a bunch of bounding boxes around different objects. That's been fed into Fiddler, and what Fiddler provides is the ability to overlay information about why each object proposal was made.

[00:12:31] This is a very interactive interface. You can adjust the way that information is overlaid, and you can adjust some contrast settings. What you may need for a particular use case can vary a little bit, so there are some nice operator settings.

[00:12:45] We can look at which regions of pixels are causing the oven hypothesis. There's an interesting thing about the knives on the wall: it turns out that when the model sees knives, it's more confident that something is a knife if it's with other knives. That tells you a little something about the dataset the model was trained on.

[00:13:02] Again, we see this behavior where, yes, the sink matters for the model's decision making, but there's also context outside of the bounding box that's important for that sink hypothesis, so some of the strongest attribution is coming from the faucet. So, yeah, I just wanted to show you one interactive example.

[00:13:22] I think the users of this feature have been really pleased with their ability to tinker a little bit. And as I said, it's real time and quite fast.