AI Forward - Fiddler AI Observability Platform for MLOps

The Fiddler AI Observability platform for MLOps enables data scientists and AI practitioners to:

  • Streamline end-to-end MLOps: monitor, explain, analyze, and improve ML models from a single platform
  • Gain advanced model explainability: understand the 'why' behind model predictions, fostering trust and transparency; expanded image XAI capabilities aid the understanding and interpretation of computer vision models
  • Perform root cause analysis with feature importance: Fiddler highlights feature importance and impact, helping pinpoint the specific areas affecting model performance
  • Enhance team collaboration and decision-making: the platform fosters increased collaboration across organizational teams, providing the tools and insights needed to make informed decisions for continuous model improvement

Speaker: Sabina Cartacio - Staff Product Manager, Fiddler AI

Video transcript

[00:00:00]

[00:00:04] Well, hello everyone. Thank you for coming to our session. Before we jump into some really cool demos, I'm going to start by giving an overview of what the Fiddler AI Observability platform offers for your machine learning workflows.

[00:00:21] So before I jump into it, let's talk a little bit about all the different players that are keeping tabs on your model's health. Outside of data scientists and ML engineers, there are business stakeholders, customer success, auditors, regulators, so many players here. And they're all approaching this from different angles and asking different questions in order to not only understand, but also trust, the ML systems that they have in place.

[00:00:47] So, you know, the line of business might be asking, is the model meeting my business KPIs? Whereas auditors and regulators are wondering, are the decisions my machine learning model is making fair? So there's a lot of perspectives that we need to consider here.

[00:01:00] Now, if you're one of the stakeholders that's building the machine learning model, Fiddler supports you across all three stages of the ML lifecycle.

[00:01:09] Now we're going to talk about the three stages. The first stage is pre-production. And Fiddler really supports you here via model validation, from a bias or fairness perspective as well as a performance perspective. We also enable deep explainability, so visibility into your model's behavior.

[00:01:28] The second stage in this lifecycle is production, and Fiddler helps here by enabling monitoring, with real-time alerts across data drift, performance, custom-defined metrics, and many, many more. We also support root cause analysis and explainability so that you can resolve your issues. And then flexible dashboards and charts so that you can track both data science metrics as well as business KPIs.

[00:01:51] And if this is a lot, don't worry, we're going to be demoing most, if not all, of this really shortly. Now, one of the key steps in this lifecycle is a continuous feedback loop for model retraining. So as your business evolves, as new data comes in, you're going to have to retrain your model. And Fiddler can help you pinpoint problematic predictions, which you can potentially leverage when deciding to retrain your ML model.

[00:02:17] I want to quickly note that Fiddler does support LLMs, so large language models, and generative AI models across the same ML lifecycle. I'm not going to go deep into this as Barun is going to do a deep dive in the second half of this session on LLMs. But what I want to stress is that this is a unified platform for predictive and generative AI models.

[00:02:39] One recent announcement from Fiddler that I want to share with you all before we continue is our image explainability, or explainability for computer vision, so classification and object detection models. We can actually show this with a quick computer vision, or object detection, use case.

[00:02:59] So on the right-hand side here, you can see an input image with proposals from a user's model overlaid. And I can see a proposal for a sink on the bottom left. Now, when I click onto a specific box, I can see either a red or blue overlay, depending on which direction this group of pixels is pushing the proposal.

[00:03:21] Now, if we focus on the sink, I can see that the shape of the sink, so that square shape, the four corners, is related to the positive attribution. But I can also see that the faucet really matters, which is interesting because that means that these models are taking cues from context outside of the bounding box to make their decisions.
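As a rough picture of what an attribution overlay like this involves, here is a small, self-contained sketch that colors a synthetic per-pixel attribution map over a placeholder image with a diverging colormap, one color for pixels pushing the proposal score in one direction and the other color for the opposite direction. The image, the attribution values, and the color convention are all placeholders for illustration, not Fiddler's actual rendering.

    # Hedged sketch: render per-pixel attributions for a detection proposal as a
    # diverging red/blue overlay. The image and attribution map are synthetic.
    import numpy as np
    import matplotlib.pyplot as plt

    rng = np.random.default_rng(0)
    image = rng.random((64, 64))               # stand-in for the input image
    attributions = rng.normal(size=(64, 64))   # stand-in per-pixel attributions

    plt.imshow(image, cmap="gray")
    # With cmap="bwr", negative attributions render blue and positive render red;
    # the convention Fiddler uses is not specified here.
    lim = np.abs(attributions).max()
    plt.imshow(attributions, cmap="bwr", vmin=-lim, vmax=lim, alpha=0.4)
    plt.title("Attribution overlay (synthetic example)")
    plt.axis("off")
    plt.show()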

[00:03:41] So if this piques your interest and you want to learn a little bit more, please reach out to us. This is just a quick preview of what we can do on the image explainability front. So to recap, we support identifying anomalies and patterns to improve your predictions and make better real-time decisions, as well as allowing you to incorporate a human in the loop to enhance decision-making.

[00:04:01] This is the last PowerPoint slide on my part before we jump into demos, so bear with me a little longer. So the last thing I want to show are the four pillars that Fiddler AI offers. What this is really telling you is the four main functions we hope to offer you.

[00:04:15] So the first is ML monitoring. So monitoring your models day in and day out to ensure that they're performing as well as when you first trained the model. The second is explainability, so understanding, why is my model making the decisions it makes? The third is analytics, so as your model starts to degrade, you're going to want to do deep root cause analysis, pinpoint problematic cohorts, and make sure that you walk away with actionable takeaways.

[00:04:40] The last, but definitely not the least, is fairness, right? So understanding, is my model biased, and what can I do to mitigate that bias?

[00:04:48] So with that, like I promised, that was our last slide, we're going to jump over to our demo.

[00:04:53] So, before I actually start the demo, I'm going to go ahead and put on my customer hat and pretend that I'm a data scientist at NewAge. This is going to complement Barun's half of the demo, but essentially NewAge is our made-up banking company.

[00:05:08] So, as a data scientist at NewAge, I am on the churn analytics team, just to paint a little bit of context. As a data scientist, I log into the Fiddler deployment, and I can see a model monitoring summary: a little donut overview of my performance, data drift, and data integrity alerts, as well as an overview of every single model that I have registered in the Fiddler platform.

[00:05:31] Again, being part of the churn analytics team, I'm really interested in this bank churn project and churn classifier model, and I can see a number of alerts that pique my interest. So I'm going to go ahead and investigate these triggered alerts by clicking on this little bell icon. So where this takes me is to a list of triggered alerts.

[00:05:48] So I can see that there's a number of data integrity, performance, and data drift alerts that have been triggered for this model. Now, I'm particularly interested in this output drift alert. So in order to further investigate this alert and what's going on, I'm going to go ahead and click on the inspect button.

[00:06:07] Clicking on this inspect button navigates me to a monitoring chart, and we'll visit these a few times in these demos. So in this monitoring chart, what I'm seeing is my output column plotted over time. And because I came from an alert, I can actually see the warning and critical alert context overlaid on this chart.

[00:06:26] What that means is that it's very easy for me to see which points are triggered. So I can see that there are a number of warning (the yellow) and critical alerts triggered for this time period. Now, before I start analyzing what went wrong and digging into these alerts, let me try to really understand what this chart is communicating to me.

[00:06:47] So in order to do this, I'm going to click on a historic time point where the drift is really low and not a concern. And the first thing I noticed is that my Root Cause Analysis tab got enabled. And this Root Cause Analysis offers more detailed information and tables to better understand what's happening with my features at this particular point in time.

[00:07:08] Now, the first thing I'm going to do is sort by prediction drift impact. And what I can see here is that at this point, where there's no alert and the drift is very low, 47 percent of my prediction drift is attributed to this age column. So I know that's what's happening when the drift is really low and not a concern.

[00:07:26] Now what happens if we compare that to a point where an alert was triggered? So let's go ahead and look at this data point for November 3rd, where a warning alert was triggered. Now I see that my root cause analysis table was recalculated. And specifically, I can tell that 66 percent of this prediction drift, which is unacceptable, is happening across this number of products feature.

[00:07:49] And I want to pause really fast and just explain that prediction drift impact is calculated by a combination of how much the feature has drifted as well as how impactful that feature is. So I can go a little further and go ahead and expand this number of products feature and I'm provided with a data distribution chart.

[00:08:07] And what this is telling me is my baseline and production data. So my baseline here could be, say, my training data set, and I can see that my training data featured mostly values of 1 and 2 for this number of products feature. But I can very easily see that production is seeing an increase in the number of 3 and 4 values.
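To make the drift idea a bit more concrete, here is a small sketch that compares a hypothetical baseline distribution of number of products against a production distribution using Jensen-Shannon distance, and then weights that drift by a made-up feature impact value, roughly in the spirit of the prediction drift impact described earlier (drift combined with feature impact). The counts and the impact weight are invented, and this is not Fiddler's exact formula.

    # Rough sketch of the idea behind "prediction drift impact":
    # measure how far the production distribution of a feature has moved from
    # the baseline, then weight that drift by the feature's impact.
    # (Illustrative only -- numbers are made up, not Fiddler's exact formula.)
    import numpy as np
    from scipy.spatial.distance import jensenshannon

    # Bin counts over number_of_products values 1-4
    baseline_counts = np.array([520, 430, 40, 10])       # mostly 1s and 2s in training
    production_counts = np.array([210, 250, 310, 230])   # 3s and 4s are rising

    # Normalize the counts into probability distributions over the same bins
    p = baseline_counts / baseline_counts.sum()
    q = production_counts / production_counts.sum()

    drift = jensenshannon(p, q, base=2)   # Jensen-Shannon distance in [0, 1]
    feature_impact = 0.45                 # hypothetical impact weight for this feature

    print(f"feature drift (JSD): {drift:.3f}")
    print(f"approx. prediction drift impact: {drift * feature_impact:.3f}")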

[00:08:24] So, I can visually see this drift happening. Now, a lot of products will stop here, and it's kind of on you to figure out what's next. But we're actually going to go a step further and leverage the Fiddler Analyze experience in order to look a little deeper. Now, what I want to figure out, so the question is: is this drift happening widespread across my data, or are there specific problematic cohorts where we're seeing this drift, right?

[00:08:51] So I'm going to pop over to Analyze, and here's where I can do that kind of deep digging. The first thing I want to talk about is what we're seeing on this page. At the top is the slice query, or SQL, where I can go ahead and construct the query. I've selected all the columns from that churn classifier model for that problematic time point we were looking at, so that November 3rd area.

[00:09:14] And then I'm going to run this to provide a little more context. Below, what I'm getting is basically events, or data points, for this time period. And on the right, which we call the console, is where I can overlay a variety of different analytic charts to learn more. So I could choose to look at feature distribution, feature impact, slice eval, and more.

[00:09:34] Let's go ahead and look at slice evaluation. So what slice evaluation provides is a number of different performance visualizations. So this can be metric cards, confusion matrix, and more charts. Now, I want to note that the accuracy is quite poor for this alerted time period, which makes sense. And the number of false negatives is also quite high.

[00:09:56] So, you know, the model's not doing too great here. But what we do know is that number of products is one of the features that is drifting. And I want to know, are there other features that are correlated to this drift in this feature? So I can use my feature correlation chart. And I can definitely choose number of products because I know that that's drifting.

[00:10:14] And I want to know, across other features, is this drift correlated? So maybe I'll pick something like geography. And I'm seeing something really interesting here, and we'll walk through it together. In our RCA, in our data distribution, we saw that there was an increase of 3 and 4 values for number of products.

[00:10:32] And I can see that those 3 and 4 values are predominantly happening in this Hawaii geography. So let's dig into it a little bit more. We have a really flexible SQL query on the left. Let's go ahead and check out where the geography is equal to Hawaii and see what's happening. So we're going to rerun this query.

[00:10:52] And this isn't going to change too much, but what we want to see is the slice evaluation, the performance. So sure enough, we can see that the performance has taken quite a dip. And the number of false negatives has increased significantly. Now, again, this is telling me, hey, definitely Hawaii is a problematic, uh, cohort, right?

[00:11:10] A problematic slice of data. But let's see what happens if I were to not include Hawaii. So if I don't include Hawaii in this slice of data, I can see that my accuracy is much, much better, and the number of false negatives has decreased immensely. But I want to dig into this a little further; I don't want to end here.
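The slice comparison just described can be pictured with a short pandas and scikit-learn sketch: filter the events to a Hawaii slice and to everything else, then compare accuracy and false negatives for each. The table, the column names (geography, churn, predicted_churn), and the values are stand-ins, not the actual NewAge schema.

    # Hedged sketch: compare slice-level performance (Hawaii vs. everything else)
    # on a hypothetical events table. Column names and values are stand-ins.
    import pandas as pd
    from sklearn.metrics import accuracy_score, confusion_matrix

    events = pd.DataFrame({
        "geography":       ["Hawaii", "Hawaii", "Hawaii", "Texas", "Ohio", "Texas"],
        "num_of_products": [3, 4, 3, 1, 2, 1],
        "churn":           [1, 1, 1, 0, 1, 0],   # ground truth labels
        "predicted_churn": [0, 0, 1, 0, 1, 0],   # model decisions
    })

    for name, slice_df in [("Hawaii", events[events.geography == "Hawaii"]),
                           ("not Hawaii", events[events.geography != "Hawaii"])]:
        acc = accuracy_score(slice_df.churn, slice_df.predicted_churn)
        tn, fp, fn, tp = confusion_matrix(
            slice_df.churn, slice_df.predicted_churn, labels=[0, 1]).ravel()
        print(f"{name}: accuracy={acc:.2f}, false negatives={fn}")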

[00:11:27] We've already identified, you know, a problematic cohort. We can definitely take this back to the team and run with it, but I want to go a little further. So I'm going to go back and look at the geography, go to Hawaii one more time. And what I'm trying to do is dig into these false negative samples. So the way that I'm going to do this is by saying I want to know where churn actually did happen, so where churn was yes, but the model, so the probability churn, was low, meaning the model did not think that this user was going to churn.

[00:11:56] So again, I just went ahead and clicked into those false negatives, and I see the same 60 samples. I'll explain in a second, but in order to showcase this a little bit more, I'm actually going to order by probability churn, ascending. And this doesn't impact the chart on the right, but it does impact the data table below.

[00:12:14] What this has done is basically bubbled up the places where the model was the most wrong, right? So we can see the model is almost certain this customer is not going to churn, but when we pass in our ground truth labels, we can see that this customer did in fact churn. Right, so our model got it really, really wrong.
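The filter-and-sort step just described, surfacing customers who actually churned but were scored with a low churn probability, might look like this in pandas. The column names (churn, probability_churn) and the sample rows are assumptions for illustration, not the real schema.

    # Hedged sketch: surface the "most wrong" false negatives -- customers who
    # actually churned but were scored with a very low churn probability.
    import pandas as pd

    events = pd.DataFrame({
        "customer_id":       [101, 102, 103, 104, 105],
        "churn":             ["yes", "yes", "no", "yes", "no"],   # ground truth
        "probability_churn": [0.03, 0.41, 0.12, 0.08, 0.77],      # model score
    })

    false_negatives = events[(events.churn == "yes") & (events.probability_churn < 0.5)]
    worst_misses = false_negatives.sort_values("probability_churn")  # ascending
    print(worst_misses)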

[00:12:33] And what we can do straight from here is we can choose to do point level explanations. So I can click on this little light bulb, and Fiddler is generating this for me. It's trying to explain what features are impacting my prediction. Now, I can choose a number of different out of the box explainers, or I could even bring my own.

[00:12:50] So you can bring your own explainer and get access to it straight from this drop-down. What I'm going to do is view this tornado plot. Now, this tornado plot, again, is just trying to tell me which features are impacting this prediction, right, and driving it to go so low. And I can see pretty obviously what we already know, which is that the number of products, right, seems to be driving this prediction quite a bit.
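As a loose illustration of what a point-level, tornado-style attribution conveys, here is a self-contained sketch that trains a toy classifier on synthetic data, then swaps one feature at a time for its baseline average and plots how much each swap moves the predicted probability. This is a simple occlusion-style approximation for illustration only, not one of the explainers Fiddler ships; the feature names and data are invented.

    # Hedged sketch of a point-level, tornado-style attribution on a toy model.
    # NOT Fiddler's explainer: a simple occlusion-style approximation that swaps
    # one feature at a time with the baseline mean and measures the change in score.
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.ensemble import RandomForestClassifier

    rng = np.random.default_rng(0)
    features = ["num_of_products", "age", "balance", "credit_score"]

    # Synthetic "churn"-like data where the first feature drives the label
    X = rng.normal(size=(1000, 4))
    y = (X[:, 0] + 0.3 * X[:, 1] + rng.normal(scale=0.5, size=1000) > 0).astype(int)
    model = RandomForestClassifier(n_estimators=100, random_state=0).fit(X, y)

    x = X[0]                    # the single prediction to explain
    base = X.mean(axis=0)       # baseline: average feature values
    p_actual = model.predict_proba([x])[0, 1]

    attributions = []
    for i in range(len(features)):
        x_masked = x.copy()
        x_masked[i] = base[i]   # replace one feature with its baseline value
        p_masked = model.predict_proba([x_masked])[0, 1]
        attributions.append(p_actual - p_masked)

    # Horizontal bars sorted by magnitude, largest at the top (tornado style)
    order = np.argsort(np.abs(attributions))
    plt.barh(np.array(features)[order], np.array(attributions)[order])
    plt.xlabel("change in predicted churn probability")
    plt.title("Tornado-style attribution (toy occlusion sketch)")
    plt.tight_layout()
    plt.show()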

[00:13:11] Now, I can actually do what-if analysis here. So I can go here and say, what if I were to change, for this prediction, this 3 to a 2? So number of products is equal to 2. And I'm going to rerun this calculation. And not only am I going to rerun it, I actually want to compare it to the model's initial decision.

[00:13:30] And I can see visually, as well as with the prediction, that the model went from being pretty darn sure that this customer was not going to churn, to pretty sure they are going to churn, with a very small change that I made using what-if analysis. So this is, again, surfacing a lot of feature sensitivity to this number of products feature.
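The what-if step itself is just: change one input, rerun the model, and compare the two outputs. The sketch below does that with a toy logistic scorer whose coefficients are invented purely so the example runs; it is not the NewAge churn model, and the direction and size of the flip are illustrative only.

    # Hedged sketch of what-if analysis: change a single feature value and compare
    # the model's output before and after. The scorer and its weights are made up.
    import math

    def churn_probability(num_of_products, age):
        # Toy logistic scorer with invented coefficients (not the NewAge model)
        z = 7.0 - 3.5 * num_of_products + 0.02 * age
        return 1 / (1 + math.exp(-z))

    original       = churn_probability(num_of_products=3, age=42)
    counterfactual = churn_probability(num_of_products=2, age=42)

    print(f"p(churn) with 3 products: {original:.2f}")        # low score: "won't churn"
    print(f"p(churn) with 2 products: {counterfactual:.2f}")  # flips much higher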

[00:13:50] Maybe too much sensitivity. And so, I'm going to go ahead and close out of here. So what we've talked about here is, hey, I identified an alert, I did a little bit of root cause analysis, identified a problematic cohort, performed explainability, and this is great when I get a triggered alert, when I'm going down that monitoring path.

[00:14:09] But I also want to keep tabs on my model over a longer period of time. I want to communicate to business stakeholders and more. So let's talk a little bit about how you would do that using Fiddler. So I'm going to pop over to this other tab, and we're going to do this by showcasing both the charts, flexible charts, as well as the flexible dashboards.

[00:14:28] So coming in here, I'm going to showcase a chart that I created before. If I can find it, I will search for it, give me one second; if not, we will create it. Yeah, here it is, demo magic, right? We have a revenue loss chart, so here I'll explain a little bit about what's going on.

[00:14:50] So these flexible charts allow us to add multiple metrics to a single view. So if you want to visualize performance and traffic and average value all at the same time, you can do that in a single chart. But what I chose to use this for is a little bit more of creating a business KPI. So I defined separately two custom metrics, one that does false positive count, as you can see here, and a second that does revenue lost from false positives.

[00:15:16] Now, revenue lost is something I defined as a dollar amount attributed to every false positive. So I said that the business would lose $100 for every false positive made. And I also included the false positive count so that you could see that trend. Now I added these two custom metrics as queries into the chart, and I saved this chart.
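The arithmetic behind that custom metric is simple enough to sketch: count the false positives in a window of events and multiply by the $100 cost. The events table and column names below are hypothetical, and this shows only the logic, not the syntax Fiddler uses to define custom metrics.

    # Hedged sketch of the "revenue lost from false positives" custom metric:
    # count false positives in a window of events and multiply by a fixed cost.
    import pandas as pd

    COST_PER_FALSE_POSITIVE = 100  # dollars, as defined in the talk

    events = pd.DataFrame({
        "churn":           [0, 0, 1, 0, 1, 0, 0],   # ground truth
        "predicted_churn": [1, 0, 1, 1, 0, 0, 1],   # model decision
    })

    false_positive_count = ((events.predicted_churn == 1) & (events.churn == 0)).sum()
    revenue_lost = COST_PER_FALSE_POSITIVE * false_positive_count

    print(f"false positives: {false_positive_count}")
    print(f"revenue lost: ${revenue_lost}")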

[00:15:37] And now what I can do is go to my flexible dashboards, and I've created some dashboards here. There we go. So I can go to the churn model, and I can see that the revenue loss chart that I created is in my dashboard. I can share this with business stakeholders, et cetera.

[00:15:55] But I can also plot some out-of-the-box metrics, such as prediction drift, DI violations, feature drift, accuracy, traffic, and more, and I can continue to add. Even in DI violations, again, you can see that multi-metric capability at work: I'm showing missing values, range violations, and type violations, and you can do so much with these charts.

[00:16:12] Now that does conclude my portion of the demo, but I do want to give a quick overview of everything we went over. So we covered viewing a monitoring summary, digging into an alert, doing root cause analysis and explainability, pinpointing problematic cohorts, and building the dashboards and reports that you can share with different stakeholders.