Should You Observe ML Metrics or Inferences?
In this episode, we explore two key approaches for monitoring AI models: metrics and inference observation. We break down their trade-offs and provide real-world examples from various industries to illustrate the advantages of each model monitoring strategy for driving responsible AI development.
Read the article by Fiddler AI and explore additional resources for more information on how AI observability can help developers build trust into AI services.
[00:00:00] Welcome back to Safe and Sound AI. You know, we're all about making sure those AI models of yours are behaving themselves. And, uh, today we're diving deep in this, uh, this debate that's been brewing in the world of AI observability.
[00:00:17] Yeah, a really interesting one.
[00:00:18] Absolutely. It's all about two main approaches: observing metrics versus observing inferences.
[00:00:24] Mm, big question is...
[00:00:26] Which one is better for your specific AI models?
[00:00:29] Right, which one's right for you?
[00:00:31] We're gonna break it all down, get into the nitty gritty, look at what the experts are saying.
[00:00:35] Exactly. So, um, I think the thing that's fascinating about this is both approaches, you know, they have the same goal, make sure your models are reliable, they're doing what they're supposed to do, but the way they go about it, that's where things get interesting.
[00:00:48] So let's just jump right into it. What's the core of these methods?
[00:00:50] Okay, so, when we talk about observing metrics, think about it like you're getting like an executive summary.
[00:00:57] Like the TLDR.
[00:00:58] Exactly. Of how healthy your model is. Right. It's all about looking at this, um, pre-aggregated data. So we're talking things like accuracy, drift, performance, all this is being calculated like right next to the model.
[00:01:10] Got it.
[00:01:11] And just that summary. Those results get sent to your platform, so you're not getting all the nitty gritty details, just the overall picture.
[00:01:20] So it's like you're looking at your car's dashboard, getting a quick read on speed, fuel level, temperature, but you're not seeing, like, how the engine's actually working, right?
[00:01:29] Exactly. Perfect analogy.
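[Editor's note: to make the dashboard analogy concrete, here is a minimal Python sketch of the metrics-based pattern, under the assumption that aggregation happens next to the model and only the summary crosses the wire. The function names and payload shape are invented for illustration.]

```python
import json

def summarize_batch(y_true, y_pred):
    """Aggregate a batch of predictions locally, next to the model.
    Raw rows never leave this process -- only the summary does."""
    n = len(y_true)
    accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # A crude drift signal: how the output distribution is shifting
    positive_rate = sum(y_pred) / n
    return {"n": n, "accuracy": accuracy, "positive_rate": positive_rate}

def publish_metrics(summary):
    # In a real deployment this would be sent to your observability
    # platform; here we just show the payload that would cross the wire.
    print(json.dumps(summary))

publish_metrics(summarize_batch(y_true=[1, 0, 1, 1], y_pred=[1, 0, 0, 1]))
# -> {"n": 4, "accuracy": 0.75, "positive_rate": 0.5}
```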
[00:01:31] What are the upsides to this approach? Why would someone choose to just get that high-level view?
[00:01:36] Well, there are a couple of really big advantages, and I think this is especially true for organizations where security is a top priority.
[00:01:43] Okay.
[00:01:43] So first off, think about it. Because that data is being aggregated before it ever hits the platform, you know, you're drastically reducing the risk of exposing any PII, any sensitive information.
[00:01:55] That's huge, right? Especially in today's world where data privacy is everything.
[00:01:59] Exactly.
[00:02:00] So big win there. What's the second advantage?
[00:02:03] Cost savings. Straight up.
[00:02:05] Makes sense.
[00:02:05] You're transferring less data, you're storing less data, and that often translates to just lower software costs overall.
[00:02:11] I like that. Bonus points.
[00:02:13] It's a good bonus.
[00:02:14] So we've got the dashboard view with metrics. Is observing inferences more like popping the hood, getting your hands dirty?
[00:02:21] Yes, you got it. With inference-based observability, we're sending all the raw stuff, the inputs, the outputs, every prediction the model makes, straight to the platform to be analyzed.
[00:02:32] Okay, so...
[00:02:33] ...lots of data.
[00:02:34] It can seem overwhelming, but that raw data, that's the key.
[00:02:38] That's the treasure trove.
[00:02:39] It is. You unlock true root cause analysis.
[00:02:43] Now we're talking. See, this is where things get really interesting for me, because we all know model performance can drift over time. But figuring out why? That's the holy grail, and that's what inferences give you, right?
[00:02:55] Exactly. So let's say your model's accuracy just takes a nosedive. Okay. Metrics. They'll tell you it happened.
[00:03:02] That's it. But with inferences, you can zoom in and figure out why. Was there a certain type of input that messed things up? Or is performance tanking for, like, a certain group of users? These are the questions inference-based observability lets you answer.
[00:03:18] So it's not just being reactive, it's proactive.
[00:03:21] You're getting to know your model at a much deeper level, and you can actually go in and improve it.
[00:03:25] Absolutely.
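[Editor's note: by contrast, here is a minimal sketch of the inference-based pattern. Every prediction is captured as a complete event, raw inputs included, so the platform can slice and replay it later. The event schema is hypothetical.]

```python
import json
import time
import uuid

def log_inference(features, prediction, model_version, sink):
    """Record the complete inference event -- raw inputs and raw output --
    so root cause analysis can later slice by any field."""
    event = {
        "event_id": str(uuid.uuid4()),
        "timestamp": time.time(),
        "model_version": model_version,
        "features": features,      # everything the model saw
        "prediction": prediction,  # everything it produced
    }
    sink.write(json.dumps(event) + "\n")

with open("inferences.jsonl", "a") as sink:
    log_inference(
        features={"credit_score": 640, "savings_balance": 85000},
        prediction={"repay_probability": 0.81},
        model_version="loan-v3",
        sink=sink,
    )
```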
[00:03:26] I'm guessing this also saves data scientists a ton of time.
[00:03:29] It does. Think about it. It's like trying to debug some super complex software, but you only have a couple error codes.
[00:03:35] Nightmare.
[00:03:36] Total nightmare.
[00:03:36] Right. With inferences, you've got that full picture so you can pinpoint the root cause fast. No more wasting days sifting through data in a million different notebooks.
[00:03:46] And that time, that is precious. That's time they can spend developing new models, refining existing ones, you know, working on all those high value tasks.
[00:03:54] What else can we do with this approach?
[00:03:56] Well, I mean, we've talked about root cause analysis, time savings, but there's a lot more. Like, you can do segmentation monitoring, which means you're analyzing performance across specific groups, cohorts. This is especially helpful if you think certain segments might be harder for your model to handle accurately.
[00:04:14] Can you give us an example of that? Like, in the real world, how's that used?
[00:04:18] Sure. Imagine you've got a model predicting, let's say, loan repayment likelihood. Yeah. Now, some applicants, they might have low credit scores, but they have a lot of money in the bank.
[00:04:29] Right.
[00:04:30] Those are tricky.
[00:04:31] Makes sense.
[00:04:32] Your model might struggle with those, but inference-based observability, it lets you zoom in on how that specific segment is performing, compare it to others.
[00:04:41] You can even set alerts if accuracy drops below a certain level.
[00:04:45] Wow. So it's not just that bird's eye view, it's getting really granular.
[00:04:49] Exactly.
[00:04:49] I imagine that's really valuable for, you know, applications where fairness and accuracy are super critical.
[00:04:54] Yeah.
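[Editor's note: as a sketch of that segmentation idea, once raw inferences are logged, per-cohort accuracy and threshold alerts are only a few lines of Python. The segment definition and the 0.8 threshold are invented for the loan example above.]

```python
def segment_accuracy(records, segment_fn):
    """Group logged inferences by segment and compute accuracy per group."""
    buckets = {}
    for r in records:
        buckets.setdefault(segment_fn(r), []).append(r["prediction"] == r["label"])
    return {seg: sum(hits) / len(hits) for seg, hits in buckets.items()}

def breached(per_segment, threshold=0.8):
    """Return segments whose accuracy fell below the alert threshold."""
    return [seg for seg, acc in per_segment.items() if acc < threshold]

records = [  # toy inference log joined with ground-truth labels
    {"credit_score": 590, "savings": 90000, "prediction": 1, "label": 0},
    {"credit_score": 610, "savings": 75000, "prediction": 0, "label": 0},
    {"credit_score": 720, "savings": 12000, "prediction": 1, "label": 1},
]
per_segment = segment_accuracy(
    records,
    lambda r: "low_score_high_savings"
    if r["credit_score"] < 650 and r["savings"] > 50000 else "other",
)
print(per_segment)            # {'low_score_high_savings': 0.5, 'other': 1.0}
print(breached(per_segment))  # ['low_score_high_savings'] -> fire an alert
```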
[00:04:55] Anything else?
[00:04:55] Oh, we're just getting started.
[00:04:56] It also enables techniques like, um, Shapley values, LIME. These are used to get what we call local explanations.
[00:05:05] What does that mean?
[00:05:06] It means you're not just understanding how the model behaves overall, you're understanding why it made a specific prediction for a specific person.
[00:05:14] Hold on. Are we saying we can actually peek inside that black box and understand why it approved that loan or recommended a particular product?
[00:05:23] Exactly. You're seeing the inner workings.
[00:05:24] Wow. That's incredible.
[00:05:26] And that level of transparency, it's not just about performance. It's about trust.
[00:05:30] Right.
[00:05:30] It's about trust with users, regulatory compliance.
[00:05:33] Yeah. It's about demystifying AI. It's about helping us grasp that why behind those decisions that impact people's lives.
[00:05:40] Exactly.
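[Editor's note: for readers who want to see what a local explanation looks like in code, here is a sketch using the open-source shap package on a toy model. The data and feature meanings are synthetic, and the appropriate explainer varies by model type; this is an illustration, not any particular platform's implementation.]

```python
# pip install shap scikit-learn
import numpy as np
import shap
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 3))                  # pretend: credit_score, income, savings
y = (X[:, 0] + 0.5 * X[:, 2] > 0).astype(int)  # synthetic "will repay" label

model = RandomForestClassifier(random_state=0).fit(X, y)

# Explain one specific prediction: why did the model score *this* applicant this way?
explainer = shap.TreeExplainer(model)
contributions = explainer.shap_values(X[:1])
print(contributions)  # per-feature contributions to this single prediction
```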
[00:05:41] Okay. This is powerful stuff. It really seems like observing inferences, it takes model monitoring to a whole new level. But there's got to be a trade off, right? All this extra data, the analysis, it can't be cheap.
[00:05:52] You're right. Platforms that use inferences, they do tend to be more complex, potentially more expensive up front compared to those metrics based solutions. But you have to think about it. What's the cost of not having this level of insight?
[00:06:04] Yeah, that's a good point.
[00:06:05] Yeah.
[00:06:06] I mean, think about what could happen if a model starts misbehaving and nobody notices.
[00:06:10] You could have financial losses, reputational damage, even legal issues.
[00:06:15] Exactly. All those things.
[00:06:16] Suddenly that upfront investment in a more powerful observability platform seems like a bargain.
[00:06:21] Exactly. And remember, you know, it's not just about mitigating risk, it's about maximizing the good stuff, the opportunity. The insights you get from observing inferences, they can lead to better performance, which means higher revenue, more efficiency, a stronger advantage over your competitors.
[00:06:36] It's about moving from just monitoring to actively optimizing your AI.
[00:06:42] Okay, so we've got these two approaches. Metrics and inferences, both have their pros and cons. I mean, it's kind of like choosing between, I don't know, a Swiss Army knife and a, like, specialized surgical tool.
[00:06:53] Yeah, I like that.
[00:06:55] Both super useful, but for different things, right? But how do our listeners figure out which one is right for them?
[00:07:04] That's the million dollar question, isn't it?
[00:07:06] It is. What questions should they be asking themselves?
[00:07:09] Well, I think, first and foremost, you gotta think about the data. How sensitive is the information your models are handling? If you're dealing with, like, really sensitive stuff, financial data, that kind of thing.
[00:07:20] Right.
[00:07:21] Then those privacy advantages of the metrics based approach, that might be super appealing.
[00:07:26] Yeah, I mean, you're aggregating that data before it even gets to the platform. So that's an extra layer of protection.
[00:07:31] Exactly. Especially with all the data privacy regulations these days.
[00:07:34] For sure. But what if you're in an industry where understanding the why behind your model's decisions is mission critical?
[00:07:41] Like, let's say healthcare, where you've got a model that's helping diagnose patients or recommending treatment plans.
[00:07:48] Right, high stakes situations.
[00:07:49] Exactly. Wouldn't you need that deep dive that inferences provide in those cases?
[00:07:54] Absolutely. When the stakes are that high, being able to do root cause analysis, identify those tricky cohorts, understand individual predictions, that's not just a nice to have, it's a must have.
[00:08:05] And that's where inference-based observability really shines.
[00:08:08] So the level of risk associated with your AI application. That's a major factor. The higher the stakes, the more likely you need those granular insights from inferences.
[00:08:19] Exactly.
[00:08:20] But we got to talk about cost too, right? And we mentioned it earlier, but for companies that are, you know, working with tight budgets, wouldn't that lower cost of a metrics based solution be a big advantage?
[00:08:30] It's a good point. Cost is always a consideration, but I think it's about looking at the big picture, weighing those upfront savings against the potential cost of, you know, not having enough visibility into your models down the line.
[00:08:41] That's a really good point.
[00:08:42] Like, imagine your model starts making biased decisions, right? And that leads to unfair outcomes for certain groups of customers.
[00:08:48] Oof. That's not good.
[00:08:50] Right. The damage to your reputation, the potential legal issues, that could end up costing way more than any initial savings you got on your monitoring platform.
[00:09:00] Yeah. It's about thinking long term. It's easy to just look at that initial price tag, but we have to remember that AI observability, it's an investment.
[00:09:08] It's about the long term success and sustainability of all your AI initiatives.
[00:09:13] Exactly. And at the end of the day, the choice really comes down to your organization: where are you in your AI journey? What are your goals? If you're just getting started with AI, you know, you're focused on simpler use cases, a metrics based approach might be all you need.
[00:09:28] That's a good starting point.
[00:09:29] Yeah. But as your AI footprint grows and you start tackling those more complex challenges, you're going to need the power and flexibility of inference-based observability.
[00:09:38] So let's say a company decides, okay, inference-based observability, that's the way to go for us. What are some real world examples of how this approach is being used to solve problems and, you know, make things better?
[00:09:50] Oh, there are tons. We've seen companies using it to, like, really improve their fraud detection models.
[00:09:57] Yeah.
[00:09:57] They're able to pinpoint those specific transaction patterns that were slipping through the cracks.
[00:10:02] Interesting.
[00:10:02] Yeah, or, you know, optimizing those recommendation engines by really understanding how individual users interact with different types of content and even, uh, we've seen it used to improve medical diagnosis models, you know, by identifying subtle biases that were leading to inaccurate predictions for certain patient groups. So it's really making a difference across industries.
[00:10:24] It's great to see those real world results, but it can't all be sunshine and roses, right?
[00:10:30] Are there any downsides to observing inferences that we should be aware of?
[00:10:34] Well, I mean, of course, nothing's perfect. One potential downside is that, you know, storing and processing all that raw inference data, it can be a lot, especially for applications with high volume.
[00:10:44] Yeah, lots of data to manage.
[00:10:45] It could mean higher storage costs and, you know, put a bigger strain on your platform's computational resources.
[00:10:52] Right. Okay, so what level of transparency and understanding do we need to feel comfortable and confident in AI's decisions? Are we okay with just seeing the what, or do we need to understand the why?
[00:11:05] You know, that's a really good question. And, uh, I think it really speaks to the bigger picture here.
[00:11:10] It's about trust.
[00:11:11] Absolutely. If we want people to embrace AI, to really see its potential, we need to make it understandable. Not just for the data scientists, but for everyone.
[00:11:19] Yeah. It's about bridging that gap, you know, between those complex inner workings of AI and the people who are actually being impacted by its decisions.
[00:11:28] Exactly. I mean, think about it. If people can see that our models are making decisions in a way that's clear and makes sense, they're going to be much more comfortable using them.
[00:11:35] Right. It's about empowering users, regulators, developers. Everybody should be able to ask those tough questions, demand explanations, and really shape the future of AI in a way that aligns with our values.
[00:11:48] I completely agree. So while this whole debate between observing metrics and observing inferences might seem like it's just a technical choice, it actually has much bigger implications.
[00:11:58] It does. It's about how we build AI, how we deploy it, how we govern it in the years ahead. It's about choosing a philosophy for AI development, one that puts transparency, accountability, and explainability front and center.
[00:12:11] This has been a truly fascinating deep dive into the world of AI observability. Hopefully you're all walking away with some new insights and knowledge to help you make informed decisions about how you're monitoring and managing your own AI models.
[00:12:23] And remember, this is just the beginning.
[00:12:25] As AI keeps evolving, so will the tools and techniques we use to understand it.
[00:12:30] It's an ongoing journey. Stay curious, stay informed, and stay engaged in this important conversation about how we can harness the power of AI for good.
[00:12:38] I couldn't have said it better myself.
[00:12:40] This podcast was brought to you by Fiddler AI.
[00:12:42] For more on monitoring LLMOps performance, see the article in the description.