AI Observability for LLM Applications and ML Models
Learn how to monitor, get alerts, and troubleshoot LLM applications and ML models with Fiddler’s AI Observability platform. Whether you’re working with generative AI and LLMs, or predictive ML models, Fiddler helps you track performance, drift, accuracy, safety, and other LLM/ML metrics, identify issues, and resolve them quickly.
In this product tour, see how to build custom dashboards, visualize model performance, detect prediction drift, and dive into root cause analysis for issues like data integrity problems or feature drift. Plus, discover how to track critical signals like hallucination detection and jailbreak attempts in generative AI. Get alerts directly in Slack, Teams, or PagerDuty, and take action faster. It's perfect for model developers, app engineers, trust and safety teams, and business stakeholders, with rich, actionable capabilities that help them deliver high-performance AI, reduce costs, and govern it responsibly.
[00:00:00] Fiddler is an AI observability platform that lets you track all the AI in your environment, be it LLMs and generative AI like this Fiddler RAG chatbot, or predictive ML models like this churn classifier right here. You can build specific visualizations like charts and dashboards and set alerts on different conditions for these models over time, so your team understands exactly how AI is impacting your environment and which signals to track to reduce your mean time to detection for model issues.
[00:00:33] You can build out specialized views, like this custom dashboard I've created to understand the performance of the model over time. These views help you troubleshoot issues very quickly. In my case, I'm seeing a huge drop in revenue, specifically for my users in Hawaii.
[00:00:48] I can relate that to the drop in accuracy of the Hawaii model over time, which, as you see, happens around September 30th. I can even relate this same signal to something like the prediction drift of the model over time. And if I want, I can open root cause analysis mode to access things like feature-level drift impact and understand which features in my model are causing these issues.
[00:01:14] In my case, it's drift in one specific bin of this feature that is throwing my model off.
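For intuition on what a feature-level drift signal like this measures, here is a minimal, hedged sketch that computes the per-bin contribution to the population stability index (PSI) with NumPy. The data, bin count, and function name are hypothetical; this illustrates the general technique, not Fiddler's implementation.

```python
import numpy as np

def psi_per_bin(baseline, current, n_bins=10):
    """Population Stability Index contribution of each bin.

    A large contribution from a single bin points to the slice of the
    feature's distribution that is drifting.
    """
    # Bin edges come from the baseline (e.g. training) distribution.
    edges = np.histogram_bin_edges(baseline, bins=n_bins)
    base_counts, _ = np.histogram(baseline, bins=edges)
    curr_counts, _ = np.histogram(current, bins=edges)

    # Convert to proportions, with a small epsilon to avoid log(0).
    eps = 1e-6
    base_pct = base_counts / max(base_counts.sum(), 1) + eps
    curr_pct = curr_counts / max(curr_counts.sum(), 1) + eps

    return (curr_pct - base_pct) * np.log(curr_pct / base_pct)

# Hypothetical example: compare recent values of one feature against the
# training baseline and find the bin that contributes most to the drift.
baseline = np.random.normal(0, 1, 10_000)
current = np.concatenate([np.random.normal(0, 1, 9_000),
                          np.random.normal(2.5, 0.3, 1_000)])  # drifted slice
contributions = psi_per_bin(baseline, current)
print("total PSI:", contributions.sum())
print("most-drifted bin:", contributions.argmax())
```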
[00:01:20] I can do similar analysis on data integrity issues like missing values, type violations, or range violations. More importantly, I can build out key visualizations to help me and my team understand the model's performance in a few clicks.
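To make those three data integrity signals concrete, here is a small, hedged sketch in pandas. The column names, expected ranges, and expected types are made up for illustration; Fiddler computes these checks against your model schema for you.

```python
import pandas as pd

# Hypothetical batch of events and a hypothetical expected schema.
events = pd.DataFrame({
    "age": [34, None, 29, 212],          # one missing value, one out of range
    "state": ["HI", "CA", 7, "WA"],      # one type violation
})
expected_range = {"age": (0, 120)}
expected_type = {"state": str}

# Missing values per column.
missing = events.isna().sum()

# Range violations: values outside the expected [lo, hi] interval.
range_violations = {
    col: int(((events[col] < lo) | (events[col] > hi)).sum())
    for col, (lo, hi) in expected_range.items()
}

# Type violations: non-null values that are not of the expected type.
type_violations = {
    col: int((~events[col].dropna().map(lambda v: isinstance(v, t))).sum())
    for col, t in expected_type.items()
}

print(missing, range_violations, type_violations, sep="\n")
```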
[00:01:35] Our goal is always to help your team identify issues and resolve them as quickly as possible. If you want to start digging into the data and you know this is important for this model, you can open Query Analytics, where Fiddler automatically builds out a query for your team and runs it for you.
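As a rough, hedged illustration of what such a generated query boils down to, here is a sketch in pandas. The event columns, the date, and the segment filter are hypothetical; in the product the query is built and executed for you.

```python
import pandas as pd

# Hypothetical published events for the churn model.
events = pd.DataFrame({
    "timestamp": pd.to_datetime(
        ["2024-09-29 23:50", "2024-09-30 08:15", "2024-09-30 14:02"]),
    "state": ["CA", "HI", "HI"],
    "churn_probability": [0.12, 0.81, 0.77],
})

# Slice to the window and segment where accuracy dropped, then export to share.
window = events[
    (events["timestamp"].dt.date == pd.Timestamp("2024-09-30").date())
    & (events["state"] == "HI")
]
window.to_csv("hawaii_sep30_sample.csv", index=False)
```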
[00:01:51] So now you have a sample of the data on that September 30th window where you know things were going wrong. You can download and share this data with your team to help them better understand how the models are performing. Once you've done this, you can also build out similar visualizations for your models in the generative AI world.
[00:02:10] Here, we have specialized tools like this prompt UMAP to track model performance in the unstructured data space, but also to look at hallucination signals like the faithfulness and relevancy of your answers, or to look for safety issues like jailbreak attempts and user feedback on these specific models.
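Fiddler's hallucination and safety metrics are built in, but for a rough sense of what a faithfulness or relevancy score captures, here is a hedged sketch that uses embedding cosine similarity as a crude proxy. The vectors and names are hypothetical, and this is not the actual metric.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Hypothetical embeddings produced by whatever embedding model you use.
answer_vec   = np.random.rand(384)
context_vec  = np.random.rand(384)   # the retrieved document chunk
question_vec = np.random.rand(384)

# Crude proxies: an answer far from its retrieved context is a hallucination
# candidate; an answer far from the question is likely irrelevant.
faithfulness_proxy = cosine(answer_vec, context_vec)
relevancy_proxy    = cosine(answer_vec, question_vec)
print(faithfulness_proxy, relevancy_proxy)
```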
[00:02:30] You can open a deeper analysis of these embedding visualizations by digging into the UMAP and really finding that needle in the haystack: across a large set of interactions, where do issues like jailbreaks happen? Can I identify a jailbreak just by looking at this chart and then extract the entire trace to see exactly what kind of jailbreak attempt happened on my model?
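Here is a minimal, hedged sketch of the underlying idea: project prompt embeddings to 2-D with UMAP (via the umap-learn package) and surface the most isolated points as candidates to inspect. The embeddings, parameters, and the simple centroid-distance outlier rule are assumptions for illustration only, not how Fiddler identifies jailbreaks.

```python
import numpy as np
import umap  # umap-learn

# Hypothetical prompt embeddings (e.g. 768-d vectors from your embedding model).
prompt_embeddings = np.random.rand(500, 768)

# Project to 2-D so dense clusters of normal prompts and isolated outliers
# (candidate jailbreaks or off-topic prompts) become visible.
reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, random_state=42)
projection = reducer.fit_transform(prompt_embeddings)

# Crude outlier flag for illustration: points far from the centroid of the 2-D cloud.
centroid = projection.mean(axis=0)
distances = np.linalg.norm(projection - centroid, axis=1)
outlier_idx = np.argsort(distances)[-10:]  # the 10 most isolated prompts
print("Inspect these traces first:", outlier_idx)
```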
[00:02:53] Looks like somebody was trying to steal social security numbers from our chatbot. I'm glad Fiddler could track that and prevent that for my team.
[00:03:00] Once you have put all of this together, your team can create alerts to flag whenever these issues happen, like when your model is out of compliance.
[00:03:09] You can click on the alerts page, which shows you all the alerts being fired for these models. You can receive these alerts in Slack, in Teams, or via PagerDuty. Clicking into an alert shows you exactly when things went wrong, for which segment, and on which KPI. And once you click into a data point where things are going wrong, you can again perform the same root cause analysis we saw on the charts and build out the visualizations that help your team best understand why this model is having issues and exactly where.
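If you are curious what such a notification looks like mechanically, here is a small, hedged sketch that posts to a Slack incoming webhook when a metric breaches a threshold. The webhook URL, metric name, and threshold are placeholders; in Fiddler, the alert rule and the Slack, Teams, or PagerDuty integrations are configured in the product and delivery is handled for you.

```python
import requests

# Placeholder webhook URL for a Slack incoming webhook.
SLACK_WEBHOOK_URL = "https://hooks.slack.com/services/T000/B000/XXXX"

def notify_if_breached(metric_name, value, threshold, segment):
    """Post a message to a Slack incoming webhook when a KPI breaches its threshold."""
    if value <= threshold:
        return
    text = (f":rotating_light: {metric_name} = {value:.3f} breached "
            f"threshold {threshold} for segment '{segment}'")
    requests.post(SLACK_WEBHOOK_URL, json={"text": text}, timeout=10)

# Hypothetical usage: a drift KPI for the Hawaii segment crosses its threshold.
notify_if_breached("prediction_drift_jsd", 0.42, threshold=0.2, segment="state = HI")
```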
[00:03:43] And that helps your team identify and address issues with your ML and LLM models very quickly with Fiddler.