AI Forward - Effective Enterprise Compliance: Make a Leap of Trust with FRoG
In this workshop, Bashir Rastegarpanah shows how Fiddler users can use the Fiddler Report Generator (FRoG) to create customizable reports for model risk management (MRM) and periodic model health reviews. FRoG's modular design enables users to generate reports with various formats and contents, including model evaluations, risk assessments, and explanations for AI models.
Key takeaways
- Reports for Model Risk Management and Periodic Reviews: FRoG is vital for sectors such as finance and healthcare where AI risk management is crucial. It provides comprehensive, regular reports on AI models' performance and health, supporting effective Model Risk Management strategies.
- Flexible and Customizable Reporting Tool: The tool is a versatile Python library that allows for fully customizable reports. Its design enables users to tailor report components to their specific needs, with support for various output formats like PDF and DOCX, catering to diverse data science and machine learning team requirements.
- User-Friendly with Modular Architecture: FRoG's modular architecture promotes ease of use and adaptability. It not only offers ready-to-use templates for common reporting needs but also allows for quick development of new modules to meet unique requirements, making it an adaptable tool for a range of applications.
Speaker: Bashir Rastegarpanah - Data Scientist, Fiddler AI
[00:00:04] Karen He: Okay, so let's get started. Next up, we have Bashir Rastegarpanah, our resident data scientist, who will walk us through Effective Enterprise Compliance: Make a Leap of Trust with FRoG.
[00:00:18] If you have any questions or comments throughout the workshop, please put them in the chat or Q&A box and we'll address them. Over to you, Bashir.
[00:00:35] Bashir Rastegarpanah: Hi, everyone. Thank you very much, Karen. I'm super excited to talk about the Fiddler Report Generator in this part of the webinar. Let me share my screen.
[00:00:53] All right. Great.
[00:00:59] Awesome. So I am Bashir, a data scientist at Fiddler, and I'm going to talk about a reporting tool that we have created at Fiddler called the Fiddler Report Generator, or FRoG. The agenda for this presentation is the following. First, I'm going to start by talking about what compliance reports are and why they are needed.
[00:01:22] Then I'm going to introduce the Fiddler Report Generator, and I'm going to provide some details about some of the design decisions and the architecture of this library. Then I will show you how you can use the FRoG library and I will finish by presenting some live demos of the actual reports that you can get from some of the projects that are deployed on our demo deployment at Fiddler.
[00:01:53] So why is there a need for AI risk and governance reports, or in other words, why are interactive dashboards sometimes not enough? In sensitive decision-making domains, such as finance and healthcare, effective model risk management frameworks are needed to minimize the potential risks of using AI.
[00:02:14] And therefore, regular static reports on the health and performance of machine learning models in production are an essential part of MRM workflows. In addition to that, data science and ML teams may prefer static reports for their internal model evaluations. And static evaluation reports can also be used for periodic updates to system users, as well as for marketing purposes.
[00:02:52] So, what is the Fiddler Report Generator? The Fiddler Report Generator is a standalone Python library that enables Fiddler users to create fully customizable reports. By exploiting a modular design, users can easily customize report components using a library of analysis modules. FRoG communicates with the Fiddler backend to fetch the required data sketches and metrics.
[00:03:23] And finally, reports can be generated in different output formats, such as PDF and DOCX.
[00:03:33] Here you can see an overview of FRoG's architecture. So basically the user starts by calling different analysis modules. You can think of an analysis module as a component or a section of your report. So, let's say, a table that provides a summary of different model metrics, or a graph that shows the performance of your model at pre-production time.
[00:04:04] So you can think of each component of your report as an analysis module. And these analysis modules are created such that each analysis module returns one or multiple output modules. So what you get after running a bunch of analysis modules is a sequence or a stack of these output modules.
[00:04:27] And you can think of this part as an abstraction of the final report that you want to have. Once you have all of these output modules in a sequence, we have the next stage of the library, which is called the GenerateOutput component. What it does is transform these abstract output modules into actual output files.
[00:04:57] So, we have implemented different rendering functions for each output format, so we can generate the actual output from the abstraction of the report that we have in the output modules. Here I'm going to show you some examples of the analysis modules that are currently supported in FRoG.
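To make the architecture described above concrete, here is a minimal sketch of the same flow: analysis modules return format-independent output modules, and one rendering function per format turns the stacked output modules into a document. This is an illustration only, not FRoG's code. All names (`Heading`, `Table`, `model_metrics_summary`, `render_text`, `render_html`, `generate_output`) are hypothetical, and plain text and HTML stand in for the PDF and DOCX renderers mentioned in the talk.

```python
import html
from dataclasses import dataclass
from functools import partial
from typing import Callable, Dict, List, Sequence, Union


# Output modules: format-independent building blocks of a report (hypothetical).
@dataclass
class Heading:
    text: str


@dataclass
class Table:
    header: List[str]
    rows: List[List[str]]


OutputModule = Union[Heading, Table]


def model_metrics_summary(metrics: Dict[str, float]) -> List[OutputModule]:
    """A toy analysis module: one report section, returned as output modules."""
    rows = [[name, f"{value:.3f}"] for name, value in sorted(metrics.items())]
    return [Heading("Model metrics"), Table(["Metric", "Value"], rows)]


def render_text(modules: Sequence[OutputModule]) -> str:
    """Rendering function for a plain-text report."""
    lines: List[str] = []
    for module in modules:
        if isinstance(module, Heading):
            lines.append(module.text.upper())
        elif isinstance(module, Table):
            lines.append(" | ".join(module.header))
            lines.extend(" | ".join(row) for row in module.rows)
    return "\n".join(lines)


def render_html(modules: Sequence[OutputModule]) -> str:
    """Rendering function for an HTML report."""
    parts: List[str] = []
    for module in modules:
        if isinstance(module, Heading):
            parts.append(f"<h2>{html.escape(module.text)}</h2>")
        elif isinstance(module, Table):
            head = "".join(f"<th>{html.escape(c)}</th>" for c in module.header)
            body = "".join(
                "<tr>" + "".join(f"<td>{html.escape(c)}</td>" for c in row) + "</tr>"
                for row in module.rows
            )
            parts.append(f"<table><tr>{head}</tr>{body}</table>")
    return "".join(parts)


# One rendering function per supported output format.
RENDERERS: Dict[str, Callable[[Sequence[OutputModule]], str]] = {
    "text": render_text,
    "html": render_html,
}


def generate_output(
    analysis_modules: Sequence[Callable[[], List[OutputModule]]],
    output_format: str,
) -> str:
    """Run each analysis module, stack the output modules, then render them."""
    stack: List[OutputModule] = []
    for analysis in analysis_modules:
        stack.extend(analysis())
    return RENDERERS[output_format](stack)


metrics_section = partial(model_metrics_summary, {"auc": 0.91, "accuracy": 0.8})
print(generate_output([metrics_section], "text"))
```

In this sketch, supporting a new format means adding one more entry to `RENDERERS`; the analysis modules themselves do not change.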
[00:05:23] So, we have things like project, model, and dataset summaries. We have modules for the different types of analyses that you want to have for your alerts, so alert summary and alert incident details. We have model evaluation metrics and model performance comparison. So let's say you have multiple models in a project, you can add performance charts for them.
[00:05:56] For example, here you see an ROC curve with multiple models in a single chart, and things like confusion matrices and so on. We also provide performance charts with a data segmentation feature, and all sorts of analyses related to XAI, like feature importance charts and point explanations for model failures.
[00:06:27] Uh, some other comments about, uh, the FRoG library. So although users can create a report by passing a list of analysis modules, The manual work of specifying each report component might be tedious. Therefore, as a solution, FRoG provides report templates, which include common analysis modules and can be called in a single easy step.
[00:06:54] And in addition to that, FRoG's modular architecture allows rapid prototyping of new analysis modules based on specific customer needs and design partnerships. Now I'm going to show you examples of how you can work with this library and the actual output reports that you can get by using the report generator library.
[00:07:19] So the Fiddler Report Generator is currently online, on a public GitHub repo. The link to the GitHub repo is shared with you in the chat. Once you are here, you can download the library, and there are installation instructions. It's pretty easy. So I'm going to show you how you can use it in a notebook once the library is installed.
[00:07:54] So the first step is to import the report generation library. And then from that, I'm importing the Fiddler report generator class and the output types, which is a helper class for specifying the type of output that you want to get. The next step is to initialize the report generator. Currently, you can initialize a report generator using two methods.
[00:08:17] The first method is by directly providing the connection and credential information of your Fiddler deployment. So you need to provide the URL, the token, the organization ID, and some additional metadata like the author name for the report. There is a second method for initializing a report generator, and that's by directly passing an instance of the Fiddler client.
[00:08:50] So the Fiddler client is an API used by Fiddler users to do things like providing datasets or setting up their models and projects. So if you already have a client, you can just pass that to the Fiddler report generator. But here I'm using the direct method, so I'm sending the information of the demo deployment
[00:09:11] to generate the report. So I get this FRoG instance, and then from there I'm going to show you some simple examples of generating reports. As I said previously, we have created these templates. The simplest template is called Project Summary, and I'm importing that module here. I'm creating two reports here.
[00:09:37] So I'm creating a report for a predictive model that is on Fiddler's demo deployment. It's called the Lending project. It's a tabular model. And here I also call the report generator to generate another report for the IMDB RNN project.
[00:10:01] Since this is an LLM workshop today, I'm showing you an NLP example as well. In this example, we have a set of IMDB reviews, and there is a deep neural network trained to predict the sentiment of each review. So let's first see what we get from the Lending project.
[00:10:29] So here I'm passing the list of analysis modules that I want to see in my report. As I said, I'm just using the template that we have, the project summary, to show you the simplest use case that you can have. And although it's a template, there are still options to customize your report to some extent.
[00:10:50] For example, here I'm specifying the time interval for which I want to get the report. I'm looking at the past three months, and I'm specifying the output type, which is a PDF file. Once you run this piece of code, we get the basic report for the Lending project. So what you can see here is a summary of the Fiddler cluster, which we call a demo cluster.
[00:11:22] Then, starting from here, we have specified the project for which we want to see a report. We see a summary of the models. There are two models deployed on this project, a logistic regression and an XGBoost classifier, and then the details of the datasets. Here you see the charts for the alert summary. In Fiddler you have the option to set up alerts on different metrics, and then you can see a summary of them as well as the detailed incidents of the triggered alerts.
[00:12:02] As I said, there are ROC curves, different performance charts, and the XAI-related summaries, so we see the global feature impact for each feature in this tabular model. That's the very basic template that you get by running the project summary. In a minute, I'm going to show you how you can make it more elaborate and complex by adding more specific modules for your use cases.
[00:12:35] Now let's look at the IMDB project. For the IMDB project, in addition to the very basic template, I'm also adding the failed case analysis module. So let's look at what we get for the IMDB project. Everything is basically similar to what I had for the previous project, but the interesting part here is this module called the model failure analysis.
[00:13:08] So here we investigate the top examples for which the predicted label is incorrect while the model is confident about its prediction. So, basically, this shows you the top false positive and top false negative predictions. And everything here is customizable, so you can choose, for example, whether you want to see the top five,
[00:13:31] or whatever number of examples you want to see. Here, for example, we see the top three examples. And then what is provided here, as I said, this is a sentiment analysis of the IMDB reviews, and this is the top false positive example, meaning that the model prediction for the sentiment of this review is 0.7,
[00:13:57] so it's a positive sentiment with high confidence, but the actual label for this review is negative. So if you read this paragraph, that's the actual customer review, and it's a negative review. What we provide here is the tokens that have high impact in making this prediction, and we see that tokens with positive sentiment are at the top here.
[00:14:33] So we see words like great and fine. They're also highlighted in the text here, so you see that those positive tokens have basically confused the model about the actual sentiment of this piece of text. So this is a really powerful method for debugging your models.
[00:15:03] When you see some false positives or false negatives, you can come here, generate as many examples as you want, look at the tokens with highlighted attributions, and read the actual piece of text that you have.
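The selection step behind a failure analysis like this can be sketched in a few lines: keep the misclassified examples and rank them by how confident the model was. This is a minimal illustration under stated assumptions, not FRoG's implementation. `Prediction`, `top_confident_errors`, the 0.5 decision threshold, and the sample data are all hypothetical, and the token attribution step is left out.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Prediction(NamedTuple):
    text: str
    score: float  # predicted probability of the positive class
    label: int    # ground truth: 1 = positive, 0 = negative


def top_confident_errors(
    predictions: Sequence[Prediction],
    top_n: int = 3,
    threshold: float = 0.5,  # assumed decision threshold
) -> Tuple[List[Prediction], List[Prediction]]:
    """Return the top_n most confident false positives and false negatives."""
    false_pos = [p for p in predictions if p.score >= threshold and p.label == 0]
    false_neg = [p for p in predictions if p.score < threshold and p.label == 1]
    # A confident false positive has a high score; a confident false negative
    # has a low score.
    false_pos.sort(key=lambda p: p.score, reverse=True)
    false_neg.sort(key=lambda p: p.score)
    return false_pos[:top_n], false_neg[:top_n]


# Made-up examples, for illustration only.
reviews = [
    Prediction("review a", 0.95, 0),
    Prediction("review b", 0.70, 0),
    Prediction("review c", 0.90, 1),
    Prediction("review d", 0.05, 1),
    Prediction("review e", 0.30, 1),
    Prediction("review f", 0.10, 0),
    Prediction("review g", 0.60, 0),
]

false_positives, false_negatives = top_confident_errors(reviews, top_n=2)
for p in false_positives:
    print(f"false positive: {p.text} (score {p.score:.2f})")
for p in false_negatives:
    print(f"false negative: {p.text} (score {p.score:.2f})")
```

The `top_n` argument plays the role of the "how many examples do you want to see" option mentioned in the talk.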
[00:15:21] The next thing that I want to present here is some other modules that we can have in addition to the project summary template.
[00:15:33] So here I'm showing the performance analysis module and the segmentation feature. We can provide time series of metrics, and you can break them down by different segments of your data. In order to use that, we need to define these performance analysis specification objects.
[00:16:01] So, for example, here for the tabular model, I want to look at the accuracy of the model, so I'm specifying the metric, the time interval, which is the width of the bins that you want to see in the time series, and the feature based on which I want to segment my data. In this tabular data, there is a feature called home ownership, and I'm asking the report generator to segment the charts based on this feature.
[00:16:31] I'm creating a similar performance specification for the F1 score. And once I have these performance analysis specifications, I again call the report generator function, but now I'm passing these performance analysis specification objects to this argument that we have. I'm again running the same call here, so what we get is similar to what we had previously, but in addition to all the previous sections, I get this performance analysis section here.
[00:17:09] So you see two time series, one for F1 score and one for accuracy. And here the bold black line shows the value of that metric over this time interval, the past two months, and the length of the bins is set to seven days. So we see weekly updates here, and then we see the segmented performance charts for
[00:17:38] each unique value that we have for the home ownership feature. So the values here are mortgage, own, and rent. For each of them you can see the model accuracy and the F1 score of this model. So these are the simple examples that I wanted to show today, and that's it.
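The computation behind a segmented time series like the one just described can be sketched as follows: events are grouped into fixed-width time bins, and accuracy is computed per bin for the whole dataset and for each value of the segmenting feature. This is a rough illustration, not how FRoG or the Fiddler backend computes it. `Event`, `binned_accuracy`, the `OVERALL` key, and the sample data are hypothetical.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, List, NamedTuple, Sequence, Tuple

OVERALL = "(all)"  # hypothetical key for the unsegmented series


class Event(NamedTuple):
    day: date
    segment: str    # e.g. the value of a home ownership feature
    predicted: int
    actual: int


def binned_accuracy(
    events: Sequence[Event], start: date, bin_days: int = 7
) -> Dict[str, Dict[date, float]]:
    """Return {segment: {bin_start: accuracy}}, including an overall series."""
    # (segment, bin_start) -> [number correct, total]
    counts: Dict[Tuple[str, date], List[int]] = defaultdict(lambda: [0, 0])
    for event in events:
        if event.day < start:
            continue  # outside the reporting window
        bin_index = (event.day - start).days // bin_days
        bin_start = start + timedelta(days=bin_index * bin_days)
        for key in (OVERALL, event.segment):
            tally = counts[(key, bin_start)]
            tally[0] += int(event.predicted == event.actual)
            tally[1] += 1

    series: Dict[str, Dict[date, float]] = {}
    for (segment, bin_start), (correct, total) in sorted(counts.items()):
        series.setdefault(segment, {})[bin_start] = correct / total
    return series


# Made-up events, for illustration only.
window_start = date(2023, 1, 2)
events = [
    Event(date(2023, 1, 2), "RENT", 1, 1),
    Event(date(2023, 1, 5), "RENT", 0, 1),
    Event(date(2023, 1, 8), "OWN", 1, 1),
    Event(date(2023, 1, 9), "OWN", 0, 0),
    Event(date(2023, 1, 10), "RENT", 1, 0),
]

for segment, points in binned_accuracy(events, window_start).items():
    for bin_start, accuracy in points.items():
        print(f"{segment:6s} week of {bin_start}: accuracy {accuracy:.2f}")
```

Swapping accuracy for another metric such as F1 would only change the per-bin tally; the binning and segmentation logic stays the same.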
[00:18:04] We are happy to discuss any questions or comments that you have, and as I said, this tool is very flexible and we're happy to work with design partners to build specific report modules for their needs. Thank you very much. I'm going to stop here and see, Karen, if you have some questions.
[00:18:30] Karen He: Yes. All right, great. Thank you so much for sharing. There are a few questions that came through the chat and the Q&A section while you were presenting. One of the main questions is, can non-Fiddler customers get access to FRoG? Can they use FRoG if they don't have the Fiddler platform installed or licensed?
[00:19:01] Bashir Rastegarpanah: Unfortunately not. This is a tool that is built for Fiddler. So if you are a Fiddler user, you can use this as an additional tool to generate reports. Instead of using the graphical user interface of our monitoring platform, you can use it to get static reports. But, as I said, the code is open source.
[00:19:25] It can be used by people who want to get ideas and build a similar tool for similar applications. I think the modular design is very helpful, and the types of analysis that we are doing here are all open source, so you can use it for inspiration.
[00:19:43] Karen He: Okay. And do you have a list of reports in one place where they can peruse it?
[00:19:54] I mean, I'm assuming we can just share the recording so they can review the list of reports that you just went through, right?
[00:20:01] Bashir Rastegarpanah: Yes. And there is also a blog post that summarizes all the modules that we have. You can share the link to the blog post as well. It was published a couple of months ago.
[00:20:17] Karen He: Yes, I did share the blog post with the audience, the attendees here. The next question I have here is, can I use FRoG for custom reports such as pre-production model evaluations?
[00:20:34] Bashir Rastegarpanah: Yes, you can. For any data that is ingested into Fiddler, you can use FRoG. As I said, it makes a connection to the Fiddler backend, so any data that is ingested into Fiddler can be used to generate reports. Pre-production data is usually in the form of a baseline dataset, so it can be used as well. And also, in the future,
[00:20:59] we have plans to integrate FRoG with the Fiddler Auditor. That was the workshop right before me, presented by Amal. So you can integrate these two tools and get a report for auditing your LLMs, with different metrics, and report on robustness as well.
[00:21:20] Karen He: And, uh, what output formats does FRoG currently support?
[00:21:26] Bashir Rastegarpanah: Currently we support PDF and Microsoft DOCX. And given the modular design of FRoG, the output components are like an abstraction, so you can easily add rendering functions. We can easily support other formats by just implementing the necessary rendering functions.
[00:21:50] Karen He: Great. One last question, and that's personal for me. So how easy is it to generate a custom report using FRoG?
[00:22:00] Bashir Rastegarpanah: It's super easy. You just need to provide a list of modules, and we will soon publish detailed documentation on using each module. Basically, you need to think, very abstractly, about what you want to get in a report and put together a sequence of those modules. Each module is very customizable, every detail of each section is customizable, so you can just use the docs and get what you want.
[00:22:33] Karen He: All right. Thank you so much.