Industry’s Fastest Guardrails Now Native to NVIDIA NeMo
In this episode, we discuss the new integration of Fiddler Guardrails with NVIDIA NeMo Guardrails, pairing the industry's fastest guardrails with your secure environment. We explore the setup process, practical implications, and the role of the Fiddler Trust Service in providing guardrails, monitoring, and custom metrics. Plus, we highlight the free trial opportunity to experience Fiddler Guardrails firsthand.
Read the article to learn more, or sign up for the Fiddler Guardrails free trial to test the integration for yourself.
[00:00:00] Welcome back to Safe and Sound AI.
[00:00:04] You know, last time we really dug into the launch of Fiddler Guardrails, and honestly, I was blown away by those sub-100-millisecond response times. But today we're going even deeper.
[00:00:14] We've got some fascinating sources on how Fiddler Guardrails is now natively integrated with NVIDIA NeMo Guardrails.
[00:00:21] Right. It's like they took an already impressive tool and just plugged it right into one of the leading frameworks for building with LLMs.
[00:00:28] Exactly. And that's got big implications for anyone actually building and deploying these models.
[00:00:34] So our mission today is to really break down what this integration means, particularly for those of you working with NeMo.
[00:00:39] You know, we'll be looking at how it helps you build those safer, more reliable LLM applications, especially when it comes to those persistent challenges: hallucination, toxicity, even jailbreak attempts.
[00:00:51] So, all right, let's jump right in.
[00:00:53] NVIDIA NeMo Guardrails. For those who may not be familiar, can you give us a quick rundown of what it is and what it does?
[00:00:59] Sure. NeMo Guardrails is, at its core, a scalable platform for managing these AI guardrails. It allows you to implement and manage various guardrails effectively, all in one place. Instead of having different solutions for safety, security, and other constraints, you have a unified framework.
[00:01:16] So it's like a central command center for all your LLM safeguards.
[00:01:19] Precisely.
[00:01:20] So I think what's really cool here is, you know, Fiddler Guardrails isn't just sort of like an add-on or something. It's actually a native part of NVIDIA NeMo Guardrails now.
[00:01:28] And that built-in aspect, I think that's key, isn't it? Because what that means is that incredibly fast response time that Fiddler Guardrails has...
[00:01:37] Which we talked about, you know, the sub-100-millisecond speed, that's now operating directly within NVIDIA's secure environment.
[00:01:44] So it's not going out to some external server or something.
[00:01:47] Nope. Right within their own infrastructure.
[00:01:49] So what does that mean practically?
[00:01:50] No data leaves your deployment. No external calls.
[00:01:54] Okay, so we've talked about all the high level benefits, but let's get practical for a second.
[00:01:59] I know we've got a lot of engineering folks listening who are probably wondering, okay, this all sounds great, but how hard is it to actually set this thing up? And the sources talk about a pretty seamless implementation of Fiddler Guardrails within NeMo Guardrails, with minimal setup requirements. What does that actually look like in practice?
[00:02:19] The first thing you do is you obtain what's called a Fiddler platform key.
[00:02:22] That's your authentication.
[00:02:23] Then you set that key as an environment variable in your NeMo environment.
[00:02:28] And then finally, you update a file called config.yml, which is a standard configuration file in NeMo Guardrails, with a couple of key pieces of information.
[00:02:38] The Fiddler Guardrails endpoint and the specific thresholds that you want to set for things like moderation.
[00:02:45] Ah. Okay, so like how sensitive you want the system to be.
[00:02:48] Exactly. So how sensitive you want to be to potentially toxic language, for example.
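For reference, here is a rough sketch of what those three steps might look like. The environment variable name, config fields, and flow names below are assumptions inferred from the steps described, not a verified schema; the NeMo Guardrails GitHub repo has the authoritative version.

```yaml
# config.yml: illustrative sketch only. Field and flow names are
# assumptions; check the NeMo Guardrails repo for the exact schema.
# Steps 1-2: obtain your Fiddler platform key and export it first,
# e.g. `export FIDDLER_API_KEY=...` (hypothetical variable name).
rails:
  config:
    fiddler:
      # Step 3a: point NeMo Guardrails at your Fiddler Guardrails endpoint
      fiddler_endpoint: "https://your-instance.fiddler.ai"  # hypothetical URL
      # Step 3b: moderation thresholds (lower means more sensitive)
      safety_threshold: 0.1
      faithfulness_threshold: 0.05
  input:
    flows:
      - fiddler user safety        # screen incoming prompts
  output:
    flows:
      - fiddler bot safety         # screen generated responses
      - fiddler bot faithfulness   # check for hallucination
```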
[00:02:53] And what's great is, you know, there's not a lot of complex coding involved here.
[00:02:57] Mostly configuring things. You're pointing it in the right direction.
[00:03:00] And it just slots right in.
[00:03:01] And that significantly reduces the time and effort it takes to get started.
[00:03:05] That's huge, especially if you're trying to get a pilot up and running quickly.
[00:03:08] Exactly.
[00:03:09] Now, one thing I find reassuring is that it sounds like even if there are, you know, temporary API issues, Fiddler Guardrails will continue to moderate inputs and outputs.
[00:03:20] You know, it won't just completely disrupt the service.
[00:03:23] Right, you don't want the whole thing to fall apart if there's a hiccup.
[00:03:26] Exactly. And you can fine-tune those threshold values. You know, you can adjust that moderation sensitivity as needed. It gives you that granular control.
[00:03:34] So, for those of you listening who really want to get into the weeds of implementation, the NVIDIA NeMo Guardrails GitHub repo is your go to resource.
[00:03:42] All the details you could ever want.
[00:03:44] They've got it all there.
[00:03:45] All right, so we've talked about the integration, but now I want to shift gears a little bit and talk about the foundational importance of metrics in all of this.
[00:03:52] Because you can have all the guardrails in the world, but if you're not tracking what's actually happening, are you really in control?
[00:03:59] Without quality metrics, you're essentially flying blind.
[00:04:03] You need that visibility to understand how your LLM is behaving in the real world to see where those potential issues might be cropping up.
[00:04:11] Exactly. Are you seeing a sudden spike in hallucinations? Are there emerging patterns of misuse?
[00:04:20] You need to know this stuff.
[00:04:21] And without the data, you're just guessing.
[00:04:23] Right, you're relying on anecdotes. So this is where the Fiddler Trust Service comes into play. It's kind of the engine behind all of this.
[00:04:30] The brains of the operation.
[00:04:32] And it's highlighted as providing 50 out-of-the-box metrics, and then you can even customize your own.
[00:04:37] Right, it's very comprehensive.
[00:04:39] So you've got this wealth of information.
[00:04:41] How do those metrics actually translate into the protection that the guardrails and those broader monitoring solutions provide?
[00:04:49] Okay, so think of it this way. The Fiddler Trust Service is constantly evaluating the LLM's outputs based on all these different metrics. Is it being factually accurate? Is the language appropriate?
[00:05:00] Okay, so it's looking for all the red flags.
[00:05:02] Exactly. And then that data feeds directly into Fiddler Guardrails...
[00:05:06] Okay.
[00:05:07] So it's informing those real-time decisions about whether to allow a response, flag it for review, or block it entirely.
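To make that concrete, here is a minimal sketch of threshold-based allow/flag/block logic. The function, metric names, and cutoffs are illustrative assumptions, not Fiddler's actual implementation.

```python
# Illustrative allow / flag / block decision driven by guardrail metric
# scores; all names and cutoffs here are hypothetical, not Fiddler's API.

def moderate(scores: dict[str, float], thresholds: dict[str, float]) -> str:
    """Compare each metric score to its threshold and choose an action."""
    # Distance past its threshold for the worst-offending metric
    worst = max(scores.get(name, 0.0) - limit for name, limit in thresholds.items())
    if worst > 0.2:    # far past a threshold: block the response outright
        return "block"
    if worst > 0.0:    # marginally past: allow, but flag for human review
        return "flag"
    return "allow"     # every metric is under its threshold

print(moderate(
    scores={"toxicity": 0.12, "jailbreak": 0.03},
    thresholds={"toxicity": 0.4, "jailbreak": 0.15},
))  # -> allow
```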
[00:05:15] So it's not just a static set of rules.
[00:05:17] No, it's dynamic.
[00:05:18] It's responding to what the Trust Service is seeing.
[00:05:20] Exactly.
[00:05:21] And then, beyond that immediate guardrail function, you've also got all that data that's being collected that you can use for ongoing monitoring, right?
[00:05:30] Absolutely. You can track trends over time to see if there are any anomalies popping up.
[00:05:35] So it's not just reactive?
[00:05:36] It's proactive, too. You can potentially catch issues before they become major problems.
[00:05:41] That's great. Now, let's talk about how the Fiddler Trust Service actually generates those metrics.
[00:05:46] Sure.
[00:05:47] Because it uses a couple of really interesting approaches.
[00:05:49] On the one hand, you've got those proprietary Fiddler Trust Models, which are highly trained for specific tasks.
[00:05:56] But then you've also got this capability where enterprises can define their own custom metrics using a hosted Llama 3.1 8B model.
[00:06:04] That's pretty powerful.
[00:06:05] That's really cool. So that means if you've got some very specific domain related risks, you can tailor your monitoring to that. You're not limited to just those pre-built metrics.
[00:06:15] Let's say you're working in a legal setting. You might want to track how well the LLM is adhering to specific citation formats or if you're in healthcare, there might be very specific terminology that you need to make sure is being used correctly.
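As a thought experiment, a custom metric like that citation check might amount to little more than a scoring rubric handed to the hosted judge model. The structure below is purely hypothetical and is not Fiddler's actual custom-metric interface; it just illustrates the rubric-as-metric idea.

```python
# Purely hypothetical shape for a domain-specific custom metric; not
# Fiddler's interface, only an illustration of the rubric-as-metric idea.
citation_compliance_metric = {
    "name": "legal_citation_compliance",
    "judge_model": "llama-3.1-8b",  # the hosted model mentioned above
    "rubric": (
        "Score 1.0 if every case citation in the response follows the "
        "required citation format, 0.0 if none do, and proportionally "
        "in between. Reply with the number only."
    ),
}
```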
[00:06:30] And what's great is that this custom metric functionality is provided as a fully managed service.
[00:06:35] Right, so you don't have to worry about the infrastructure.
[00:06:37] You don't have to set up your own Llama model, you don't have to manage all of that. It's all handled for you.
[00:06:41] They take care of all the heavy lifting.
[00:06:43] And we're talking about handling a lot of data here. The sources mention hundreds of thousands of daily events.
[00:06:50] That's serious scale.
[00:06:51] It's impressive. And then the other approach they mentioned is this concept of LLM-as-a-judge, where they're using APIs from OpenAI models to score various metrics.
[00:07:02] So you've got these two powerful approaches working together: the efficiency of the specialized Fiddler Trust Models, and then that broader understanding that you get from those more general-purpose LLMs.
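For a general sense of how LLM-as-a-judge scoring works, here is a minimal sketch using the OpenAI Python client. This is a generic illustration of the pattern, not Fiddler's pipeline; the prompt wording and model choice are assumptions.

```python
# Generic LLM-as-a-judge sketch (not Fiddler's pipeline): ask an OpenAI
# model to score how faithful an answer is to its source context.
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

def judge_faithfulness(context: str, answer: str) -> float:
    prompt = (
        "Rate how faithful the answer is to the context, from 0.0 "
        "(fabricated) to 1.0 (fully supported). Reply with the number only.\n\n"
        f"Context: {context}\n\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable judge model works
        messages=[{"role": "user", "content": prompt}],
    )
    # A production judge would parse defensively; this assumes a bare number.
    return float(resp.choices[0].message.content.strip())
```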
[00:07:14] And ultimately, it's all about giving you, the user, the insights you need to make informed decisions about your LLM applications.
[00:07:22] Transparency and control.
[00:07:23] Absolutely.
[00:07:24] That's what it's all about.
[00:07:25] Okay, I want to touch on the partnership itself. Because it sounds like this integration between Fiddler and NVIDIA wasn't just some random thing that happened overnight.
[00:07:34] It was a strategic collaboration.
[00:07:36] They've been working on this since the early days of NeMo Guardrails and NVIDIA NIM.
[00:07:41] So, what's the significance of that kind of close partnership for the end user?
[00:07:48] It's about alignment. It means that Fiddler and NVIDIA are both committed to solving these really tough challenges in deploying generative AI and LLMs.
[00:07:58] They're in it together.
[00:07:59] They're in it together. And that benefits you because you're getting this more cohesive ecosystem, you're getting deeper integration.
[00:08:06] These are just going to work better together.
[00:08:07] Exactly. And you're likely to see more innovation, more features being rolled out as they continue to work closely together.
[00:08:13] Now their initial integration focused on capturing a ton of data, prompts, responses, metadata, even details about how the NeMo guardrails themselves were being executed.
[00:08:26] And all of that was feeding into the Fiddler platform. That must have provided some really valuable early insights.
[00:08:31] Oh, absolutely. It gave users a really granular understanding of what was actually happening within their LLM applications.
[00:08:38] Like under the hood.
[00:08:39] Yeah, where were those guardrails being triggered? What types of prompts were causing problems? It helped them pinpoint issues and really refine how they were using the system.
[00:08:48] And then more recently, there's been this integration with NVIDIA NIM, NVIDIA Inference Microservices, which is all about scaling those secure LLM deployments: logging those prompts, routing them to Fiddler for monitoring.
[00:09:01] Making sure that you don't sacrifice security as you scale.
[00:09:04] Because as we've said, security and scalability, they need to go hand in hand.
[00:09:07] Absolutely. You can't have one without the other.
[00:09:10] Now, for those of you listening who are really intrigued by all of this and want to kick the tires a bit, there's great news.
[00:09:17] There's a free trial of Fiddler Guardrails available.
[00:09:20] It's a great opportunity to see it in action.
[00:09:23] So the free trial gives you 14 days of access and 200 API requests per day, so you can really test things out.
[00:09:30] Yeah. Plenty to work with.
[00:09:31] You get real-time moderation against all those key risks that we've been talking about, and they've got tons of documentation to guide you through it.
[00:09:38] They even have a guided walkthrough.
[00:09:40] They make it as easy as possible to get started.
[00:09:42] That's fantastic. So getting started is simple.
[00:09:44] Grab your API key and you're off to the races. So for you, the listener, you know, if you're working with NeMo, think about what this level of speed and integrated security could mean for your LLM applications.
[00:09:54] It could be a game changer.
[00:09:56] It really could.
[00:09:57] This podcast is brought to you by Fiddler AI. For more on NVIDIA NeMo, Fiddler Guardrails, and the new free trial, see the article in the description.