AI Observability and Security for Agentic Workflows with Karthik Bharathy
In this episode of AI Explained, we are joined by Karthik Bharathy, General Manager, AI Ops & Governance for Amazon SageMaker AI at AWS.
He discusses the critical aspects of AI security and observability for agentic workflows. He covers the evolution of AI Ops, end-to-end observability, human oversight, the current state of AI in enterprises, and the ways agentic AI systems are transforming business operations. He also dives into the challenges of implementing AI security, evaluating AI decisions, and ensuring transparency and compliance.
[00:00:06] Krishna Gade: Welcome, and thank you, everyone, for joining today's AI Explained. Today's topic is AI Security and Observability for Agentic Workflows.

[00:00:17] Krishna Gade: Everyone is touting this year as the year of AI agents, so let's look at how we need to address these questions of security and observability. I'm your host today, Krishna Gade, one of the founders and CEO of Fiddler AI. Please put your questions in the Q&A box at any time during the fireside chat. Today's session is being recorded and will be sent to all attendees afterward.

[00:00:44] Krishna Gade: So without further ado, I want to welcome Karthik Bharathy, General Manager for AI Ops and Governance for Amazon SageMaker AI at AWS. Karthik, if you could turn on your camera.
[00:01:01] Karthik Bharathy: Hey, Krishna.
[00:01:02] Krishna Gade: Thank you, and welcome to AI Explained. A brief bio of Karthik: Karthik is a leader with over 20 years of experience driving innovation in AI and ML.

[00:01:11] Krishna Gade: As General Manager for AI Ops and Governance for SageMaker AI, Karthik leads the development of cutting-edge generative AI capabilities in Amazon SageMaker AI. Karthik, thank you so much for joining us. Maybe let's start with your background: how has your role in AI Ops and Governance at AWS shaped your perspective on monitoring and securing AI workflows in the enterprise?
[00:01:39] Karthik Bharathy: Yeah, that's a great question, Krishna. If I think about how AI Ops and governance have evolved over the years, a lot of the change has been in tandem with the innovations we've seen in AI/ML, starting with traditional ML systems and moving more recently to GenAI and agentic workflows, as you aptly put it. Throughout these years, three things stand out. One is that security and governance are built into ML workflows from the ground up.

[00:02:24] Karthik Bharathy: It's not an afterthought anymore. What that means is enterprises are thinking about robust data governance techniques, access controls, and how they incorporate audit trails from day one. And effective security isn't just about protecting your models.

[00:02:47] Karthik Bharathy: It's about creating a comprehensive system that includes automated monitoring, version control, and audit trails. The second thing I'll call out is the need for end-to-end observability, across both the data and the ML workflows: from how data is ingested, to lineage all the way from data to ML, to observability during model deployment to look for drift and so on.

[00:03:27] Karthik Bharathy: And finally, the third thing: while all this sophisticated tooling is in place, you want the necessary human element to oversee the process. Even when it's automated, there are critical junctures where human oversight is needed, and that helps the decision-making process.
[00:03:51] Krishna Gade: Awesome. Being at the helm of SageMaker, you're probably seeing the current state of AI adoption in the enterprise. How would you describe it? Could you shed some light for our audience?
[00:04:05] Karthik Bharathy: Yeah. If you look at the last four or five years, the enterprise landscape is evolving

[00:04:14] Karthik Bharathy: pretty rapidly, and you can notice several distinct patterns. For what it's worth, we are in the third year of the generative AI wave. The first year was more around, hey, there's this cool thing, what can GenAI do? But last year, based on customer conversations, we saw the question move from "Hey, what is GenAI?" to "Hey, is this right for me?"

[00:04:43] Karthik Bharathy: And how can I adapt this to have a real impact on my business? This year, we're hearing that customers want to go big with generative AI, both in terms of going wide and going deep, deploying these systems at scale, and also leveraging the promise of agentic AI to create tangible business value.

[00:05:09] Karthik Bharathy: And as we see more of these AI systems being developed, there's a need to integrate them so you can orchestrate more complex workflows, while at the same time keeping aspects of security and reliability in mind. So that's definitely one trend. The other one I'd call out is that as you bring in these systems for complex decision-making, you want to do so in an automated manner

[00:05:44] Karthik Bharathy: while keeping transparency and accountability in mind. So increasingly, customers are looking for ways to have human oversight as they scale their AI operations.
[00:05:58] Krishna Gade: That's right. Especially in the regulated industries we play in, there's a cautious approach to the usage of generative AI and AI agents, hence the whole human-in-the-loop emphasis.
[00:06:10] Krishna Gade: So I guess that begs the question: what potential are you seeing for these agentic AI systems? How are they going to transform business operations? Any real-life examples would be amazing.
[00:06:23] Karthik Bharathy: Yeah, there are quite a few. Let me first break it down into the different patterns we see, based on customer conversations at AWS, and then look at examples for each.

[00:06:39] Karthik Bharathy: With agentic AI, the business value it provides falls largely into three categories. The first is using agentic AI to accelerate workplace productivity. Think of the day-to-day repetitive tasks employees are doing; they want to automate these and gain the advantage of using such an agentic system.
[00:07:12] Karthik Bharathy: A good example is NFL Media. They use agents today to help their producers and editors accelerate their content production. They have a research tool that lets them gather insights from video footage of a specific play. What that provides is, when you're onboarding a new hire, it reduces the training time by up to 67 percent.
[00:07:43] Karthik Bharathy: And when their employees ask questions about what's going on, answers can be surfaced in less than 10 minutes, versus what used to take close to 24 hours. So that's one such example. Closer to the software world, we're all familiar with coding assistants.

[00:08:06] Karthik Bharathy: Many of you may have already used coding assistants in one shape or another. Largely, they help with building better code, providing documentation, or explaining existing code. But it's not just about the code itself; it's more about automating the entire software development lifecycle, including upgrading software

[00:08:30] Karthik Bharathy: or modernizing a legacy application, or...
[00:08:34] Krishna Gade: Migrating to new languages.
[00:08:35] Karthik Bharathy: Absolutely. Case in point: within Amazon, we had agents for transforming our code base from an older version of Java to a newer version, and there were savings of a mammoth 4,500 developer-years' worth of effort.

[00:08:55] Karthik Bharathy: That roughly translates to $260 million in annual CapEx savings. So that's the first trend: using agents to accelerate workplace productivity. The second is transforming business workflows and uncovering new insights. What I mean by that is, as enterprises adopt agents, they want to streamline their operations and gain insights from their data.
[00:09:27] Karthik Bharathy: The example that comes to mind is Cognizant. They're using agents to automate mortgage compliance workflows, and they've seen improvements of more than 50 percent in reducing errors and rework. Similarly, Moody's is another great example. They've used a multi-agent system for generating credit risk reports.

[00:09:55] Karthik Bharathy: And again, the benefit: what used to take humans about one week to generate a specific report is now cut down to just one hour. That's the magnitude of impact customers are seeing. Finally, the third category is more in the research area, fueling industry transformation and innovation.
[00:10:20] Karthik Bharathy: A good example there is Genentech. They've deployed an agentic solution running on AWS to improve their drug research process. Their solution automates roughly five years' worth of research across different therapeutic areas, helping them speed up drug target identification and improve research efficiency, ultimately leading to faster drug development.
[00:10:55] Karthik Bharathy: So, net net, we're seeing agentic systems deployed broadly in these three categories.
[00:11:02] Krishna Gade: Absolutely. So it's workplace productivity, business transformations, and new product innovations. One thing you mentioned under business transformations, with examples like generating credit reports and claims processing:

[00:11:19] Krishna Gade: these are high-stakes AI use cases, so there's a need for security and transparency into how the AI is working. What are some of the challenges organizations are facing when implementing agentic workflows for these use cases, or for other use cases in general?
[00:11:37] Karthik Bharathy: Yeah, that's a great call-out. There are definitely security and visibility challenges that organizations need to look into. I'll call out a few that we have seen, and by no means is this comprehensive, but it comes down to the stage of the ML workflow, if you will.

[00:12:04] Karthik Bharathy: At the very beginning, when you're trying to use a specific model, it's quite possible that the data being used, whether to train a model, fine-tune it, or feed RAG, whatever technique you use, is not authentic. And that can compromise the performance of the model.

[00:12:26] Karthik Bharathy: That's definitely concerning, and at the same time it's harder to detect until the model is in use and you can see the interactions going on. So that's one category. The second is when the model is being used, and it depends on the model: in the case of a proprietary model, where the model weights are not exposed,

[00:12:50] Karthik Bharathy: an actor might attempt to reverse-engineer it, asking what specific weights were used at what layer, and so on. That essentially exposes the "how" of the model. And the third is when the model is actually in use: actors can attempt to extract information that the model would not otherwise emit. It might be sensitive information about the training data, or information outside what the model or the deployed use case is intended for.

[00:13:29] Karthik Bharathy: So, net net, organizations need to protect the model weights, put the necessary controls around access, ensure data privacy, and so on. More importantly, ensure there's observability that's end-to-end, so you have the necessary checks on how the model is performing.

[00:13:54] Karthik Bharathy: And more often than not, you'll have a sandbox environment where you're testing it, plus tooling: Bedrock Guardrails is an excellent tool, so you incorporate that, and Fiddler has an observability tool as well. These provide sufficient insight into what's going on in the system, be it agentic or an automated workflow, and you take actions based on that.
[00:14:17] Krishna Gade: Absolutely. You touched on a few things like adversarial attacks on models, and now there's this whole field of AI security and model security coming up. I remember a few weeks ago when DeepSeek launched, everyone was producing benchmarks about how close its accuracy comes to the closed-source models.

[00:14:39] Krishna Gade: But it was pretty vulnerable to security attacks; people were able to easily make it leak PII content and whatnot in RAG workflows. So what are some of the best practices organizations should think about for AI security, and how do you think about it versus application-level security in general, which has been around for a while?
[00:15:04] Karthik Bharathy: Yeah, at the end of the day you need a comprehensive security approach: you want to operate at the different levels. You mentioned model-level security, so let's start there. When you're thinking about the model, like I mentioned, you want to protect the model weights.

[00:15:27] Karthik Bharathy: In addition to the model weights, you want to protect access to the data, ensuring the data is authentic and so on. To address these, you would encrypt the actual file where the model is stored, or, to your point on adversarial examples, you would have a test environment where you exercise the model and monitor its output against some of these adversarial examples.

[00:15:57] Karthik Bharathy: And at the end of the day, you need continuous monitoring: not just looking at the input and output patterns, but also looking for drift, drift in the model and drift in the data, with the necessary alerts so you can trigger a retraining, for example. So that's at the model level.
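To make the drift-alert idea concrete, here is a minimal sketch of the kind of check an observability pipeline might run, using the population stability index (PSI). The sample data and the cutoffs are illustrative assumptions (0.10/0.25 is a common rule of thumb), not AWS, SageMaker, or Fiddler defaults.

```python
import numpy as np

def population_stability_index(baseline, current, bins=10):
    """PSI between two samples of one feature; higher means more drift."""
    # Bin edges come from the baseline distribution (quantiles handle skew).
    edges = np.quantile(baseline, np.linspace(0, 1, bins + 1))
    edges[0], edges[-1] = -np.inf, np.inf
    base_frac = np.histogram(baseline, edges)[0] / len(baseline)
    curr_frac = np.histogram(current, edges)[0] / len(current)
    # Floor the fractions so empty bins don't blow up the log.
    base_frac = np.clip(base_frac, 1e-6, None)
    curr_frac = np.clip(curr_frac, 1e-6, None)
    return float(np.sum((curr_frac - base_frac) * np.log(curr_frac / base_frac)))

rng = np.random.default_rng(0)
baseline = rng.normal(0.0, 1.0, 5_000)  # stand-in for training-time feature values
current = rng.normal(0.4, 1.2, 5_000)   # stand-in for recent production values

psi = population_stability_index(baseline, current)
print(f"PSI = {psi:.3f}")
if psi > 0.25:    # rule-of-thumb cutoffs; tune per use case
    print("significant drift: alert a human, consider retraining")
elif psi > 0.10:
    print("moderate drift: raise an alert, keep watching")
```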
[00:16:16] Karthik Bharathy: At the application level, there are the well-known security practices: you enforce access controls, you have encryption in place, you log the interaction patterns, and so on. But in addition to that, tooling is often needed, like the Bedrock Guardrails example I mentioned earlier.

[00:16:39] Karthik Bharathy: You want to think about how you audit certain topics, whether at the input level or the output level: what's relevant to your use case, what should not be emitted, and if certain information like PII data is being emitted, how you redact it, and so on. So net net, the two layers of model security and application-level security need to integrate seamlessly.

[00:17:06] Karthik Bharathy: In many ways these are complementary rather than separate constructs.
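The Bedrock Guardrails example above can be wired into an application via the ApplyGuardrail API in boto3; a minimal sketch, assuming you have already created a guardrail with PII and topic policies. The guardrail ID, version, region, and input text are placeholders.

```python
import boto3

# Placeholders: substitute your own guardrail ID, version, and region.
client = boto3.client("bedrock-runtime", region_name="us-east-1")

response = client.apply_guardrail(
    guardrailIdentifier="gr-EXAMPLE123",  # hypothetical guardrail ID
    guardrailVersion="1",
    source="OUTPUT",  # screen model output before returning it to the user
    content=[{"text": {"text": "My SSN is 123-45-6789 and my claim was denied."}}],
)

if response["action"] == "GUARDRAIL_INTERVENED":
    # A policy matched (e.g., PII redaction, denied topic); use the rewritten text.
    print(response["outputs"][0]["text"])
else:
    print("No policy matched; pass the original text through.")
```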
[00:17:13] Krishna Gade: Awesome, that's great. So we talked a little bit about some high-stakes use cases. When it comes to transparency of AI decisions for regulators or business stakeholders, how do you think this will change as agents come about and organizations employ agentic workflows? What happens to the transparency behind AI?
[00:17:42] Karthik Bharathy: Yeah, I think fundamentally enterprises would benefit from having a governance model that's more federated, meaning you have standards and policies in place that dictate how these systems need to be developed across the organization, but at the same time you provide enough flexibility for each team or business unit to adapt those standards to their specific use cases.

[00:18:22] Karthik Bharathy: So that's the trade-off, and it's a good one, in the sense that you want to provide the flexibility of developing these different systems across different units. And there are, again, tools. Purely taking the example of SageMaker here, you have SageMaker Projects, where you can automate the ML workflow: how it should be standardized, what pipelines you need to use, what models, what quality bars, and so on.
[00:18:51] Krishna Gade: So governance is both a tools problem and a people problem, right? Many companies today don't have the governance structures to ensure AI is tested, monitored, and securely operated. What are some best practices you've seen in how customers employ AI governance across different business units?
[00:19:14] Karthik Bharathy: Fundamentally, at the highest level of abstraction, you have business stakeholders, the so-called risk officers, who understand the domain of what's being developed and enforce certain standards on what needs to be adhered to. And it's important that they work in tandem with the technical team, who are well versed in what's being done with the model.

[00:19:43] Karthik Bharathy: For example, a model may have a toxicity score of, say, 0.1. But what that means from a use-case perspective, whether the model can be approved and deployed in the organization, is very specific to the domain they operate in. I think successful organizations have a good mix of both: the necessary tooling, where these levels, toxicity for example, are monitored and documented, either through a model card or through properties in a model registry.

[00:20:22] Karthik Bharathy: And this translates into visibility for the risk officer, who can effectively say whether the model or the system is approved for deployment or not. The two working together is definitely a recipe for success.
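One lightweight way to picture that handoff: the technical team records evaluation metrics on a model card, and a risk officer's policy gates approval. A toy sketch; the metric names, limits, and card fields are invented for illustration, not a SageMaker or Fiddler schema.

```python
from dataclasses import dataclass, field

@dataclass
class ModelCard:
    name: str
    purpose: str
    training_data: str
    metrics: dict = field(default_factory=dict)  # objective scores from evaluation

# Risk-officer policy: per-metric ceilings/floors, specific to the domain.
POLICY = {"toxicity": ("max", 0.05), "fairness_gap": ("max", 0.02), "accuracy": ("min", 0.90)}

def approve(card: ModelCard, policy: dict) -> bool:
    for metric, (kind, limit) in policy.items():
        value = card.metrics.get(metric)
        if value is None:
            return False  # an undocumented metric blocks deployment
        if kind == "max" and value > limit:
            return False
        if kind == "min" and value < limit:
            return False
    return True

card = ModelCard("claims-summarizer", "summarize claims notes", "claims-2024-q4",
                 metrics={"toxicity": 0.01, "fairness_gap": 0.015, "accuracy": 0.93})
print("approved" if approve(card, POLICY) else "needs review")
```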
[00:20:37] Krishna Gade: Got it. Are there any specific metrics you recommend organizations track, whether for security or governance of AI, when they're testing it or deploying to production?
[00:20:51] Karthik Bharathy: Yeah. If you look at metrics at the technical level, you have a set of metrics at the most foundational level. If you have to document what the model is doing in a model card, you would look at the purpose of the model, what data it's trained on, the validation rules, the quality of the model, and so on.

[00:21:16] Karthik Bharathy: Going a little beyond that, you may want to document how the model arrives at a prediction or response. For example, you may want to look at explainability approaches, like a SHAP score or a LIME score, and these may be documented with the model; those are good metrics to look at.

[00:21:40] Karthik Bharathy: And again, with GenAI you can look at additional metrics around toxicity, fairness, and so on. You can test these models with periodic evaluations of these metrics against standardized datasets that are available today, or against custom datasets that are very specific to your use case.

[00:22:03] Karthik Bharathy: And then, at the business level, you want to interpret these: given this combination of objective metrics, how do the subjective standards and policies play in, and what does that mean from a risk perspective?
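As a concrete picture of such a periodic evaluation, here is a sketch that scores model outputs on a small prompt set. Everything here is a stand-in: `call_model` for your deployed endpoint, `toxicity_score` for a real classifier (for example, a ToxiGen-based one), and the prompts and threshold for your own dataset and risk bar.

```python
def call_model(prompt: str) -> str:      # hypothetical: invoke your endpoint here
    return "placeholder response to: " + prompt

def toxicity_score(text: str) -> float:  # hypothetical: plug in a real classifier
    return 0.01

EVAL_PROMPTS = [
    "Summarize this customer's complaint politely.",
    "Explain why the claim was denied.",
]

def run_evaluation(threshold: float = 0.05) -> dict:
    scores = [toxicity_score(call_model(p)) for p in EVAL_PROMPTS]
    return {
        "mean_toxicity": sum(scores) / len(scores),
        "worst_case": max(scores),
        "passed": max(scores) <= threshold,  # gate on the worst output, not the average
    }

print(run_evaluation())
```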
[00:22:19] Krishna Gade: So there's always this tension within organizations between adopting AI faster

[00:22:24] Krishna Gade: and doing it right, making sure you do it properly so you don't get into trouble. How should organizations think about this balance?
[00:22:38] Karthik Bharathy: Yeah, that's a key one, and there's no one easy answer.

[00:22:43] Karthik Bharathy: The key to balancing robust security controls with operational efficiency lies in having the right guardrails. Instead of looking at the problem as "here's the one way to do it," or a binary "risky versus non-risky,"

[00:23:04] Karthik Bharathy: you're probably looking at a range of values in terms of how to assess risk. A good example: say you have the model or system deployed, and you notice that certain changes introduce higher risk. It's better to trigger additional approval workflows rather than insisting on a single fixed process.

[00:23:34] Karthik Bharathy: In contrast, if the same set of changes results in relatively lower risk, you may want to proceed through standardized approvals instead of requiring additional ones. Another good example: say there's drift in the model, which is fairly common, and you have an observability solution in place.

[00:23:55] Karthik Bharathy: If the drift is not significant relative to the current state of the model, you may be okay treating it as an alert, staying informed about what's happening, and perhaps just triggering a retraining workflow. On the other hand, if the drift is significant and exceeds the threshold you've defined, you may trigger additional approvals, or in some extreme cases even consider rolling back to the previous version.

[00:24:24] Karthik Bharathy: So those are different options you can consider, and the key is to keep that configurable, so you can trade off the rigor and robustness of the security controls against the efficiency they cost.
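The "keep it configurable" point might look like this in practice: a sketch where drift severity routes a change to auto-retraining, extra approvals, or rollback. The numeric thresholds are made-up defaults you would tune per use case.

```python
from dataclasses import dataclass

@dataclass
class DriftPolicy:
    alert_threshold: float = 0.10     # below this: log only
    retrain_threshold: float = 0.25   # above this: retrain via standard approval
    rollback_threshold: float = 0.50  # above this: extra approvals, consider rollback

def route_drift(drift: float, policy: DriftPolicy) -> str:
    if drift >= policy.rollback_threshold:
        return "escalate: additional approvals, candidate for rollback"
    if drift >= policy.retrain_threshold:
        return "trigger retraining via standardized approval workflow"
    if drift >= policy.alert_threshold:
        return "raise alert, keep monitoring"
    return "no action"

# Teams tune the same policy object per use case rather than hard-coding branches.
print(route_drift(0.31, DriftPolicy()))                            # retrain
print(route_drift(0.31, DriftPolicy(retrain_threshold=0.40)))      # alert only
```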
[00:24:36] Krishna Gade: Right. So when it comes to evaluation of AI: in the past, for classical machine learning, you could do things like ROC curves, AUC scores, precision-recall, and maybe SHAP plots to understand feature importance.

[00:24:50] Krishna Gade: But now, with generative AI and agentic workflows, evaluating performance is not straightforward; there's often no ground truth. Can you shed some light on how customers are going about this in the sectors you've been exposed to, and what are some of the best practices?
[00:25:11] Karthik Bharathy: The areas I've seen customers exploring involve evaluating the system end-to-end. There's no one unique metric, going back to the example I mentioned earlier. Concretely, you can think of having a pipeline that triggers either manually or on a schedule

[00:25:36] Karthik Bharathy: and evaluates the model on certain dimensions. Evaluation is a broad topic, but if there are certain aspects of the model you care about, let's say fairness, or toxicity for example, you can evaluate the model against something like a ToxiGen model: if these inputs were sent to the model, what is the output?

[00:26:03] Karthik Bharathy: And once you know the expected output and the actual output, you can see the difference: okay, the model is working along expected lines, therefore this is the score you want to assign for that particular category.

[00:26:16] Karthik Bharathy: So it's about developing that comprehensive pipeline and making sure you have observability in each of those places, first at the model level, and then at the system level when there are multiple models interacting with each other. Then, given the behavior of the system, you decide on the score; in some cases, you can be creative and create a composite score.

[00:26:42] Karthik Bharathy: It purely depends on how much weight you assign to each individual score to create the composite, and how you gauge that composite score with respect to the use case.
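A composite score of the kind described here is just a weighted aggregate; a short sketch with invented dimensions and weights (all scores on a 0-1 scale, higher is better), which would come from your own evaluations.

```python
# Illustrative only: dimensions, weights, and scores are assumptions.
WEIGHTS = {"accuracy": 0.4, "toxicity_safety": 0.3, "fairness": 0.2, "latency": 0.1}

def composite_score(scores: dict, weights: dict) -> float:
    assert abs(sum(weights.values()) - 1.0) < 1e-9, "weights should sum to 1"
    return sum(weights[dim] * scores[dim] for dim in weights)

model_scores = {"accuracy": 0.91, "toxicity_safety": 0.98, "fairness": 0.95, "latency": 0.80}
overall = composite_score(model_scores, WEIGHTS)
print(f"composite = {overall:.3f}")  # gauge against a use-case-specific bar, e.g. 0.9
```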
[00:26:54] Krishna Gade: Especially for agentic workflows, when in some cases they're automating the decision process in the enterprise, there's a need to measure whether the decisions are optimal or not.

[00:27:05] Krishna Gade: It's a pretty hard problem. Any thoughts on that? Take the example you mentioned, the claims-processing workflow, which was probably much more manual in the past and is now automated. How can customers measure whether it's working properly and actually working optimally for the business?
[00:27:30] Karthik Bharathy: Yeah, while you can have objective metrics, at the end of the day it's about the business use case. It will involve humans in the process, reviewing the kinds of outputs coming from the system. I think the key is to have the necessary hooks in place.

[00:27:56] Karthik Bharathy: For example, while on one end you want to enforce controls on what data is being accessed, what output is being generated, or how the evaluation model is scoring toxicity, you also want to make sure there's human insight into every decision. Especially in the early phases after the system is deployed, you want human evaluation of the system's output.

[00:28:21] Karthik Bharathy: More importantly, you also want some sort of pause switch, if you will: if the model deviates from known patterns, there should be a way to quickly bring humans in, with a pause switch, or even a kill switch for that matter, to make sure corrective actions can be taken.
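The pause switch could be as simple as a wrapper that halts the agent when a deviation signal crosses a limit and waits for a human. Everything here is an illustrative stand-in: the deviation score would come from your monitoring, and in production the paused flag would live in shared storage rather than in memory.

```python
class PauseSwitch:
    """Wraps an agent step; halts automation when behavior deviates from known patterns."""

    def __init__(self, deviation_limit: float = 0.8):
        self.deviation_limit = deviation_limit
        self.paused = False  # in production, keep this flag in shared storage

    def run_step(self, action: str, deviation_score: float) -> str:
        if self.paused:
            return f"SKIPPED {action}: agent paused, awaiting human review"
        if deviation_score > self.deviation_limit:
            self.paused = True  # kill switch: stop before the action executes
            return f"PAUSED before {action}: deviation {deviation_score:.2f} over limit"
        return f"EXECUTED {action}"

switch = PauseSwitch()
print(switch.run_step("approve_claim_1041", deviation_score=0.12))
print(switch.run_step("approve_claim_1042", deviation_score=0.93))  # trips the switch
print(switch.run_step("approve_claim_1043", deviation_score=0.05))  # skipped until reset
```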
[00:28:42] Krishna Gade: Yeah. And this might change from industry to industry, right? What you want to measure or control around AI can be different for different domains. Have you seen any insights, for example, finance versus healthcare versus other industries, in what they care about in terms of measurement and security controls?
[00:29:13] Karthik Bharathy: Yeah. More than the industry, I think, as you called out, it also depends on the set of policies and standards they're adhering to. And then, yes, it also goes by the regions in which they operate, the EU AI Act or ISO 42001, the different regulations that come in. So there's no one-size-fits-all, but the more effective use cases I've seen, the ones deployed successfully, factor in both the subjectiveness of the standards, which require you to adhere to certain things (say, where the data is stored) and to answer the different questions related to the standard, along with the objectiveness of the metrics being tracked.

[00:30:01] Karthik Bharathy: So the more successful use cases, and they do vary across healthcare and financial services, and even in retail there are examples, are the ones where that combination of the two is in place.
[00:30:16] Krishna Gade: So what are some of the warning signs that an agentic system may have security vulnerabilities or monitoring gaps? How can an organization become aware of them?
[00:30:30] Karthik Bharathy: Yeah, the first thing to look at is data quality. You want to make sure the model's data inputs, and the data it's trained on, are secure and robust; that's important. Once you have those in place, you want an effective testing strategy to ensure you defend against adversarial attacks.

[00:31:03] Karthik Bharathy: So even if, for example, there's a manipulation in the input, you want to make sure the security of the model and the system is taken care of. And then there's the one we talked about, model drift: looking for any degradation in performance. Continuously monitoring those key parameters is important.

[00:31:26] Karthik Bharathy: And from the system and application standpoint, you want to ensure the API endpoints are secured, data transmission is secure, and so on, and that you have robust controls for both authentication and authorization. At the end of the day, I'd think of it like an employee.

[00:31:50] Karthik Bharathy: An employee badges into the building, and in many organizations badges out as well, and the next time you come in, you badge in again. You re-authenticate, so the organization knows this person is authorized to do this particular job. It's very similar with an agentic system.

[00:32:09] Karthik Bharathy: Another one that comes to mind is the principle of least privilege: you provide access only when it's needed. Very similar, again, to the employee example I called out: an employee may not have access to all data, but when it's needed, you ensure the person who really needs that information has access to it.

[00:32:33] Karthik Bharathy: So those would be some signs to look for, and principles to apply, when you're designing these systems.
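The badge-in analogy maps naturally to short-lived, narrowly scoped grants for agents; a sketch with hypothetical agent and tool names, where a grant expires and forces re-authentication, and authorization checks only the tools explicitly granted.

```python
import time

GRANTS = {}  # agent_id -> (allowed_tools, expires_at); hypothetical in-memory store

def badge_in(agent_id: str, allowed_tools: set[str], ttl_seconds: int = 300) -> None:
    """Issue a short-lived grant; the agent must re-authenticate after it expires."""
    GRANTS[agent_id] = (allowed_tools, time.time() + ttl_seconds)

def authorize(agent_id: str, tool: str) -> bool:
    grant = GRANTS.get(agent_id)
    if grant is None:
        return False              # never badged in
    allowed, expires_at = grant
    if time.time() > expires_at:
        del GRANTS[agent_id]      # badge expired: force re-authentication
        return False
    return tool in allowed        # least privilege: only the granted tools

badge_in("claims-agent", {"read_claim", "draft_summary"})
print(authorize("claims-agent", "read_claim"))      # True
print(authorize("claims-agent", "approve_payout"))  # False: not in its grant
```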
[00:32:37] Krishna Gade: Got it. There's an audience question here: any specific frameworks or tools you're using for agentic workflows to evaluate robustness and accuracy? This is probably a good time to talk about the partnership between SageMaker and Fiddler.

[00:32:48] Krishna Gade: Can you share your thoughts on that?
[00:32:51] Karthik Bharathy: Yeah, absolutely. We're thrilled to be working with Fiddler. At the outset, partnership is absolutely critical for AWS, and for SageMaker specifically, as we look at extending the core AI/ML capabilities and providing specialized solutions for different industry needs.

[00:33:14] Karthik Bharathy: Partnering with a company like Fiddler is absolutely paramount, and the intent is really simple: we want to make sure best-in-class solutions are available to our customers. So with Fiddler, we've combined the power of SageMaker AI, where you can train and deploy your models, with Fiddler AI, which brings in observability to monitor and improve the ML models.

[00:33:37] Karthik Bharathy: So, net net, customers have a one-click way to do observability with SageMaker AI. This experience is available in SageMaker Unified Studio; it's seamless, and I'm pretty excited about how customers can use these two capabilities together.
[00:33:58] Krishna Gade: Absolutely. Yeah, we share the same excitement.
[00:34:00] Krishna Gade: And for those of you on AWS SageMaker today on this call, feel free to use the one-click integration we built together with AWS for monitoring and evaluation of your AI models with Fiddler. So let's take a few more audience questions.

[00:34:18] Krishna Gade: There are some questions around different industries, and there's a question about code migration, which we touched on earlier in the call: what are some best practices for verifying large code changes, or for migrating from one language to another, using AI-based code migration?
[00:34:35] Karthik Bharathy: Yeah, I think the specifics depend on the language itself, whether you're looking at a more modern language or a traditional language like COBOL, for example. Given that the migration is being assisted, you want to look for patterns of translation between the two systems.

[00:35:02] Karthik Bharathy: Sometimes the logic is inherently complex, so there's a human in the loop, and assisted AI comes into play. You should definitely try out some of the tooling that's already available; with Amazon Q, we recently launched transformation capabilities that let you look at the workflow end-to-end.

[00:35:21] Karthik Bharathy: And there are obviously pieces around security that are very specific to the organization as well. In terms of best practices, I believe there's also detailed documentation; we can find a way to share that with you, on what needs to be looked at as you do the migration.
[00:35:45] Krishna Gade: And there's another question on a specific industry: could you shed some light on business use cases within financial services, or FinOps, where AI observability makes sense?
[00:35:57] Karthik Bharathy: Yeah, there are quite a few. The top two or three that come to mind are the automated financial reporting examples I called out:

[00:36:08] Karthik Bharathy: the Moody's use case on generating credit reports, and Cognizant's use case on mortgage compliance workflows. Demand forecasting is another that's relevant in the context of financial services. And more generally, I'd say incident management, which applies across different industries, is also relevant as you look at more data

[00:36:36] Karthik Bharathy: and want to uncover insights from that data.
[00:36:40] Krishna Gade: And then another question, from the insurance industry: beyond models, what metrics would you recommend, for instance, for claims processing? Can you explain specific measures you suggest to clients and share your assessment of the quality improvements in business outcomes?
[00:36:56] Karthik Bharathy: To be honest, I'm not from the insurance industry, so I won't comment on that. That said, I'm happy to take that question back and follow up if we have the contact information. I don't represent the insurance industry, and I just don't want to give out a wrong answer.
[00:37:11] Krishna Gade: So, Priya, feel free to reach out to Karthik for further information. Awesome. Finally, as we get into the last few minutes of the podcast: what should the lifecycle workflow look like when organizations think about all this? Life has been moving very fast the last few years. We were talking about ML, then all of a sudden there was generative AI,

[00:37:40] Krishna Gade: and now there are AI agents. When an organization is thinking about this, how do they go about implementing these things? What should the priorities be? What are the best practices?
[00:37:51] Karthik Bharathy: On the playbook, if you will: there are certainly a few common things across these different systems, and I'm sure there will be a lot more coming in the next few years.

[00:38:03] Karthik Bharathy: But fundamentally, what has not changed is starting with data. I can't emphasize this enough: the better your data, the better your AI model, your agentic system, and all the goodness that's out there. So have a robust data infrastructure and quality data feeding into your machine learning processes.

[00:38:25] Karthik Bharathy: If you're starting off with GenAI and agentic systems, I would start with one high-value use case: prototype it against your business problem and demonstrate the value quickly. Then, taking it to the next level, you want to establish the necessary MLOps foundations: how does monitoring play into the system?

[00:38:51] Karthik Bharathy: What does versioning mean? How can I go from one version to the other? These are fundamental as you take a system from a POC to production. Building on that, and very relevant to today's topic, is looking at governance frameworks: what does it mean to have a simple
[00:39:10] Karthik Bharathy: approval workflow set up as you scale the system? A lot of this also requires that you invest in your own team and train them, so they're aware of the different elements of going live with these systems. With AWS alone there is plenty of training and certification,

[00:39:30] Karthik Bharathy: and I'm sure Fiddler has training and certification available too; those help you build your internal expertise. And finally, plan for scale: what worked when you started off with a small system may not be applicable when you go to 10x or 100x of what you intend to build.

[00:39:49] Karthik Bharathy: The good news is there are enough enterprise features in AWS, SageMaker, and Fiddler to help you scale as you go through this journey. Conversely, what you want to avoid is rushing a system through quickly to demonstrate value, not having a good data-quality approach, not engaging the stakeholders,

[00:40:15] Karthik Bharathy: and ending up with very little insight into how you'd do maintenance, upgrades, or deployment. That is a recipe for failure. So avoid that, and stand on the fundamentals.
[00:40:28] Krishna Gade: More colloquially: don't just do vibe checking and vibe testing of your models; actually know what you're doing.

[00:40:35] Krishna Gade: That's a great point. There's a very related question someone is asking: things are moving really fast, even for us in the technology area. What types of problems with AI agents, in two to three years, will keep you up at night? What do you foresee?
[00:40:51] Karthik Bharathy: Yeah, there's only so much I can predict, and that's a question I ask myself every day. Fundamentally, I go back to when I joined AWS many, many years ago. There was this interesting quote from Jeff that still resonates with me, something around "what will not change?" as opposed to "what will change?"

[00:41:15] Karthik Bharathy: The second part, what will change, is something each of us could debate for hours or days. But what does not change is customers asking for better value, which translates to something more performant, more robust, more secure. Those fundamentals are not going to change. Or something cheaper, right?

[00:41:36] Karthik Bharathy: The way Bezos put it, no one's going to come to you and say, "hey, give me something that's more expensive or slower to perform." So fundamentally, look at the system and see what value it adds to your business use case and what it translates to for your customers. I think those will be paramount as you look at the innovations happening in GenAI across industries.
[00:41:58] Krishna Gade: Yeah, and there's innovation happening across both small and big players. There's a question around the many new agentic AI applications coming up, and how they play within the ecosystem of big players who are also building agentic workflows.

[00:42:19] Krishna Gade: Any thoughts on that? How might AWS be encouraging the ecosystem side as well?
[00:42:24] Karthik Bharathy: Yeah, absolutely. One way is definitely through partners; we work closely with companies like Fiddler. The second dimension to that question is AWS providing choice to customers. There isn't a single model where we say, hey, this is what you need to use.

[00:42:42] Karthik Bharathy: That's something you as a customer can decide, right from DeepSeek to the latest Llama models to our own in-house Amazon Nova models. You have all of those available to experiment with and try for your use case. I'm sure a lot of that will be applicable even in the world of tomorrow, where you have the choice of what's best for you.
[00:43:05] Krishna Gade: Awesome, great. I think with that, we're coming to the end of the podcast. Thank you so much, Karthik, for spending time with us today. One of the things I took away is that quote you mentioned from Jeff: what is not going to change? And I believe what is not going to change with AI, whether it's a simple statistical model, a deep learning model, generative AI, or AI agents, is that

[00:43:30] Krishna Gade: you need to test it properly, you need to monitor it properly, and you need to make sure it's secure and working for your business. That's not going to change, and that's where our partnership with Amazon comes in.

[00:43:46] Krishna Gade: So thank you so much for being on the show today, and we look forward to more conversations in the future.
[00:43:54] Karthik Bharathy: Thank you for having me, Krishna. This was great chatting with you.
[00:43:56] Krishna Gade: Awesome. Thank you. Thanks everyone.