Software Huddle - Making Data Agent Ready with Andre Elizondo
Episode Date: March 27, 2026

Today we are talking with Andre Elizondo, the Director of Innovation at Mezmo, about their open source agentic harness for SREs called AURA. Mezmo got their start handling observability data at scale ... logs, traces, metrics, the usual stuff. AURA is their answer to a growing problem: as system complexity outpaces humans' ability to make sense of all that data, how do you actually make it actionable for AI agents? We get into their approach to context engineering, essentially making data agent ready before it hits the model; why they built their own orchestrator in Rust; how they handle memory and self-correction in agent loops; their take on MCP and where it fits versus Skills and code sandboxing; and how the SRE role is evolving as agents become trusted teammates. Visit mezmo.com/aura
Transcript
Hello, everyone. Welcome to Software Huddle. I'm Sean Falkner, and today we are talking with
Andre Elizondo, the director of innovation at Mezmo, about their open source agent harness for SREs called
AURA. Mezmo got their start handling observability data at scale, you know, logs, traces, metrics,
you know, the usual stuff. And AURA is their answer to a growing problem. As system complexity outpaces
humans' ability to make sense of all that data, how do you actually make it actionable for AI agents?
We get into their approach to context engineering,
essentially making data agent ready before it hits the model,
why they built their own orchestrator in Rust,
how they handle memory and self-correction in agent loops,
their take on MCP and where it fits versus skills and code sandboxing
and how the SRE role is evolving as agents become trusted teammates.
With that, let's get into it.
Andre, welcome to Software Huddle.
Thanks for having me.
Yeah, thanks so much for being here.
Just to start, could you give a little bit of background on Mezmo,
in particular that AURA open source project?
Yeah, absolutely.
So Mezmo got our start handling observability data at scale.
So logs, ensuring that you can ingest logs at scale, process them, transform them,
store them when you need them or store them somewhere else when you don't.
And really what we've been focused on prior to AURA is
you know, how do we get you to the point where we could serve you those logs in the cheapest possible way in the most meaningful way?
And what we found was that, you know, at some point when you start to hit this data at scale,
you're not really running into the problem of that data existing or having that data or ensuring that, you know, that data is stored somewhere.
It was really around how do you make that data actionable in some way or form through investigation, through understanding,
And so that's really where we came with AURA and open sourced it to the public so that we can move the industry forward and really focus on, hey, AI SRE, AIOps, like, how do we make this an open source standard that others can build on, integrate, and ultimately run in their own environment to bridge the gap between a lot of data and understanding of that data.
Yeah. So I know that the AURA site sort of framed some of this core problem as, like, it's not necessarily about the data volume, kind of alluding to what you were saying there. It's all about context. I certainly, I think
that's the case across not just when we're talking about things like, you know, SRE agents,
it's agents in general. Like, how do you attribute meaning to the data without that? It's kind of
hard for them to understand what's going on. But can you maybe, you know, provide a little bit
more detail on like what specifically breaks in the way teams do things like incident response
if you don't have essentially that context around the data.
Yeah.
So when you think about how an incident happens, right, it typically happens when you don't expect
it, first off.
You know, a lot of my career has been spent running systems at scale, at production scale
at a number of different companies.
And, you know, the challenge typically is not just, hey, how do I grab a bunch of traces,
how do I grab a bunch of logs, how do I grab a bunch of metrics?
but it's really generating that understanding of the incident, right?
Like what's happening at this point in time?
What's the blast radius?
What's impacted?
And then ultimately, once I form that initial context,
okay, now how do I focus on a remediation plan?
How do I ensure that I escalate this appropriately?
How do I ensure I get the right people involved?
And so that's typically what's broken is that part of like,
okay, I have all the data, but how do I actually make sense of that data?
And when you think about how people have done this in the past, right,
up until recently, it's typically been driven by humans and manual processes.
And we've had some minor pushes towards utilizing systems to do that.
But what we believe is that process is fundamentally broken,
because as the data increases linearly,
the complexity of that data and the complexity of our modern services
is really increasing, you know, quadratically or exponentially.
Like it's really outpacing the ability for SREs to be able to understand that data as that
data continues to increase.
So what specifically are you doing in order to like make this, you know, solvable?
Like how do you add essentially, you know, meaning to all this data so that it is, you know,
findable and usable?
Yeah.
So there's really two parts.
And so this is really why, for us, AURA being kind of the other part that was missing in the equation is so important
because, you know, it's not just about ingesting that data, transforming that data, but it's
really about making it agent ready.
So when you think about like how Mezmo really complements AURA, you know, we have the ability
to make sense of that data in a way that we know that agents are going to be the downstream
consumer versus a human, which, you know, is primarily where the industry has been focused on creating
specific dashboards for a human to look at it and say, okay, I understand what's going
wrong. With us, with Mezmo, we're able to actually engineer that data before it actually makes it
to the agent. And so that's really our secret sauce is, you know, we can take in this high-scale data,
we can engineer it, we can transform it, we can ensure that we make it as agentic or as agent-ready
as possible. And so that's really our strong suit there is like doing that before it even hits
the tool, before it even hits the agent. The other half of this is really, you know, when
it comes to actually taking those insights, taking what we're serving to the agent, and doing
an investigation, typically this involves a number of different steps, right?
I want to be able to understand what's happening.
I want to be able to execute a number of different tools, understand my environment.
And if those tools are optimized for my workflow, my workflow is going to be much more
targeted, much more reliable, much more predictable.
And then I can orchestrate what I want to happen during that incident investigation process.
And that's really where AURA allows any user, any SRE,
any platform engineering team to be able to define those standard
operating procedures, runbooks, things like that within an AURA
workflow, and really take that useful data that we're
engineering behind the scenes and then put it in practice
in a way that we're going to provide you batteries included
out of the box.
But ultimately, you have full control to modify,
improve, make changes to that workflow in a way
that makes sense for your environment.
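The batteries-included-but-overridable workflow idea described above can be sketched roughly like this. This is a hypothetical shape, not AURA's actual schema; the field and function names are illustrative:

```rust
// A hypothetical, minimal shape for encoding a runbook as an ordered,
// overridable workflow. Names here are illustrative, not AURA's real API.
#[derive(Debug, Clone)]
pub struct Step {
    pub name: String, // human-readable step name, e.g. "triage"
    pub tool: String, // which tool the agent should call at this step
}

#[derive(Debug)]
pub struct Workflow {
    pub steps: Vec<Step>,
}

impl Workflow {
    // Start from a batteries-included default, then let a team swap a step
    // for one that fits their environment.
    pub fn override_step(&mut self, name: &str, tool: &str) {
        if let Some(s) = self.steps.iter_mut().find(|s| s.name == name) {
            s.tool = tool.to_string();
        }
    }
}

fn main() {
    let mut wf = Workflow {
        steps: vec![
            Step { name: "triage".into(), tool: "default_triage".into() },
            Step { name: "escalate".into(), tool: "pagerduty".into() },
        ],
    };
    // A team customizes the default workflow without rebuilding it.
    wf.override_step("triage", "custom_triage");
    println!("{}", wf.steps[0].tool); // custom_triage
}
```

The point of the sketch is the shape of the contract: defaults ship working out of the box, and overrides are targeted edits rather than a rewrite.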
Yeah, so going back to what you were saying around, like, making all this agent ready.
And I do think that there's, especially, you know, if you look at the last, you know, four months or something like that, the developer experience has changed massively over the last, you know, four months.
And it's clear even like the laggards that kind of didn't necessarily believe in AI generated code, I think are starting to come around to see like, okay, well, you know, this isn't just a trend.
We're moving to a world where probably most code is written by AI.
AI is becoming sort of this like
core persona of interaction
with technology.
What is it that you need to do
that's different to make something
sort of agent ready or agent enabled
versus you're ready for human consumption?
Yeah, great question.
You know, when you think about like
how folks are exposing these tools
on top of their platforms today,
MCP has been an incredible
accelerant to making these tools standardized across the industry.
But there's a pretty big challenge there in that, like, you know, a lot of these tools,
for example, with like some observability MCPs, you see it just says like get traces or get
logs or these very generic style types of tools that can be utilized by your agentic system.
And, you know, the challenge there is it's like if you wanted to buy a protein shake or an energy
drink and every single time you wanted a single one, you go to Costco and you get a pallet.
Like, that doesn't make sense.
And what ends up happening is that's incredibly expensive.
You're paying per token.
And so because of that, your models become more costly to run.
It's harder to run those higher scale models for these tasks that are executing at scale.
And so, you know, when you look at how we ensure that we engineer that data for the agent,
you know, it's one providing the most relevant tools for the agent that we've tested,
and validated that, hey, this is the type of tool that the agent is really good at using,
and this is the format of the data that the agent is really good at using,
versus just giving it kind of a blanket response of every possible thing,
like, you know, you'd normally get from like just doing a get on an API.
Instead, we can tailor that specifically to, hey, how does the LLM behind this best interpret
the data that we are serving to it.
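The contrast being drawn here, a narrow agent-shaped response versus a raw "get logs" dump, can be sketched like this. The types and function are illustrative assumptions, not AURA's or Mezmo's actual interface:

```rust
use std::collections::HashMap;

// A compact, agent-oriented summary instead of a raw log dump: the agent
// reasons over a handful of fields rather than thousands of lines.
#[derive(Debug)]
pub struct LogSummary {
    pub total: usize,
    pub by_level: HashMap<String, usize>,
    pub sample_errors: Vec<String>, // a few representative errors, not all of them
}

// Distill raw log lines (assumed to start with a level token) into the
// few fields an agent actually needs, before anything hits the model.
pub fn summarize_logs(raw: &[&str]) -> LogSummary {
    let mut by_level = HashMap::new();
    let mut sample_errors = Vec::new();
    for line in raw {
        let level = line.split_whitespace().next().unwrap_or("UNKNOWN").to_string();
        *by_level.entry(level.clone()).or_insert(0usize) += 1;
        if level == "ERROR" && sample_errors.len() < 3 {
            sample_errors.push(line.to_string());
        }
    }
    LogSummary { total: raw.len(), by_level, sample_errors }
}

fn main() {
    let raw = ["INFO started", "ERROR db timeout", "INFO ok", "ERROR db timeout"];
    let s = summarize_logs(&raw);
    println!("{} lines, {} errors", s.total, s.by_level["ERROR"]);
}
```

The token economics follow directly: the model pays for a dozen summary fields instead of the whole pallet.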
Yeah, I suspect you're kind of filling the gap where, you know,
Historically, let's take an API as an example, that's designed for a human consumer.
And an engineer is going to read the documentation, the spec around it,
and then they're going to map that response into their own data model
that makes sense within their own application.
They throw out what they don't need, use the subset of information
they need to satisfy some sort of business requirement.
So in a lot of ways, they're contextualizing that data
and creating sort of a derived dataset from it using,
a bunch of external sources to do that.
But with AI systems, you know, consuming essentially the raw data,
they don't necessarily have that luxury of being able to provide that contextualization
in the subset of data.
And even if they could, then you're just kind of spending excess tokens on it.
So really, to have sort of, you know, data be the fuel for AI.
It's not just about raw data.
It's about sort of purpose-built data to solve a particular need for AI to drive up
reliability.
Yeah, exactly.
And that takes a variety of different formats, right?
And I think that's really where our expertise of both building agents,
building things within our platform that directly resulted in open source,
or what we ended up putting out into the public.
You know, we've had a lot of these lessons.
We've learned exactly, you know, how do we best possibly engineer these tools,
this data set in a way that is going to give us the best possible outcome on the other end?
And so really what we've done is, you know, we've open sourced that, our own learnings, in a way that we believe moves the entire industry forward
so that everybody else can gain that benefit.
And what are some of those learnings? And how do you kind of test these ideas?
How do you make sure that, like, this new version of the data is better than the last version
of the data?
Yeah, great question.
I'll bring up a very simple example, right, which is, you know, how, when you think about
like a metric, a trace, a log, what does every single one of those things have?
When you think about it, it's a timestamp, right? Like, it tells you about the point
in time when that metric occurred, when that log occurred, when that trace executed, it really
gives you an understanding of the point in time when that, you know, maps to. And just as a simple
example, right, the LLMs don't consider all timestamps in the same weighted way, right? And so, like,
ensuring that we actually do the transformation, we actually expose it in a way that, hey, we know
that that timestamp on that specific metric, trace, log, whatever, is optimized for the
LLM to be able to consume it and make sense of it and not just hallucinate when it sees a timestamp
that it doesn't actually understand. I think that's like a very simple example of, you know,
what we're seeing really across the entire industry where all these different data sources
have a variety of different schemas. They have a variety of different ways that they're
defined, what type of information they're exposing. And then there's also, you know,
how we really short-circuit the thinking process of the models once it's being executed within
aura, where, hey, if we can calculate a derivative, we can calculate a profile, we can distill
patterns before it even makes it to the agent. Well, we're going to expose those as specific tools,
right? So instead of the agent saying, let me go get a bunch of traces from this time period to this time
period, and then I'm going to compare it from this time period to this time period, what we're
doing instead is we're automatically exposing tools where we're actually asking those questions
in the back end. We're asking those questions before the agent even needs it, because then once it's
able to ask the questions of those specific tools, it's able to not only short-circuit the process,
burn fewer tokens, but get a much more reliable result at the end of it.
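The "ask the question before the agent does" pattern, pre-computing a comparison between two time windows and exposing it as one narrow tool, can be sketched like this. This is an illustrative assumption about the technique, not AURA's actual implementation:

```rust
// A pre-computed comparison between a baseline window and an incident window,
// exposed as a single narrow tool so the agent never pulls both raw series
// into its context and diffs them itself.
#[derive(Debug)]
pub struct WindowDelta {
    pub baseline_mean: f64,
    pub incident_mean: f64,
    pub pct_change: f64,
}

fn mean(xs: &[f64]) -> f64 {
    xs.iter().sum::<f64>() / xs.len() as f64
}

// Answers "how does the incident window differ from the baseline?" in the
// backend, returning only the distilled delta to the agent.
pub fn compare_windows(baseline: &[f64], incident: &[f64]) -> WindowDelta {
    let b = mean(baseline);
    let i = mean(incident);
    WindowDelta { baseline_mean: b, incident_mean: i, pct_change: (i - b) / b * 100.0 }
}

fn main() {
    // e.g. p99 latency samples for the two windows
    let d = compare_windows(&[100.0, 100.0], &[150.0, 150.0]);
    println!("latency changed {:.0}%", d.pct_change); // latency changed 50%
}
```

The agent's tool call then returns three numbers instead of two time series, which is exactly the short-circuit being described.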
Do you think maybe an appropriate analogy here is, like, if you take someone's user interface?
So in a user interface, like some console for Mezmo or, you know,
some piece of infrastructure, typically when you click a button, there's probably like a collection
of API calls that are happening when you click that button or you fill something out. And it's been
designed for, to facilitate some human workflow versus you could, you know, presumably just put like
the actual, like, CRUD endpoints of the API there. But that wouldn't be very satisfying as, like, a human
to navigate it. And in some sense, what you're doing is kind of the UI equivalent for the agent
to make things, like, agent ready. Yeah, totally. And we've seen this across software in a variety of
different ways, right? Like, you know, you think about like things like SDKs, right? As soon as folks
went to automatically generated SDKs where great, I define a schema, I define a spec, and then I want a
system that's just going to build that in Python, Ruby, Perl, whatever, right? You know, oftentimes what you get
is like an incredibly overly verbose SDK, an incredibly overly verbose, you know,
set of tools that, you know, they don't have the same meaning.
They don't have the same feel as if somebody designed that SDK in a way that it was
specifically, you know, intended to be consumed by the folks that they expect to be consuming it.
It's very similar with the Model Context Protocol, with MCP, and, you know, how we do that
in a way that's specifically designed for who we know our end customer is going to be,
which is ultimately more and more AI agents as the target customer for consuming that data.
So you made this available in open source. Why that choice? There's lots of ways that you can
distribute a product. Why decide to go with essentially a free distribution model?
Yeah. You know, it's hilarious. As somebody whose, you know, blood runs with
production operations, with SRE, just because I've lived it and I have that kind of deep within me.
You know, the biggest thing that is slowing the industry down right now is trust and transparency,
right? As soon as something is happening that looks like a black box, it's happening behind a
paywall, it's happening behind a system that you can't actually get access to. You can't validate
claims yourself of what the system is doing, how it's validating what it's looking at.
you really start to break down that trust between what we're exposing to the industry and how people consume it.
And so the open source decision was actually a very strategic decision for us because,
you know, as much as we're engineering the data at scale behind the scenes,
we really wanted that orchestration layer or control plane to be very transparent in a way that,
you know, anybody can see exactly, you know, how the agent is meant to make decisions,
how we're doing things like orchestrating those tool calls,
how we're doing things like writing a memory for the agent to consume
as it's accomplishing its task.
And so open source was really a very core strategic decision for us
because we didn't want operations with agents
to be just another black box in the industry,
but rather something that folks could initially gain that trust and transparency.
There's a number of other things there too that I wanted to highlight besides just the trust and transparency.
One, it makes it really easy to integrate with.
We wanted it to be open source because we wanted folks to be able to use AURA in a way that uses Mezmo with it as well.
But you can also tie in your other observability solutions, your other service management solutions, your other, you know, if you wanted to grab a memory server that is just the open source memory server or one that you're already using within your company,
you know, we wanted to make sure that AURA was compatible in a way that really extended the horizon
of what people could integrate with and what that ecosystem ultimately looked like.
And then the last piece I think is just, it's really just the right thing to do, right?
When you think about like how this industry is progressing and how really we've gained
the benefits of open source over the past couple decades, it's, you know, as companies put this
out in the open, as they build out in the open, as they
really do things for the benefit of the entire industry.
It really moves everybody forward in the best possible way,
and we believe that to be true here as well,
and that we really want SREs to trust the agents they're building,
understand exactly what they're doing,
integrate with the tools that they're already familiar with,
and ultimately push the industry towards doing the exact same thing.
Okay.
So I wanted to get into some of the sort of technical decisions
that you made around AURA.
So, you know, to start off with, like, AURA's written in Rust.
Can you walk through that decision?
Was that driven by certain properties that you wanted from Rust to
translate into the runtime?
Or was there other things that sort of drove the decision around the language of choice?
Yeah.
You know, we didn't want to just be a company that says it's better because it's in Rust.
I think we all have seen this where it's like, oh, yeah, we're doing the same thing, but it's
written in Rust, so that's why it's better.
But Rust itself has a ton of benefits as far as just being
production ready when it comes to the performance of the system, memory safety, concurrency,
operational reliability.
This is really where Rust shines in that equation.
And so we really wanted to make sure that we weren't just building a toy project.
There's a number of different projects and things like that written in languages that,
one, SREs find challenging to operationalize.
It's a little bit different than their current stack.
And it's not necessarily something that you want to go wrong when you're in the middle
of an incident.
Like when you think about your tools that are critical for keeping your systems reliable,
those tools can't have an outage at the same time you're investigating another tool, right?
And so the production readiness of Rust, in a way that we can kind of do the hard
work of building that in Rust and ensuring that it has all these properties, ultimately
makes it better for being production ready in a way that is going to really complement and
benefit their workflows without introducing yet another unreliable tool in the equation there.
Was there considerations for other languages? Did you do some, you know, compare and contrast or
try building this out in other languages as like a, you know, initial POC to sort of compare?
Yeah, good question. Before we started off with AURA, right? Like our teams, in fact,
our SRE team internal to Mezmo, was really the core driver of ultimately what created,
you know, what ended up being AURA on the other side.
And before, you know, we went down the path of creating our own solution for that orchestration
and control plane, you know, we tried every common framework out there, every common harness,
every kind of common solution for doing this.
And, you know, we just found a lot of opportunities, a lot of gaps and a lot of things that
if we had, you know, if we were to build this ourselves from scratch, what would it look like?
And, you know, one of the things is it being built in Rust so that it's production ready.
But then there's also a number of different things that are baked into AURA, besides just the fact that it's written in Rust, that allow it to make good decisions and keep grounded, right?
Like, when you think about, like, how AURA is able to make a lot of things easy, like, when you look at actually how you define these agents within AURA,
it's almost simple in a way that each time I show people, they're like,
okay, what else is there?
How do I define all these other hard things that I'm used to defining in other languages,
other frameworks, whatever?
And so with AURA, what we wanted to do is just like make it production ready,
but then also make it batteries included in a way that we've built all these things
internal to the project that are really just like, you know,
that 80% of boilerplate that typically you have to define if you're,
you know, choosing an off-the-shelf framework that's just, you know, meant for,
kind of taking a bunch of Lego blocks and telling you go figure out what to do with it.
AURA is definitely not that approach where we actually wanted to be able to say,
hey, we have opinions on how this should look like in production.
And we know what makes a production-ready system.
And so we wanted to build that in from day one.
Makes sense.
As part of this, you've built something essentially like an intelligent prompt router.
It decides whether to execute things directly, orchestrate, or request clarification.
Can you explain like how does that work?
Like what is the routing logic?
How does it make decisions based on, you know, what inputs are being received and what outcomes
it wants to achieve?
Yeah.
The thing about AURA is, you know, I like to say it's simple, but it's not simplistic
in that when you want to define a workflow, right, depending on what you're defining, what level
of, you know, control you want of that workflow where, hey, I just want something that's more
like a co-pilot or more of an assistant versus something
that's like truly autonomous, truly running in the background, executing things for me.
There's a variety of different, like there's a range there of, you know, how you're able to define that within AURA.
One of the things that AURA, you know, has baked into it is the ability to handle multi-agent, multi-step workflows, right?
And so, you know, it depends on what you've told AURA is important to you about orchestrating that workflow.
So one of the things is the first thing that AURA does as soon as it gets a request or
it's kind of investigating what it should do in a particular scenario is make a determination
on like, you know, is the request narrow and low risk?
Is it something that we have kind of known, a known understanding of exactly how this needs
to be resolved, right?
Like, that's where the first step is really AURA being able to short-circuit that workflow
if it doesn't need to go through kind of the next, you know, downstream effects of like
with things like multi-step workflows or multi-agent workflows, you know, there's a number of
different things that AURA is going to do there as far as like ensuring that it's planning,
executing, synthesizing its results, and evaluating what it's doing.
But when it comes to the simple tasks executing quickly, that's really where AURA, the first thing
it does is make a determination.
And if it has determined that it's fairly straightforward,
fairly constrained, fairly narrow,
then it will really just try to act in the fastest way possible.
So it's really just, it's a range of how we're doing
that intelligent routing, whether it's a quick,
you know, short circuit, low amount of steps type of workflow,
or if it's determined, hey, there's a lot of data
that I need to build context around before I take any action
or come to any conclusion.
Or maybe I also need to ask for clarification, right,
in a way that if you're using ORA interactively
and I say, hey, this particular agent,
I wanna go ask it to do something,
well, there may be some things that the agent
needs to clarify with you
before it goes on to the next steps.
And so that's really where that intelligent routing
and prompt handling kind of comes into play
is ensuring that it fully understands what it's doing
and ensures that what it's going to do,
it has a pretty good idea
what's needed in order to do that successfully.
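The routing behavior described above (fast path for narrow, low-risk requests; the full plan/execute/synthesize/evaluate loop otherwise; clarification when information is missing) can be sketched as a simple decision function. The type and field names are illustrative assumptions, not AURA's actual router:

```rust
// Illustrative routing outcomes, mirroring the three behaviors described:
// act immediately, run the full multi-step loop, or ask the user first.
#[derive(Debug, PartialEq)]
pub enum Route {
    FastPath,    // narrow, low-risk: execute directly with few steps
    Orchestrate, // plan, execute, synthesize results, evaluate
    Clarify,     // missing information: ask before proceeding
}

pub struct Request {
    pub scope_known: bool, // is there a known way this gets resolved?
    pub low_risk: bool,    // is the blast radius of acting small?
    pub complete: bool,    // do we have everything we need from the user?
}

pub fn route(req: &Request) -> Route {
    if !req.complete {
        Route::Clarify
    } else if req.scope_known && req.low_risk {
        Route::FastPath
    } else {
        Route::Orchestrate
    }
}

fn main() {
    let r = route(&Request { scope_known: true, low_risk: true, complete: true });
    println!("{:?}", r); // FastPath
}
```

A real router would presumably make these determinations with a model rather than booleans; the sketch just shows the branching structure the conversation describes.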
And are you using a particular model to help make those decisions?
So we're model agnostic.
So you can actually plug any model.
So we have adapters for any kind of, like, model running over an OpenAI API, any local models,
you know, any Anthropic models.
And you know, we support a variety of different models there.
So we're not using any specific model.
Just whichever model you have given us, we will use
in a way that is orchestrated for that task.
Now, the thing that that actually allows us to do pretty incredibly,
this is kind of an emergent effect of us having this built into the system,
is that then we can say, hey, for the larger planning steps,
like the things that we want the model to really have a lot of intelligence for doing,
we can have that with a bigger model behind the scenes.
But then maybe for more of those like,
downstream agents, downstream tasks that are like a little bit more focused.
We can actually use smaller models, open source models,
and actually like have a system that's developed and orchestrated, all from AURA.
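The big-model-for-planning, small-model-for-subtasks split can be sketched as a role-to-model mapping. The model identifiers here are placeholders, and the shape is an assumption about the technique, not AURA's actual adapter layer:

```rust
// Illustrative model-per-role selection: a larger model handles the
// high-level planning step, while narrow downstream tasks can run on a
// smaller (possibly local, open-source) model. Model ids are placeholders.
#[derive(Debug, Clone, Copy, PartialEq)]
pub enum Role {
    Planner, // needs broad reasoning over the whole incident
    SubTask, // focused, constrained work like summarizing one data source
}

pub fn model_for(role: Role) -> &'static str {
    match role {
        Role::Planner => "big-planning-model",  // placeholder id
        Role::SubTask => "small-local-model",   // placeholder id
    }
}

fn main() {
    // The orchestrator picks a model per step rather than one model globally.
    println!("planner -> {}", model_for(Role::Planner));
    println!("subtask -> {}", model_for(Role::SubTask));
}
```

Being model-agnostic at the adapter boundary is what makes this split cheap: the routing decision is just a lookup, and swapping providers doesn't touch the orchestration logic.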
Why build your own orchestrator?
I mean, there's a lot of, you know, orchestration layers that are out there on market.
Like, what is the advantage of kind of like rolling your own in this situation?
Yeah, one is the ability for us to,
to provide our own insights of running this at scale.
A lot of the projects out there are toy projects,
or they're incredibly generalized, right?
So we think about a lot of agent frameworks,
a lot of agent runtimes,
they kind of want you to be able to do everything.
And so that's both a blessing and a curse.
And so for us, we're really leaning into our own domain expertise
in that we believe that we are the best possible people
in the industry to be able to say,
this is what you should do in order to utilize observability data,
utilize data from the runtime, utilize data in a way that is focused on SREs and
platform engineers and ensuring that production is running effectively.
That's really where the decision for do we invest in something out in the ecosystem in a
way that we were just using another framework or recommending another framework or do we
go and build our own. And really, like, at the end of the day, after evaluating everything and making
that decision of, hey, we want this to be open source. We want this to be something that is meant
to be batteries included, right? Like I don't need to expose something where we say, hey, there's a
framework that you can kind of do this, but you have to set it up in a particular way, you have to
orchestrate all these different things, you have to bring all these different tools with you.
We wanted to avoid that type of experience because in my opinion, that's really what's been
stopping a lot of the adoption so far of like SREs, DevOps, operations engineers from utilizing
agents in production is really just that hurdle of like, if I want to go build an agent today,
I kind of need to go learn the ecosystem from the last three years. And that's just not a great pattern. So we wanted to short circuit that.
How do you manage memory at the orchestration level?
Like how are you passing sort of context around to like these, you know, sub agents and so forth?
Yeah, great, great question.
So, you know, the way that AURA will manage memory, it kind of depends on the task.
And so when you think about, like, as an agent is going through a specific type of task,
they're going to need to be able to write an artifact.
Like, as it's going through the task, like, hey, give me a scratch pad for that agent to be able to kind of orchestrate what it's doing,
recall what it's doing without like constantly having to feed that into,
constantly having to like blow the context window with that information.
So that's kind of where AURA will come into play there is like adding that short term memory
or that turn-based memory.
And is there sort of, like, a data store in charge of that?
Like, how is that, you know, temporarily offloaded from the actual context window?
Yeah, good question.
So for all the short-term memory, like the things that help AURA accomplish its task,
we didn't want to introduce yet another dependency, but we're doing a lot of it using just like file-based memory,
which works incredibly well.
But we also, like, you know, give you the ability to plug in your own memory server over MCP.
And so you can keep memories in other places as well that kind of help to balance that workflow a little bit more if you wanted to have more persistent memory.
And it kind of like, it's a range based on, you know, more of the turn-based and in-task type of memory.
That we want it to be like very simple.
So that's why we went with file-based.
And then for the longer-term memory, right, like the memory that helps you to learn over different types of tasks being executed,
the same task being executed over different time periods, that's where like the longer-term memory comes into play that we typically, you know,
see folks both like complementing the short-term memory.
with the long-term memory.
And is AURA handling the compaction of that memory
or some other sort of long horizon context optimization
to make sure that you're not blowing through the token window?
It does.
Yeah.
So, and we'll do that, like, you know, as an example,
as AURA is making tool calls, right?
Like, we want to make sure that we understand that that tool call
isn't just going to be shoved into the context window.
So we first write it to basically a scratch pad.
And then we recall specific parts that we want to feed into the model, right?
So, like, that's where it's handling a lot of kind of the basic compaction,
but it's also handling, like, some of the orchestration memory to make sure that the agent
knows how to call the right tools each time.
And the fact that, like, we want to use that in a way that both helps it execute more
reliably, but then also learn over time as it gets into those longer horizons.
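The write-aside-then-recall pattern described here (full tool output goes to a scratch pad, and only a bounded excerpt reaches the context window) can be sketched like this. The struct and method names are illustrative, not AURA's actual memory API, and a real implementation would presumably be file-backed as discussed above:

```rust
// Illustrative scratchpad: full tool output is written aside, and only a
// bounded excerpt is recalled into the model's context window.
pub struct Scratchpad {
    entries: Vec<String>, // in-memory stand-in for the file-based store
}

impl Scratchpad {
    pub fn new() -> Self {
        Scratchpad { entries: Vec::new() }
    }

    // Record a full tool result; return a handle the agent can refer back to
    // on later turns without re-feeding the whole payload.
    pub fn record(&mut self, output: String) -> usize {
        self.entries.push(output);
        self.entries.len() - 1
    }

    // Recall at most `max_chars` of an entry for inclusion in the context,
    // a crude stand-in for smarter compaction/selection.
    pub fn recall(&self, id: usize, max_chars: usize) -> &str {
        let e = &self.entries[id];
        &e[..e.len().min(max_chars)]
    }
}

fn main() {
    let mut pad = Scratchpad::new();
    let id = pad.record("very long tool output ... ".repeat(100));
    // Only a small slice of the recorded output ever reaches the model.
    println!("{}", pad.recall(id, 40));
}
```

The key property is the asymmetry: writes are unbounded, reads are budgeted, so a noisy tool call can't blow the context window on its own.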
And you're leaning, you know, heavily into MCP
for, you know, tool calling.
I guess, like, can you talk a little bit about your choice there?
And I'm also curious on your thoughts.
You know, there's a lot of recent, like pushback on MCP,
especially in the dev tools market, like,
in order to do things like token optimization, like,
people are like, oh, MCP's dead.
It's all about CLIs, or APIs directly, or just have the LLM
write the code to make the, you know, the call.
I'm kind of curious, first of all, like,
just from a product standpoint, a choice around MCP.
And then secondly, like,
Do you have sort of any hot takes, essentially, on what's going on in the industry and the pushback against MCP?
Yeah, I've actually been following personally MCP from day one.
Like I remember November of, well, I guess not last year.
Yeah, 24 now.
It's already like it feels like three years at this point.
But every day in AI is like a, I don't know.
Yeah, exactly.
At least of, like, normal days.
This week is like the entire, you know, year of innovation
in probably, like, 2017 or 2018 or something like that.
Exactly.
So going back to MCP,
the reason why we wanted to support MCP from day one
and not create yet another standard there
is because the tools that people want to tie into AURA,
a lot of those tools already have MCP servers, right?
So that creates a very wide net on like what people can integrate with.
If it has an MCP server, great, we can go ahead and integrate with it.
The challenge with MCP is typically that it's a protocol that is meant to serve a lot of different audiences, right?
Kind of like what we were talking about before.
You know, MCP, you can tie it into Claude Code on your laptop.
You can use it with, like, your ChatGPT instance.
You can, you know, tie it into an agent that's running autonomously.
And so it kind of has to serve a lot of different audiences.
And, you know, it also has to do that securely.
So when you think about, like, where MCP has been
primarily innovating in the last couple of specs,
It's been like, hey, adding things like handling OAuth flows,
handling things like, you know,
how it's going to expose, like which tools it's exposing to the agent,
like before the agent consumes it or like in a public way.
But that also ends up making it very chatty.
And so that's kind of where, like, we've seen a lot of the whole "MCP is dead"
sentiment in the last few months, I think,
is that, you know, we're starting to really see the downstream effect of,
hey, if this MCP server is just always giving me, again, like the Costco-sized pallet of
information, and I have to pay for every single token that it's giving me, well, I'd much rather
just utilize, you know, the agent writing code or the agent utilizing a skill to be able to
accomplish that same thing. And I think like MCP still has its place in the longer-term horizon
of things, mostly because like when you're thinking about things like shared tools, right,
like especially internally shared tools where, hey, I can't just have something
that every single developer has with their agent, but I need to actually run it as
a stateful service. That's really where MCP is good. But I think where it's being
offset is that it's now not the only tool for the job. That's also probably
part of the problem that got us to where we are with MCP: we didn't
really have a hammer before, so every time we saw a nail, we tried to hit it with our hand,
right? And now all of a sudden we've got a hammer, but we're trying to hit screws with it.
And so we really needed an entire arsenal, right, an entire toolbox of different things that made sense for the agent to interact with its environment at different levels of complexity.
And I think MCP is still going to be a core of it.
I think skills is going to stick around.
That's really where we've been investing with AURA as well.
And, you know, code writing and code sandboxing is another piece as well that, you know, I think that's a pattern that is going to stick around as like,
how the agent can potentially orchestrate its own multi-step tool calls
without you specifically telling it the pattern that it needs to execute,
but also while not introducing a ton of context bloat
or prompt stuffing of every single response from the MCP server.
The other challenge I'll just end off with there is that, you know,
I always say like not all MCP servers are created equally
because you see a number of different companies that, you know,
they say, hey, we just have an MCP server, like go use it, right?
But when you actually look at it, their tools are not optimized for agents.
They're very basic.
They're returning a lot of things in their responses.
And then you see folks that are in the space that have invested in their MCP server.
And it's a vastly different experience across the two.
So I think that's also an opportunity for us to kind of sit in the middle and make sense of either scenario,
whether that's a high-grade MCP server or a low-grade MCP server; AURA is going to help you in both scenarios there.
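One way to picture the difference between a raw tool response and an agent-optimized one is a simple projection step before the payload ever hits the model. This sketch is purely illustrative; the field names and response shape are assumptions, not any real MCP server's schema:

```python
def agent_ready(raw: dict, fields=("status", "error", "summary")) -> dict:
    """Sketch: project a verbose tool/MCP response down to the few fields
    an agent actually needs, instead of prompt-stuffing the whole payload."""
    slim = {k: raw[k] for k in fields if k in raw}
    # Tell the agent what was withheld so it can ask for more if needed.
    slim["omitted_keys"] = sorted(k for k in raw if k not in slim)
    return slim
```

An "agent-ready" server does this kind of filtering and refinement server-side; a "basic" one ships the Costco-sized pallet every time and lets you pay for the tokens.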
I think some of this too comes from the fact that where agents have really caught fire,
at least in the last half year, has been a lot of agentic engineering.
And CLI makes a lot of sense in that context where you're running an agent locally on your machine.
Maybe you're already running it in the CLI.
The things that you're interfacing with probably have good CLIs or APIs that you can call directly,
but that's probably different than building an agent to process loans in a bank.
That's probably a different set of requirements and interfaces that you're going to be
sort of connecting into.
Yeah.
Yeah.
And I think like what I keep coming back to where our approach is, is like, there's a lot of
engineering opportunity behind the tool.
So, you know, when you think about like how those tools expose the data for the agents to
consume, I think where we'll see more and more focus is, you know, how do you, with
your specific perspective, with your specific understanding of how people are going to consume this
in an agentic workflow, like, how do you as a company, as somebody, you know,
with a particular perspective, engineer your MCP server, engineer that context before it even
hits the tool. You know, a great example is like, you know, what we've seen with things like
even the Playwright MCP server, right? Like, it's very focused on the agent doing a specific
thing, but it's not giving it, like, an entire blob of HTML or JavaScript every single time
it asks for it. It's kind of doing the hard work of a little bit more, like,
filtering and refinement. And that's really where it's beneficial. It's absolutely like
solving a particular problem. I think we're just early in the adoption curve there where like,
you know, we're in, we're in cloud in 2010 or 2009 where it's like everybody's just kind
of trying to replicate what they did with their API. And now we're starting to see more
and more focus around like, how do we do this in a way that is tailored to our audience?
Yeah, I mean, it's the same thing with when mobile burst onto the scene, too.
Everybody was trying to take what worked on desktop and just, like, jam it into this tiny little form factor.
And what ended up happening was you ended up spending really the first couple of years of mobile, at least, laying the kind of infrastructure groundwork to make it possible to build, like, mobile-first companies.
Like, it took a while before you had Uber and Instagram and things like that that were kind of these, like, mobile-first experiences.
I think we're seeing an accelerated timeline with AI, partly because it's, you know, everywhere and there's a lot of money in it.
But you're also using AI now to generate the thing that you want for AI, too.
So that's, like, accelerating the timeline.
So laying the groundwork of the infrastructure is probably moving at an accelerated pace.
But it does take that work.
You can't just wholesale take this thing that worked previously in every situation anyway and just kind of like apply it to this new paradigm.
100%.
Yeah.
And that's really the thesis, or the hypothesis rather, that we've been really
focusing on as we're developing AURA and as we're developing this ecosystem. It's exactly that, right?
Like as the tools continue to get better, one, we're going to lead the way there.
But like as the agents become better at utilizing those tools, we want to be able to make the
best possible experience for that interaction and not just give people, you know, a bunch
of Lego blocks to go figure out how to do it themselves.
And then how do you think about, you know, AURA taking sort of potential, like, destructive actions
versus just, like, read-only, and being, like, guiding a user, and then if a user needs to take a
destructive action, they're taking it on their own behalf?
How do you kind of, like, balance those tradeoffs?
Yeah.
So great question.
I think, like, especially with SREs, especially with DevOps, especially with platform engineering, it is not an opportunity where we
can have the moment that we had with Claude Code over the last year, where, you know, there was a
period of time, I always laugh about it, where it's like you tell Claude it was wrong and it's
like, "You're absolutely right." And you can't really do that in production. Because at that point,
that's where you see these incidents where, like, hey, the agent just deleted a database. Like,
it did something that it either hallucinated or just thought was the best thing with the
tools available to it. You know, that's the risk that we're trying to avoid here:
production is sensitive. In production, you don't get the opportunity to make mistakes.
Oftentimes, those mistakes are not just me rewriting the code and seeing if the test pass,
but rather, like, that can be a production outage; that can introduce, you know, like, a monetary
cost to the business.
There's a lot of things there that are much riskier than just writing code on your laptop
or in, like, more of a developer environment.
And so when you think about, like, you know, one, how AURA allows you to progress, right?
You have full control to have AURA act as a co-pilot and assistant.
You can kind of progress from more of the workflows where,
hey, I want AURA to really ask me every single time it's about to do something.
But as you start to gain trust in the system, right?
Then you're going to say, okay, I want it to be more of like an assistant,
where it's not going to actually take the action for me,
but it's going to go all the way to the end state where it knows exactly which action
is going to get me the best possible result at the end of it.
And, you know, I think past that, that's really when we start to get to the point where, after we've gained that trust, after we know that the agent is able to really get to the right solution every single time, or at least as close as possible to every single time, you know, then we can start going into, you know, a truly autonomous, agentic SRE that, you know, knows your system better than you do and never sleeps. It gives you the controls when you need it. But, like, you know, that's where things like human in the loop
really kind of come into play there, where you can define how AURA is going to ask for permission
in certain scenarios. And again, like you can define that. You can iterate that over time. You can
make changes and say, hey, this particular workflow, you know, has functioned well over the past
six months. Like, I don't need the agent to ask me every single time it's going to do something at a low
risk kind of point in time. There's also another piece besides like the human in the loop and like
what we're doing there and how we're, like, letting AURA kind of progress from more, like, non-autonomous
to autonomous. And a lot of that comes back to, like, you know, the loop that's actually built
in within AURA, right? So, like, you know, building on top of some of the common, like, industry
trends around things like Ralph loops, you know, what AURA does is, like, first it plans, right?
It figures out what to do. It kind of extracts context. It executes. It calls tools. It runs queries.
It figures out what it could do to kind of build an understanding, build context.
And then from there, it'll synthesize its data, right?
Like it'll try to summarize what it just learned about and kind of make some hypotheses on what it should do next.
And then from there, it evaluates.
It figures out whether or not it's doing that over time better.
Like, it's coming to a conclusion.
Or if it starts to see like, hey, it's like I'm contradicting my hypothesis.
I need to replan.
That's where, like, we built the safety into the core of AURA that you don't need to define yourself,
where once you give us what you want the workflow to do,
what you want the agent to act as,
that loop is built in from day one that if the agent makes mistakes
in its thinking process or if it's coming up
with a solution that it needs to correct itself on,
it's not going to do that after it has already deleted resources
or made a change, and then it's like, whoops,
how do I recover from this?
Right.
Like, we wanted to avoid that as much as possible.
And that's really where the agent loop that's built in is kind of, like, the core to that strategy.
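The plan, execute, synthesize, evaluate loop described above can be sketched as a simple driver function. The callback names and the "done"/"continue"/"replan" verdicts here are assumptions for illustration, not AURA's internals:

```python
def agent_loop(plan, execute, synthesize, evaluate, max_iters=5):
    """Sketch of a plan -> execute -> synthesize -> evaluate loop.
    `evaluate` returns "done", "continue", or "replan"; replanning
    happens BEFORE any action is taken on a contradicted hypothesis."""
    state = plan(None)
    for _ in range(max_iters):
        observations = execute(state)                 # call tools, run queries
        hypothesis = synthesize(state, observations)  # summarize what was learned
        verdict = evaluate(hypothesis)                # converging, or contradicting itself?
        if verdict == "done":
            return hypothesis
        if verdict == "replan":
            # Self-correct instead of acting on a bad hypothesis.
            state = plan(hypothesis)
        else:
            state = hypothesis
    return None  # budget exhausted without a confident conclusion
```

The safety property is in the ordering: evaluation and replanning sit before the next action, so a contradicted hypothesis triggers a replan rather than a recovery from damage already done.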
Yeah. Where do things like self-correcting behavior kind of fit within these evaluation
loops? Like, is that something that you're supporting? And I guess what would be a concrete
example of that? Yeah. So that last step, right, like the evaluate step that's built into every
single loop, that's really where the agent will self-correct.
And the benefit here, the reason why open source is great and the reason we constructed
AURA in a particular way, is that all of the trace data is exposed through an OpenTelemetry standard
that you can consume within Mezmo or your favorite tool.
And you can see as the agent is coming to a specific conclusion, how it's correcting itself,
how it's making those decisions, how that thought process is happening.
And so, you know, the evaluation step, that last step in the loop is really where that self-correcting comes into play.
And then that also gives us the ability to, you know, debug that, figure out exactly what went wrong.
How can I improve that loop over time as we see those traces actually being exposed in a way that you can interrogate them yourself?
Or, you know, we'll have some out-of-the-box functionality for doing that for you as well.
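A toy version of that kind of interrogable decision trace might look like the following; AURA exports the real thing over OpenTelemetry, so this event schema is purely illustrative:

```python
import json
import time

class DecisionTrace:
    """Sketch of structured decision tracing: each loop step emits an event
    you can later interrogate. The real export would go over OpenTelemetry;
    this in-memory schema is illustrative only."""

    def __init__(self):
        self.events = []

    def emit(self, step: str, detail: dict) -> None:
        # One event per loop step: plan, execute, synthesize, evaluate, replan.
        self.events.append({"ts": time.time(), "step": step, **detail})

    def corrections(self):
        # The replan events are where self-correction shows up.
        return [e for e in self.events if e["step"] == "replan"]

    def to_jsonl(self) -> str:
        # One JSON object per line, easy to ship to a log/trace backend.
        return "\n".join(json.dumps(e) for e in self.events)
```

Being able to filter for the correction events is exactly what makes "why did the agent change its mind here?" a debuggable question instead of a mystery.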
Yeah, it seems like one of the, I think, advantages that, like,
coding agents have is that for this self-correction loop and testing, there's kind of things
built in to allow you to do that.
Like, you can compile the code, you can run it, you can run it against tests.
So the feedback mechanism gets quite fast.
With infrastructure, it's a bit more challenging.
Like you could potentially spin up like a duplicate environment of production and try to
mimic the situation, but it might be kind of expensive to do and how quickly can you do it
and stuff. So I wonder if there's, even outside of, you know, AURA and stuff, but, like, you know,
as a thought experiment, like, do we need to be in a place where we could have, you know, the
systems in a place where we'd really quickly spin up like these ephemeral environments where we could,
you know, run these types of tests to see, hey, can we, is this correct? And if not, like, how do we sort
of self-correct from it? Yeah, I think absolutely. And we're starting to see like a lot of movement
in the industry overall, right, around things like sandboxing.
Yeah, and even like databases, you can kind of like spin up, tear down and stuff like that.
Yeah, exactly, right?
It's like, if I need to have a specific step where the agent's going to validate this, and, you know,
the same way that we do in SRE, like, today and for the last 20 years, right, is, like,
I'm going to go test this in stage.
I'm going to go test this in dev.
I'm going to go test this in every lower level environment before I hit production.
Because again, production is where the highest risk, the highest possibility for something
to go wrong is really at.
And so that's where, you know, I think like one, we're seeing more progression towards sandboxing.
But also, like, you know, you can give that information to AURA, like, in your workflow, exactly, you know, how you wanted to evaluate it in maybe lower environments or sandbox environments or, you know, in, like, an ephemeral environment.
And, you know, it'll make sure that as part of that agent loop, as the agent is kind of going through those steps of ensuring that that is the right change to make, it follows
those steps in a way that it's going to test in the environment that is going to be the lowest
risk before doing anything in production. I think that's actually a huge unlock for some of
these more autonomous workflows is really how do we give the agent the proper tools necessary
to test in an environment that is not going to impact as highly when it goes wrong because
like for example, if I can have the agent just validate what it would have done in a test
environment, hand off to you and say, hey, this is exactly what I did. This is my
resolution. If you run the same command, like, you'll get the exact same thing in production,
but you've told me to kind of not touch production, so I'm going to give you this here.
You know, that's like a potential path for folks to be able to go from, you know, more co-pilot,
assistant type of workflows to truly autonomous, agentic engineers.
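The progression described here, validating in the lowest-risk environment first and handing off rather than touching production without approval, can be sketched like this; every name in it is hypothetical, not AURA's API:

```python
def run_change(change, environments, apply, approve):
    """Sketch of environment-ordered, human-gated execution.
    `environments` is ordered lowest risk first, e.g. ["dev", "stage", "prod"].
    `apply(change, env)` returns True on success; `approve(change)` is the
    human-in-the-loop hook. All names are illustrative assumptions."""
    for env in environments:
        if env == "prod" and not approve(change):
            # Report findings instead of touching production.
            return "handed-off"
        if not apply(change, env):
            # Stop before the change reaches riskier environments.
            return f"failed-in-{env}"
    return "applied"
```

The point of the shape is that trust is a dial: the same workflow runs as a co-pilot (always hand off before prod) or, once the `approve` policy is loosened for low-risk changes, as something closer to an autonomous teammate.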
Yeah, I mean, how do you see, obviously, like, I think sort of core engineering is going
through a huge transformation in terms of, like, what that role looks like right now.
With, you know, assuming, like, AURA succeeds, also with everything that's kind of going on in the industry right now, like, how do you see the SRE role sort of evolving in the next couple of years?
Yeah, great question.
You know, I think like we're hitting a scale problem.
We've been hitting the scale problem for a while now, especially since, like, we saw this point, when, like, when I started my career, you know, I was on an operations floor
where there was, you know, a sysadmin, there was a network engineer, there was a
release engineer, there was a database administrator. And, you know, we had these,
like, very specialized roles that came together in a way that, like, you know, helped you to accomplish
a task of, like, reliable systems. And we've seen a huge compression of that, right? Both because
of things like platforms like Kubernetes, where a lot of that's just abstracted and kind of given away
for free. Like, it's very similar to kind of what we're doing with AURA, like, a similar pattern
there. But, like, the benefit of that is that now those SREs can just kind of focus on the things
right in front of them. But the downfall of that is really like, you know, you lose full context
into, like, exactly how to optimize. You lose that domain expertise. You lose, you know, the,
like, book knowledge that might be tricky for you to synthesize in an investigation because
there's so many things in front of you, right? And so with AURA, where I see this progressing,
is really, you know, we're giving SREs the co-pilot, the assistant that they haven't really
had in a way that, you know, they can really, you know, 10x their ability to respond to incidents.
And also, you know, over time as it becomes more autonomous, you can use this as your agentic
teammate, right?
Like, I always tell people, like, if you had a full Jira queue, or you knew a bunch of postmortems
where there were a bunch of tickets that were never actually resolved because they didn't have time,
and you knew exactly what needed to happen, but, like, you just haven't had time to
implement it. If you had a teammate in the background that you can define with the right tools,
with the right ability to go optimize those, then you really just get down to planning and
defining that spec and getting kind of more towards like spec engineering, right? Like,
define the task, define the behavior, and then let the agent kind of execute that in the
safest possible way. Yeah. Well, it's great. I mean, as we sort of come up on time here,
is there anything else you'd like to share?
I think, like, the first thing is we really are invested in open source.
Like, that is, I think, one of the core things that I wanted to make sure is clear: you know, we built AURA to be Apache 2 licensed.
We built it in a way that is, you know, integrated with Mezmo, integrated with, like, you know, our tooling.
But you can, you know, use AURA completely on your own, completely without Mezmo,
and that's okay.
And we really wanted to move the industry forward.
And so, you know, really the big thing for us is, you know,
if AISRE and AI operations is important to you,
like we want to hear from you.
We want to kind of work with you and kind of push the industry forward
in the same way that we did for cloud, DevOps, SRE,
like, you know, this, we're standing on the shoulders of giants here.
And so we just believe that as the community moves forward in this direction,
in probably three to five years, this becomes just such a commonplace thing where SREs are using
agents and those agents have the context that they need. And we just really want to be a driver
moving the industry in that direction. Awesome. Well, Andre, thank you so much for joining me.
This was fantastic. Yeah. Great to chat. Cheers.
