The Data Stack Show - 176: The Fundamentals of Event-Driven Orchestration and How Generative AI Is Shaping Its Future with Viren Baraiya of orkes.io

Episode Date: February 7, 2024

Highlights from this week’s conversation include:

- Viren’s background in data (0:39)
- Evolution of Orchestration (1:52)
- AI Orchestration (3:00)
- Understanding Conductor and Orkes (6:26)
- Event-Driven Orchestration (8:10)
- Viren’s Transition to Founder (12:27)
- Non-Technical Aspects of Being a Founder (15:50)
- Democratizing AI for Developers (18:16)
- The evolution of microservices orchestration (21:56)
- Challenges in appealing to the 99% developer group (24:32)
- Value of orchestration for developers (30:31)
- Role of orchestrators in managing faults (37:37)
- The intersection of AI and orchestration (40:27)
- Evolution of AI (44:04)
- Thriving in AI Environment (47:58)
- Final thoughts and takeaways (51:25)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we talk to data engineers, analysts, and data scientists about their experience building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. We're here on the Data Stack Show with Viren Baraiya and it's
Starting point is 00:00:30 so great to have you back on the show. Amazing that it's been, I guess we can say years, about a year and a half. So thanks for giving us some time, Viren. Absolutely. Nice to be here and thanks for hosting me. Absolutely. Well, you've covered quite a bit of ground since we last talked, but can you just give
Starting point is 00:00:49 us a quick overview of what you've been up to for the last year and a half and the company you've been building? Yeah, absolutely. So I remember the last time when we chatted, we were just kind of had to come out of the stealth mode. We were still kind of focusing on building the product. We were basically taking the conductor and building Orcus to
Starting point is 00:01:08 make it available on enterprises, on various clouds. And of course, it almost feels like now decades that we started this. But in the last couple of years, we have built out the product that works in all three
Starting point is 00:01:24 clouds, built the partnerships with all the cloud vendors, have customers onboarded, which is great, keeping us busy. And fortunately or unfortunately, they are across pretty much every time zone. So, you know, also keeping us busy all the time. But it has been an exciting journey building a company from ground up and going all the way from zero revenue to some revenue. Absolutely. Yeah, that's amazing. And Viren, last time we talked a lot about microservices, orchestration of microservices, right? And orchestration in general comes up like more and more often lately. There are like a lot of conversations about when it comes like to, let's say, more of like the application development layer and like this fusion of databases, like transactional systems together with orchestration.
Starting point is 00:02:19 And then of course there's AI, right? Which anyone who has tried like to build something around AI, they definitely know that it's all about how to guardrail all these different models and services to achieve a consistent result. So orchestration is becoming a very hot and interesting topic and broader topic to what we were discussing a year and a half ago. So I'm very excited to talk about that. And also hear about like the evolution of orchestras from back then to today.
Starting point is 00:02:54 Right. Yeah. What, what about you? What's you are excited to talk about today? Yeah, I think as, as you rightly pointed out, right. Like when you think about orchestration you know, the humble roots, right. Orchestration has been around for a long time. But lately, you know, it has kind of taken its own kind of form when it comes to AI. And just like everybody else, I think that's one thing that is very exciting that is happening. And you know, how that lands with orchestration and you know, where, I mean, now we hear a lot about AI orchestration, right? That definitely was a thing before, but nobody talked about it.
Starting point is 00:03:30 So I think overall, I think the entire orchestration space is kind of evolving very rapidly. And where it is going, I think there is certainly an exciting place today to be. Yeah, 100%. So I think we have a lot of very interesting things to talk about. What do you think, Eric?
Starting point is 00:03:46 Well, let's jump in. Let's do it. day before we recorded this, TechCrunch published an article about Netflix abandoning the Conductor project, which you helped to build inside of Netflix, and Orcus, your company that you've been building for the last couple of years, forking it and taking ownership. So love the timing on that. Can you give us a little bit of the backstory and sort of tell us about that news? Yeah, absolutely. I mean, title is a bit clickbaity,
Starting point is 00:04:31 but in the end, we have been working with Netflix for a while. And the idea was this, right? That Conductor has become very popular as an open-source project. Today, when you look at the number of companies using it, these are like who's who of a lot of tech companies and large enterprises right and supporting them as a community is definitely a full-time job it becomes a challenging thing so you know we kind of stepped in but started working with netflix and the overall goal there was that like you know how
Starting point is 00:04:58 do we enable community to be kind of you know partner here right how do we kind of get there get them to be more excited about it, you know, get them to give more ownership stake into, you know, the product roadmap, how everything kind of moves. And only feasible way there is, you know, essentially, you know, you create a foundation around it and, you know, get everybody to participate.
Starting point is 00:05:17 So I think that's basically what we decided to finally do. It took us a while because, you know, you have to kind of go through all the legal and other kind of, but good thing is it has happened now. So, you know, you have to kind of go through all the legal and other kind of things. But good thing is it has happened now. So, you know, there is an exciting place. You know, the initial feedback from the community also is very encouraging because suddenly they feel like, you know, now they have ability to kind of be part of just going to make the project much more stronger in terms of, you know, its adoption, its visibility, and, you know, how community can contribute. So it's very exciting. It's super exciting for us.
Starting point is 00:05:56 That's great. I think maybe what happened with the clickbait title was that originally it was Netflix hands off, but then, you know, they asked the ALM to write something that would get more clicks and they changed handoff to abandon. Well, let's just, you know, for the listeners who didn't catch our last episode, which if you didn't, you should go back and listen to it.
Starting point is 00:06:20 Just give us a breakdown of Conductor and Orcus and just describe what the products do. Yeah, absolutely. So Conductor, essentially, its core is an orchestration engine. And orchestration, of course, is an extremely loaded kind of term, right? It means different things to different people and personas. But at the core of it, the Conductor was designed and built to build event-driven applications, right?
Starting point is 00:06:43 Applications that respond to events that are happening in the business context. And this could be orchestrating microservices, orchestrating kind of events on different, you know, messaging bus and things like that. So that's what Conductor does. And it does it very well in terms of, you know, handling both business and process complexity, as well as, you know, being able to handle it
Starting point is 00:07:04 at much, much larger scale and Orcus basic was founded to kind of take conductor and, you know, provide an enterprise version kind of realizing that, you know, there is a need and a demand for a product like this. But at the same time, just like, for example, the Linux is completely open source. You can just go to kernel.org and build your entire Linux and get all the GNU projects.
Starting point is 00:07:32 But for an enterprise, you probably want to work with a vendor to get everything ready, right? Sure. And then this is the model that, I think over the last, I would say a few years, has been perfected by a number of companies
Starting point is 00:07:43 in terms of how do you build an open source project and also monetize that. Yeah, love it. And do a quick refresher for us, because like you said, orchestration is a very loaded term. And when we think about the world of data, you tend to think about pipeline jobs starting or completing or failing. But Conductor really includes pipelines, but it really encompasses sort of any microservice. And so just help us understand that's a much, much scope you know of orchestration that is correct yeah like when you think about data pipelines you know data pipeline tend to be you know a lot more kind of i would say a code screen right you have kind of pipelines running on a daily basis every step
Starting point is 00:08:39 in the pipeline runs for sometimes hours and hours at the end And then there's an extreme end of that where you have microservices or event orchestration where every step completes in milliseconds and then you are running millions and millions of them every day. And the audience is very different in terms of who is writing those things. On one hand, you have data engineers focusing on data pipelines. And the important thing there is also the dependency management, right? Like, in fact, data pipelines are a lot more, are kind of really event-driven systems. Because, you know, you start a pipeline when something happens, right? A file arrives or some job completes and whatever not. We don't really think about it that way, but that's really the essence of it.
Starting point is 00:09:22 And on the other hand, it's very similar as well. And what's interesting is somewhere in between comes this process orchestration, which is a lot more kind of human centric, where you also have human actors in inside the process, you know, taking different actions. And very good examples here are like, you know, the approval processes, you know, with various use cases, right? Loan application approval is a classic example where somebody has to kind of review it, right? Oh, yeah, yeah, sure. Right? So, and, like, you know, that's in terms of, you know, what kind of systems you are building. But then, like, you know, orchestration for, you know, the end user also means different things.
Starting point is 00:10:00 Like, for data practitioner, of course, it's data pipelines. For software engineers, it's more about micro services and events but when you start to go outside of the boundary of just the engineering right when it comes to product it's about how is my product built you know and what are the nuts and bolts right what are the optimization opportunities that i have a very good example i would say is let's say in a supply chain if you look at your as a product manager if I look at my process and see, you know, how long does it take
Starting point is 00:10:26 for somebody to place an order and until the order arrives at doorstep, if it takes three days, what are the steps? How long does it take? And if I want to cut it down from three days to two days,
Starting point is 00:10:35 where should I optimize? Like, where is that process? For them, it's a totally different, you know, thinking about it, right? And if I'm in the support, I want to know
Starting point is 00:10:44 what's going on and, you know, how can I fix things, you know, more than anything else.? And if I'm in the support, I want to know what's going on and, you know, how can I fix things, you know, more than anything else. I don't care about how it's built and what's the use case and things like that. So I think, and then I think on an extreme end of that, like as we go more closer to the metal, you have orchestration of the infrastructure, right?
Starting point is 00:11:03 Kubernetes is an orchestration engine in the end, right? Orchestrates your components. Sure. So I think when people think about it, depending upon what persona, what head they are wearing, you know, they think about it differently. Conductor is, as you said, it's a very broad, as a matter of fact, when we built Orcus,
Starting point is 00:11:19 our entire Orcus tech runs on Conductor, which means our CI, CD, our deployment, our entire cloud provisioning infrastructure runs on Conductor. So we use it as an infrastructure orchestration to process orchestration. We run our customer service on Conductor. We run our stand-up bots on Conductor. We run our AI bots on Conductor. It's like basically dog food.
Starting point is 00:11:39 Wow, that's a, what a cool opportunity to dog food your own product and sort of build your entire company infrastructure. Well, I'm interested to know, I want us to dig into the tech aspect of this on the show and just hear about what you've been building, right? Because it's been a year and a half. But I'm interested to know just on a personal level, you've tackled some gigantic engineering projects. Obviously, Cond know conductor still has a lasting legacy and you're a big part of that what has the transition been like going from being an engineering leader inside of these really large organizations with really gnarly engineering
Starting point is 00:12:19 problems to being a founder and you know a couple a couple of years ago, starting out with just a couple of people and, and building. Yeah. I think that's an interesting, uh, uh, you know, interesting question. Like when you think about like, you know, working at a large organizations, right. Like, and you know, Netflix, Google, for example, where, you know, as an engineering leader, you know, you are a lot, one is like, you know, you are part of a really big machine. So, you know, there are a few things that you never have to worry about. Things like, for example, you know, how much is it going to cost? Like a good example is, you know, when we ran predictions engineering at Google,
Starting point is 00:12:55 the amount of resources it took was insane. Like it took a data center to run those things, right? Because it's processing data from internet, which is kind of huge. So, you know, you go from that level of things, right? Because it's processing data from internet, which is kind of huge. So, you know, you go from that level of things, right? Like, and when you think about even the numbers, right? Either the revenue or the users you talk about in hundreds of millions and billions. Yeah. It's your kind of denominator, right?
Starting point is 00:13:15 Like when you say three, it means 3 billion and not 3 million or not 300 million. And then you go from there to being a startup founder you know you have to now think about everything right like you know is it cost 200 to run or 300 where can i optimize so you know cost is one part but more importantly now you know there is there's no support system you you are the support system you are at the end of the the chain right there is nobody to complain to which means you know you you are not only responsible for engineering decisions, but also business decisions, company strategy. You are an engineering leader. You are a product leader. You are also an HR person, right? At least in the early days. Um, and how you build
Starting point is 00:13:54 your product, how you build your team also has a lasting impact on how your company is going to grow. Because, you know, if you investing drone tech or wrong people wrong people, that's not going to kind of turn out very well, right? So there is definitely kind of that major shift in terms of, you know, how things kind of move. At the same time, like, you know, when you look at like big companies, when you're working for it, right, there is kind of a cushion, right?
Starting point is 00:14:19 What's the worst that can happen when you take on a project? The project can, you know, take longer to complete, it can fail. It does not necessarily materially impact you. Yeah. I think company is different. If a company fails, you fail and there's a lot more at stake, right?
Starting point is 00:14:33 So the stakes really go up quite a bit. Right. You have employees, you know, who are relying on you. And now suddenly you have to think about like, you know, if you have 20 people in your company now, you know, and if you make a wrong decision, you're going to impact the livelihood and basically you have to impact 20 families. You have to be very thoughtful about, you know, what you do, how you do things.
Starting point is 00:14:54 And this has nothing to do with the business. It is, it's purely just people, right? Yeah. So you have to also think about the people aspect of building a company, running a company, and It's very different. A lot of learning experiences. Early days, we were also the business development people. We were also doing sales.
Starting point is 00:15:13 I had no idea how sales worked. If somebody asked me, what's the sales cycle, I didn't know what it meant. I can today tell you what is my sales cycle, right? I can talk to, I can interview a sales leader. Let me tell you this. So you end up learning a lot more. So I think that's an interesting journey as well. Yeah. You're obviously the CTO and so your job is deeply technical. What, in terms of the non-technical aspects of your job as a founder, leader, you know, wearing multiple hats, which aspect of your job do you like the most from a non-technical aspects of your job as a founder leader you know wearing multiple hats which aspect
Starting point is 00:15:46 of your job do you like the most from a non-technical standpoint i would say in terms of non-technical understanding you know how do you kind of sell like we are an enterprise SaaS company right so learning and like you know understanding how to sell to enterprises is really eye-opener you know and like you know understanding those kind of processes you know how companies operate yeah it has been a very fascinating thing because i have always worked in large companies where like you know the purchasing decisions are always made by the purchasing team you never directly deal with them you are on the consumer side but you're on the other side. And like, you know, understanding those nuances is very interesting.
Starting point is 00:16:30 The other thing that I really love is, you know, we are a company founded on the foundations of an open source site. So I think putting out communities and working with community, it's not very technical, but it is also deeply satisfying because when you see people commenting good things about the product, you know, adopting that, even they don't have to be your paid customers. But, you know, it's deeply satisfying from that end that like, you know, what you did, like, you know, definitely helps people.
Starting point is 00:17:00 And, you know, so yeah, these are the two aspects that I really enjoy. That's so great. Well, congratulations. I mean, it sounds like it's been an incredible journey and learning experience. Let's start to focus a little bit more on the technical side. I know one of the things that you and Kassus are excited to talk about is, you know, AI and how orchestration and AI fit together. And you have some really interesting thoughts about software development. So as a preamble to that, what I want to ask about is the perspective that you're bringing to that from your time at Google. So at Google, you worked on a product that allowed people to take advantage of Google's machine learning infrastructure. So a product that allowed people to take advantage of Google's machine learning infrastructure.
Starting point is 00:17:47 So, you know, a product that made predictions. You know, when you sort of were working on that, you know, sort of maybe it seems like right, right up a little bit to the edge of, you know, this massive explosion and, you know, the AI craze driven by large LLMs. But what perspective do you bring to AI based on your experience at Google and actually building a product around that? Yeah, I would say, you know, when you think about like, you know, working on AI or machine learning, right, there are two aspects to it. Either you are a deeply technical person researching models or working on foundational frameworks like TensorFlow, for example,
Starting point is 00:18:32 or working on the hardware side and building chips. What was interesting about my time at Google was our focus was how do we democratize AI for an average developer whose job and kind of whose primary thing is like, you know, I'm building an app and my app kind of sustains itself through either ads or kind of, you know, in-app purchases and subscription management. And, you know, when you look at companies like Netflix, right, who actually pioneered, you know, how do you kind of create higher level of user engagement through A-B testing and personalization? How do you kind of make the same kind of technology available to kind of an average developer who does not have that kind of resources? And it's not even possible because they don't even have that kind of data available to begin with. Sure. Yeah, yeah. And the challenge there was like, you know, how do we kind of take, let's say you are
Starting point is 00:19:29 building an app and this is a simple game with maybe say 10,000 users who are playing the game. Now, 10,000 data points is probably not sufficient to train a sufficiently large, you know, largely accurate model. So, you know, how do you solve that problem? And the way we thought about it was that yes, you have 10,000 users, but if you look at internet as a whole,
Starting point is 00:19:51 there are probably 4 to 5 billion apps in the world from which you can train a model. And that's large enough. It's more than a large state of data to train a federated model. So basically what OpenAI did today, we kind of took the same approach that like,
Starting point is 00:20:08 you know, there's all this data coming in, right? Can we get insights? We can be infer and, you know, figure out the user personas and, you know, essentially make it available as a service to developers. So that now, instead of like me trying to kind of either, you know, invest, if I'm becoming like Unity, maybe I can do that. But, you know, small time developers or even an average unity, maybe I can do that, but you know, small time developers or even an average developer, you don't have to think about it.
Starting point is 00:20:29 I can say, you know, Hey, this is a user, you know, tell me likely you would have this person making a purchase or clicking on an ad or, you know, staying engaged in my app, you know, and if it is, so we actually do two things. So, so first of all, this, you know, that like, you know, telling me the likelihood of person doing next. So that's like, and the way we thought about it was that that was good enough initially, that let developers make a decision. Now, if you think about it from a developer's perspective, what decision are you going to make? Most likely you're going to flip a coin and say,
Starting point is 00:20:59 I'm going to try something. So that's basically 50% chance in terms of you're going to be right or wrong. So we started working on the second part of that to say, you know, can we now optimize this? So we essentially launched later on an optimization as a service as part of Firebase, which is still there, I think, in production is, you know, you tell your objective, you want to increase engagement, spend, or get them to renew the subscription, and we'll figure out what are the right experiences that you should deliver them. And you tell us what kind of experiences you can deliver, A, B, C, D, and then we'll find out the right user bucket. So that was what we did with the AI and machine learning. And I think the whole thing around democratizing
Starting point is 00:21:47 AI, it has become now commonplace, but those were the days where Google pioneered some of those things. Yeah. So, Viren, about a year and a half ago, we were again chatting and we were talking about microservices of the scale of Netflix and what it means like to have all these and why they are needed and why we need like software like Conductor like to do that. What has changed since then? Because the reason I'm asking is because like I think one of the, and I'd love to hear your opinion, personal opinion, like on that too, and experience. But I think one of the things that
Starting point is 00:22:30 happens with people that, you know, they, they work in these very unique environments, right? Like Netflix or Google that have unique problems in terms of scale, but also like unique, as you said, resources and unique talent, right? Like the talent that you find in these companies like is rare. But where you go out in the market and you build a company and you try like to bring,
Starting point is 00:22:59 let's say all these innovations from these companies like to the rest of the market, you start experiencing like differences, right? Like the rest of the world is not the replication of like Google and Netflix, right? So my question is about what you have experienced through this one and a half year, like working with the market out there
Starting point is 00:23:22 and what the difference is between like an organization like Netflix and Google and the rest of the market out there and what the difference is between like an organization like Netflix and Google and the rest of the market and what has changed that's the second question about like the product or the technology from purely talking about microservices orchestration to what orchestras can do today right And what's the link between the two? Because when you build companies, obviously you react to the market signal, right? That's why I'm trying to bring these two questions together. Absolutely.
Starting point is 00:23:55 So I think the first question is a great one. And that was a very insightful thing as a founder also to kind of learn. You know, my history has been like an I work at Netflix, Google, Goldman Sachs, so, you know, very tech forward companies. And you at netflix google government sag so you know very tech forward companies and you get to work with a group of talent right and when you build a product for companies like this you know you have certain user persona in mind right these are my developers
Starting point is 00:24:15 this is how they work and then you try to bring that market right so one kind of thing was that like you know when you look at the developer side there is what i would like to call this 99 developer right and the one percent i think when you think about tech companies those tend to be more one person developer they like hard problems they like to solve big challenges and then think about distributed systems and everything when you look at the rest of the world let's say if i if you go to let's say general electric right ge or some traditional company the focus there is i have this feature to be built i have this product to be launched and this is my timeline how can i get there fast right there's less thinking about that
Starting point is 00:24:54 also you know not everybody can pay google netflix level 7 and there's not that kind of talent available also in the market right which means you, you're also working with very different levels. You know, you have kind of very junior engineers, you have principal engineers, but, you know, there are less of them. So one thing that we quickly realized was that, you know, for a product to be successful, it's very important for the product to appeal
Starting point is 00:25:18 to that 99% developer group who is not interested in solving distributed systems, very difficult, hard NP, hard problems, right? They're interested in solving distributed systems very difficult hard np hard problems like they're interested in solving their current problem which means usability is paramount we think about usability in like you know when you're building an app or a site right people a lot of times don't think about developer experience but you know developer experience is paramount i in my personal opinion i don't think we are a hundred percent there. I, we constantly keep on, you know, improving and try to kind of work with
Starting point is 00:25:47 very junior, sometimes, you know, freshmen developers to kind of figure out like, you know, where can we improve this? But that was the number one thing, right? That what matters to them is very different from what we kind of initially built. Right. And their skill sets and, you know, where you should focus is an, was an interesting insight. Um, yeah, that makes total sense.
Starting point is 00:26:07 And I think it resonates a lot with also my experience. And actually, I would add to that, it's not just a matter of quality of talent. It's also that when you're talking about General Electric, the core competence of the company is not distributed systems. They don't care about that. They shouldn't care about that. That's not their thing.
Starting point is 00:26:26 The same thing with Bank of America. All the companies out there, they are obviously Fortune 100 type of companies because they do something really well. That's right. For sure, not how they build data centers and distributed systems. Exactly.
Starting point is 00:26:43 For example, Bank of America would probably want their last industry systems right exactly yeah exactly like you know for example like you know you know Bank of America would probably want to spend their efforts and energy
Starting point is 00:26:49 on making sure that banking is a first-class thing right it's the rock solid it's the best in class
Starting point is 00:26:53 not how do I solve the industry but like that's not their core competency there's no
Starting point is 00:26:57 point investing in that right it doesn't make any sense from the
Starting point is 00:26:59 business perspective companies like Netflix they're all about tech tech is what
Starting point is 00:27:04 drives those companies so yeah 100% so okay perspective. Companies like Google and Netflix, they're all about tech. Tech is what drives those companies. So it's a different perspective. Yeah, yeah, 100%. So, okay, you mentioned developer experience and a different, let's say, prioritization in terms of what features are more important or what value is perceived by the user, right? Like, or what is valuable for the user? What has changed in terms of like, let's say the use cases? Because we started and we were talking back then again, a year and a half ago about orchestrating microservices, but what are the use cases today, right? The dominant ones.
Starting point is 00:27:42 I think it has definitely evolved and in some of the surprising ways as well. So, you know, microservices where one thing I think today, when I look at even the current set of use cases, predominant are more event driven kind of orchestration, you know, service workers, you know, microservices also kind of got a little bit of kind of negative press recently, right? With you know, the blobs and everything. Not necessarily everything is kind of, everything is in the right
Starting point is 00:28:12 context, right? But like, you know, you sometimes don't need microservice as in like, you know, an HTTP or gRPC endpoint for every problem and every solution. Service workers are actually much more lightweight and probably better in terms of infra and speed of development and deployment and everything, right? So that's
Starting point is 00:28:29 one area where I've seen like a lot more kind of usage of Conductor and how people are using it. They think less about, and also like, you know, instead of saying that like, you know, every deployment is one microservice, you know, sometimes your deployment with, which almost looks like a monolith, but then you have different components talking to each other asynchronously, and therefore it is still not this monolith, but rather kind of an event-driven system.
Starting point is 00:28:54 So that's one area we have seen. Another surprising set of use cases is around how you build user experiences, and this came to me as a complete surprise. There are times when you want to drive your user experience based on various parameters and make it dynamic. Traditionally, one way to do that is to encode the entire logic in your UI application or your mobile app. Mobile apps were actually one of the trailblazers in that area, because a web UI is very straightforward: you can deploy a new version and everybody gets it, whereas with mobile apps, people have to go download it. So developers were already doing that by using things like Remote Config or LaunchDarkly, driving user experiences based on feature flags on the server side. But now I have started to see those things happening with Conductor as well, where you have a UI flow designed in Conductor and the UI is driven based on that flow. And then a product manager changes the flow based on the experiences they want to drive, and things like that. That was a very surprising use case, but when I think about it, it makes a lot of sense. Yeah. Okay. So that's super interesting.
Starting point is 00:30:01 But who is the user here? We talked about product people, we talked about developers. Obviously we're talking about application development here, but application development is a complex thing, right? It involves a lot of different products and different types of developers, from front-end developers to even DBAs at the end, managing their own cases. So who is the user who gets, let's say, the most value from a system like Orkes?
Starting point is 00:30:30 I think it's the software engineer, right? Like the developer working on the backend or frontend. In the end, this is the persona that we build the product for. Everybody else gets benefit out of it, but that's not intentional; it's more of a by-product. In the end, it helps the developer. If I'm using something like Conductor, now I don't have to think about
Starting point is 00:30:52 handling error cases and resiliency parameters and all of those things. I can just think about building stateless screens and orchestrate that separately. In the end, the developer, their life becomes much easier more than anything else.
Starting point is 00:31:07 Do you see more of the front-end developers being, let's say, the owners of that, or is it the back-end developer? And how do they work together, right? Because the interfaces between developers are always a very interesting topic and a hard problem to solve in general. I agree. I don't think we have seen a solution either. Today, predominantly, our developers are mostly backend engineers, the way I see it, and the people that we interact with. Frontend apps are still a very small percentage. It seems to be growing, but I would say right now it's mostly backend engineers. How do they work together? With systems like Conductor, typically what I've seen backend developers doing is that they build out the API using Conductor and mock out the data, and then the frontend can keep working on it. And then they slowly implement the stuff that starts to give you real data as opposed to
Starting point is 00:32:03 mock data. We have at least a couple of customers that have adopted that kind of strategy and have been pretty good at it. Yeah, well, that makes all sense. Okay,
Starting point is 00:32:17 there is a new wave of, let's say, transactional systems out there, right? Like there is this attempt, especially after, I would say, Heroku went out of market, to, in a way, build on the legacy of Heroku. Because Heroku, I think at the end, was just too early in the market.
Starting point is 00:32:40 But they had some amazing ideas there. I think the legacy of Heroku will live and will drive a lot of innovation now that the market is more mature for this kind of product. So we see a lot of conversations about these new types of backend systems that are kind of a fusion between like a database
Starting point is 00:33:07 system together with an orchestration system and okay like in some cases like some other stuff too but i'll focus on this too because i think like the main conversation is like how you mix workflows together with transaction boundaries there? And what does this mean in terms of managing infrastructure and building applications? So what do you think about that? And what's your opinion? And what do you see is going to happen at the end? I think that's... I mean, yeah, you're right.
Starting point is 00:33:41 The kind of void that Heroku left, there has been some attempt of fill that gap. And I think conceptually, if you think about, right, like mixing databases with workflows is an interesting concept. Like I think back in the day, we used to have stored procedures, which will do kind of almost the same thing. And it worked well when like, you know, you had all of your data and, you know,
Starting point is 00:34:04 business logic inside one database. What has changed now with this new kind of databases and systems is instead of a stdproc in PL-SQL, you are writing JavaScript code to achieve the same thing, and you're working with more like a NoSQL database, like a document-oriented database. So that's an interesting concept i think and in some ways kind of firebase also did similar stuff with you know combination of firebase database and triggers that you know executed firebase functions to you know do kind of a lot more stuff right and there is kind of definitely a need for a group of developers you know i would say a lot of app developers for example who needs
Starting point is 00:34:43 back-end because you know they are very good at building mobile apps, for example, or building the front-end experiences. But to drive the data and business process, they need some backend. And either you kind of have another team that is focusing on backend development, which may not be possible
Starting point is 00:34:59 if you are kind of a singular developer or a small team of developers focusing on building a game, right? Focus on game experience, then building a backend. So this is where I, I see there is kind of a value where like, you know, your processes are relatively simple. You, when you want to insert a record, uh, you want to run some small process or a business logic that drives a bit of a workflow and you have a single
Starting point is 00:35:20 source of truth for the data. So I think there is definitely a place for it. One, my experience is that Firebase also told me that, like, you know, that quickly, it becomes very good for prototyping stuff. Yeah. But almost 100% of our customers did that, you know, they did use Firebase. And the moment they got more production ready, they moved out into more MongoDB or something like that, or Cassandra, for example. And then they kind of invested into more proper serverless
Starting point is 00:35:52 or container-based systems and whatever not. So I think that's how I see those systems that are very good for getting your prototype up and running without having to have a full-fledged backend. Yeah. getting your prototype up and running without having to have a full-fledged backend. My question is not from personal experience, to be honest, but more of my overall experience as an engineer. The value of having, let's say, an external orchestrator, and again, my experience comes more from
Starting point is 00:36:24 the data infrastructure, where things tend to run much longer. So the possibility of something breaking there is like higher, right? So having like an external orchestrator is that you have like in case of a fault happening, you have a different system that can take control. If your database fails, let's say, then your orchestrator can execute logic about how to manage the failure. But when you put the processes together, things get a little bit more weird there, right? So is there, and that's like a purely, let's say like engineering question. And I am asking you like as like, okay, like one of like
Starting point is 00:37:10 the most experienced engineers that I know with like these architectures, is there like a way that like we can guarantee that if we put together, let's say like a transactional database with an orchestration system,
Starting point is 00:37:22 that when something will go wrong with the database for whatever reason, the other process that is responsible for the orchestration is going to remain fault tolerant and do what it's supposed to be doing. Yeah. And I mean, that's exactly the purpose of having an orchestrator, right? And especially the next generation of orchestration, like Conductor, for example, which are basically not a single point of failure. They are more distributed, right?
Starting point is 00:37:48 So like, you know, gives you much higher availability. And, you know, you are also kind of de-risking and decoupling yourself from a single database. And it does kind of help you two ways, right?
Starting point is 00:37:56 One is, if your database goes down, you can operate on a cache data and apply circuit breakers and things like that. So, you know, depending upon how, what kind of user experience you want to drive, you know, that becomes kind of real thing, right? You kind of avoid at the same time also, like, you know, you can also do things like,
Starting point is 00:38:15 especially in a read-only scenario, you can also do things like hedging, you know, send requests to multiple databases and make sure that like, you know that you are able to serve it. And also, now suddenly it opens up an opportunity for separating out your local transactions with a global kind of transaction. If you think about it, let's take an example of a payment processing system. When I want to transact and do kind of, let's say I want to send an ACH through the FedBuyers system, I want that system to be a transactional. So, you know, that particular service could have a local database that maintains the transaction and operates in a global thing. But globally, you know, you could have other systems also, like, you know, sending out an email, which is like, I get two emails, you know, do anybody care? As long as I get two emails, I get an email, right?
Starting point is 00:39:06 At least once. So now suddenly there's an opportunity to optimize and make systems a lot more decoupled. And the orchestrator essentially maintains your overall state, and then you can change it however you wish, depending on how your process changes. So not only does it give you resiliency, it also gives you flexibility in terms of how you can do things. Yeah, 100%, that makes sense. Cool, so let's move a little bit to AI, because I think what happened with AI is that everyone is somehow
Starting point is 00:39:46 trying to figure out like a new type of orchestrator there because at the end what we have here is that we have like a system that it's not reliable by definition so what we used to do as the exception with like distributed systems for example, where we had provision for when something goes wrong with the orchestrator, now it's pretty much the opposite. Every time you get a response, I mean, on a semantic level, it's not like an API error, but you need to ensure that at the end you get what you are looking for, which is in a way, it's like managing faults at the end, right? Like it's not that different, like from an engineering and like design perspective, right? So what I've noticed is that like, okay, like systems like a land chain, like
Starting point is 00:40:36 well, all that stuff that we see out there at the end, they're like specialized, like orchestration systems. What's going on with that? How many different flavors of orchestrators will have? Because it starts becoming a really hard thing to talk about that stuff. It's almost funny. If you talk with a distributed person and you say orchestrator, something almost completely different compared to what orchestrator means for the data engineer.
Starting point is 00:41:07 And if we talk about someone who builds applications with LLMs, again, something completely different. So it's very interesting what's going on there with the definition. So tell me how you think about that and what you see actually happening out there. So yeah, I think LLMs are interesting, right? Because suddenly, you know, because LLMs are inherently kind of asynchronous, very latency, high latency systems, right? And you know, it does this one thing. By just executing a prompt, you can't build a system, you need to also chain other things,
Starting point is 00:41:43 right? And that's where kind of, you the, suddenly the orchestrations and workflow engines kind of became a lot more important to when you want to build applications that leverage is LLM. There's one more problem with LLM, which is when you think about building an application using a traditional way. It's almost pretty guessing or traditional, right. Is through APIs and all they are deterministic.
Starting point is 00:42:07 If I send a simple query, I know what I'm expecting. LLMs are non-deterministic. So you also have to handle non-deterministic aspect. And the moment you put non-determinism in your application flow, now you have other things to worry about, which is compliance and security. And I've seen this now. Companies want to use LLMs, but they are very worried about the aspects of compliance, security, and reputation damage if something goes wrong. So now you also have to put some guardrails on top of it.
Starting point is 00:42:34 And the guardrails can come in the way of leveraging another LLM to do some sort of adversarial validation of the output. And for very highly sensitive systems, maybe also humans, right? Who can actually review and validate whether this makes sense or not. But all of this requires orchestration. It requires you to build flows, which are kind of very flexible, can change. Um, and if you want to kind of run this in an environment, you also need a distributed system because, you know, now, you know, everything could be running differently, right?
Starting point is 00:43:10 Your LLM is running on OpenAI or Azure or Google, whatever, right? And your systems are running somewhere else. And then came an interesting mix with vector databases, where suddenly now, you know, you have this retrieval augmentation generation where now you also look up a vector DB with a namespace. How do you protect that? Again, the same set of problems. So I think as you say, right, like suddenly, you know, now a different
Starting point is 00:43:31 class of orchestrators are coming up focusing on just purely prompt chaining. But prompt chaining alone doesn't get you an application deployed. You also need to make an API call and look up a database and process and, you know, add humans and everything. Right. So that's where I think long-term everything is going to converge into, need to make an API call and look up a database and process and add humans and everything. So that's where I think long-term everything is going to converge into probably one or two orchestrator which can do all of these things and do it very well at a scale.
Starting point is 00:43:53 Yeah, makes total sense. Eric, I have a feeling you might have more AI related questions, so I want to give you time to do that. Well, I think we're fairly close to the buzzer here, but Viren, one of the questions I have is, I think you have a very interesting perspective on AI in general. Obviously, there's a ton of hype. There are a lot of statistics thrown around, a lot of clickbait article titles that you've built. You've built products like this inside of companies like Google with vast, let's just call it unlimited access to data and
Starting point is 00:44:36 unlimited access to compute resources. How can people separate the hype from what is real? And I think that's, even for people who are very technical, it's easy for us to see the immediate sort of practical benefit of, you know, being able to draft a blog article or, you know, have, you know, even get support on a SQL query, the best way to optimize it, right?
Starting point is 00:45:07 I mean, that sort of very synchronous feedback on specific problems that would normally take a long time to research through traditional search methods. I think everyone's like, okay, this is great, right? I don't want to go back to the previous world. But when you talk about, okay, I want to use an LLM to predict the next best action for this user based on all these inputs. Well, even though you don't have to build the model that recommends the next best action, right, which is a huge step forward, that is still phenomenally difficult to get right.
Starting point is 00:45:47 And so there's this huge gap. How big is that gap? Or maybe I'm over-interpreting it. I mean, definitely there is a gap in my opinion. And definitely the gap also is shortening because there's a lot of investment that is happening both in terms of research, people, money, infrastructure that is going into it. But the way I see it is, if you look at the current state of the world, there's a lot of investment happening onto the foundational models. Like started with open AI, but now you have plethora of models. Some of them are completely open, like Lama 2.
Starting point is 00:46:24 So there is one aspect of that, which is essentially democratizing the foundational models. Like foundational models are going to become like Postgres. Everybody has access to it and everybody can use it. The question is how do you use it and what do you build out of it, right? So then the question comes is like, you know, okay, I have a very powerful model that can, you know, do a lot of interesting things, but how do I actually use it to solve my business use case? And I think that's where today the gap is you know to be able to say because as i say right like elements are still non-deterministic and they lack the consistency that you know we are used to
Starting point is 00:46:57 having in a kind of a normal system yeah so you know bridging that gap is kind of where i think there need there is a need to kind of do that so that as an enterprise, as a company, I can say, you know, I can safely incorporate this LLMs into my flows and kind of make use of it. And I think the way things are also probably going to go forward is it also is changing the way we interact with the systems right today you know everything has to have a ui and you know you have actions and buttons and everything a lot of times you can build a lot more chat based interfaces right so like assistants are probably going to become a lot more commonplace but that also brings in an interesting question in terms of how do you actually build those things right that's one area where we are also focusing on yeah and hopefully by the time this is published, we'll have something
Starting point is 00:47:48 really exciting there. Very cool. Very cool. So do you think that the companies that are going to thrive in this environment are the ones that help fill that gap? I think so. I think so. I think so. Because there is going to be like, you know, I think it's almost like a supply chain, right? Like at the bottom of the supply chain is the hardware manufacturers and then you have models and end of that is the kind of end user building an application. But in between there is nobody right now. Yeah.
Starting point is 00:48:18 That's where a lot of investment is going to happen. And I think that's where the big opportunity is. Yeah, for sure. Because I think that there is this interesting dynamic right now of essentially wrappers on an LLM that's just a UI. And I mean, there will certainly be some companies that make progress there because there are specific needs. But I also think there's going to be a ton of companies that fail because the open AIs
Starting point is 00:48:44 of the world are just going to productize all of that, right? And just completely take that business. I mean, Google kind of is notorious for doing that sort of thing. So I'm interested to know, when you think about at the enterprise level, and especially as you think about Orcus, we're talking about sort of incorporating an LLM
Starting point is 00:49:04 into a much larger system. But there's also software providers. So let's just, you know, we're, you know, earlier talking about sending emails, you know, and things like that. There are also a lot of software providers who are packaging an LLM and then essentially, you know, sort of building functionality on top of that within their own software, and then, you know, sort of building functionality on top of that within their own software and then, you know, sort of reselling that packaged functionality. Yep. Where, what are the limitations there? How do you think about, you know, incorporating an LLM
Starting point is 00:49:35 in a bespoke way, you know, at the enterprise level versus, you know, sort of buying it as part of a packaged software suite? Yeah, I think the former is more about like, you know, sort of buying it as part of a package software suite. Yeah, I think the former is more about like, you know, the, how do you enhance a product using LLM, right? I want to send out an email. And if you notice, Gmail had that feature for a long time, but it could auto complete your sentences. Yeah. So, you know, that kind of gives you the personal productivity and like, you know, you enhance your product using LLM. Copilot is a good example, right?
Starting point is 00:50:08 I don't have to necessarily push for the same amount of code. I can just autocomplete everything. IntelliJ has been doing that for a while. So that's one area where, you know, there is kind of the application of LLMs into very bespoke products. Enterprise applications are very custom. Like, you know, they are rapidly changing. They always change. Like, you know, custom like you know they are rapidly changing
Starting point is 00:50:26 they always change like you know usually you know you build an application the kind of the life cycle of an application or a lifespan of an app is probably two years or max two and a half you have to rewrite that thing again and then you need basically tooling to be able to kind of leverage an LLM to kind of
Starting point is 00:50:42 you know redo those things some stuff or put it in a different way. So then I think the other class of application is where you leverage an LLM to build those bespoke experiences, but you are using internally or selling it to another customer,
Starting point is 00:50:58 but that's the other aspect of it. And to me, that's a lot more interesting because instead of a very fixed problem, these problems are very dynamic. They are changing and they are very different from company to company. Yeah. That's super interesting. Okay. Well, last question, which this is always a fun one. Is there anything that worries you about all of this new technology around AI? I don't think so.
Starting point is 00:51:26 Like, wasn't the same thing when people were talking about when the computers came that computers are going to like, you know, put everybody in more jobs? I mean, it's good in more jobs, right? The same thing happens, right? I think there is, I don't have the specialization in like, you know, how people think about things, but I think it's natural, right? Like we are always a little bit wary of something is going to come in, you know, take over jobs and everything. I think developers are getting more productive because Like we are always a little bit wary of something is going to come and, you know, take over jobs and everything.
Starting point is 00:51:45 I think developers are getting more productive because I don't have to always go to Stack Overflow search for it. My ID can complete for me. I can do more stuff. I can probably do a lot more and I can probably have a lot more quality time for myself. Businesses can move faster. Maybe they can do more stuff.
Starting point is 00:52:00 So I have more work. So in the end, I think everybody's going to benefit. Yeah, I agree. I agree. Well, Viren, this has been such a wonderful time on the show. do more stuff so i have more work so in the end i think everybody's going to benefit yeah i agree i agree yeah well viren this has been such a wonderful time on the show what a year and a half it's been congrats on all the progress congrats on orcas congrats on you know the fork of conductor and the foundation around that just so impressed with everything you're doing and we'll keep cheering you on from the sidelines. Yeah, absolutely.
Starting point is 00:52:27 Thank you for having me again one more time. Yeah, absolutely. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, ericdodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com.
Starting point is 00:52:49 The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
