The Data Stack Show - 213: How AI Can Bring Advanced Data Outcomes to More Businesses featuring Taylor Murphy of Arch

Episode Date: October 30, 2024

Highlights from this week's conversation include:

- Taylor's Background and Journey in Data (0:45)
- Modern Data Practices and AI (1:46)
- Arch's Mission and Collaboration (3:02)
- Data Movement and Business Needs (6:12)
- Bundling and Unbundling of Data Tools (8:26)
- Cost of Computing and Data Storage (13:22)
- AI Agent Models in Data Analytics (16:09)
- AI Analyst and Data Democratization (18:34)
- Decentralizing Data Expertise (20:08)
- Anchoring Questions in Business Metrics (23:32)
- Challenges for Mid-Market Companies (27:55)
- Creating Metrics in Organizations (29:53)
- User Experience in AI Interactions (34:54)
- Human-AI Collaboration (36:49)
- Managing AI Agents (39:39)
- Understanding Customer Data Needs (42:47)
- Automation of Data Parsing (45:05)
- Final Thoughts and Takeaways (46:16)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. We're back here on the Data Stack Show with Taylor Murphy, the CEO of Arch. And Taylor, you have been on the show before, actually,
Starting point is 00:00:40 almost two years ago, as crazy as it sounds. So welcome back. Glad to have you back again. Yeah, thanks for having me. All right. Well, give us, so almost two years, give us the quick flyby of what happened. So last time we talked, you were working on Meltano and Arch is something new. So what happened? Absolutely. Yeah. So since that time, I've actually taken over as CEO of the company and kind of, we've expanded the vision of what Meltano and now Arch Data can do. We rebranded the company to Arch to signal, hey, we're this larger platform and kind
Starting point is 00:01:10 of the foundation of your business success. Arch today is an AI data analyst for business leaders. So we help bring the outcomes of good data teams and what good data teams can provide to more organizations. And we're doing this with an AI-powered all-in-one platform that comes back with human experts in the loop. So Taylor, before the show, I had a really fun time talking about the AI agent model and the different philosophies behind that that people have. You guys have a really unique philosophy on that, so I'm excited to dig on that. And then what else are you excited about digging in on? Yeah, I'm absolutely excited to talk about that.
Starting point is 00:01:49 I think more broadly, too, the reason AI is so compelling at this point in time is we now have a good understanding of what excellent modern data practices are from the explosion of all these different tools. And we also understand that what businesses need and what moves the needle in a business are always going to be based on the metrics and how you measure your business and what the processes are under that. And when you combine a good modern data practice with a smart, intelligent AI and a platform that can help combine those two, I think it's something really cool and magical. Awesome. Well, let's dig in. Yeah, let's do it.
Starting point is 00:02:25 Taylor, so great to have you back. After almost two years, pretty crazy that we can say that now on the show, you know, to have a repeat guest. I want to, man, there are so many things we can talk about, but out of the gate, just give us the overview of Arch. You rebranded Naltano. We can talk about that journey a little bit. But I would just love to tantalize our listeners with a description of Arch,
Starting point is 00:02:50 but then we're going to talk about ETL first and make them wait a little bit to build anticipation. There we go. I like that strategy. Yeah. I love it. Yeah. Thanks for having me on the show. Excited to be back here and talking with you guys. Yeah. So with Arch, we are really trying to push the boundaries of how data analysts, data professionals, and AI can collaborate to bring what I believe to be the outcomes of excellent data teams to more organizations. We've been on this journey with Meltano in finding product market fit, and on this journey from just kind of pure EL to a larger end-to-end platform. And so where we are today with Arch is working with organizations to really up-level their entire data practice with our all-in-one platform, and then also scaling the efforts of existing
Starting point is 00:03:35 analysts or with us as the analytics team, if they don't have anybody on staff, and our AI capabilities to, again, just bring these really good outcomes. Because I'm a data person at heart, and I know when a well-run data team is functioning for an organization, what they can do for that business. And I just want to bring that to more and more organizations. So yeah, so we're focusing on being your AI data analyst with a full platform and team behind it. Love it. So many questions there. But as I promised, we're actually going to talk about ETL or EL. And I've been dying to ask you this question since we got this new recording on the books. But you've been on a journey with Meltano, starting within GitLab, spinning it out, and building product and selling in the ETL, EL space.
Starting point is 00:04:33 And so you probably have a really unique perspective on that space in general. And I think we were talking before this show a little bit. It's easy to think of that as a solved problem just because there's multiple big vendors, tons of people write their own custom pipelines. It's just so ubiquitous as a pipeline in any analytics workflow. It's just, oh yeah, is that true based on your perspective? It is and isn't true.
Starting point is 00:05:01 Like you mentioned, we've been on this long journey to figure out product market fit for Arch and Meltano. The early days and when Meltano was spun out of GitLab was kind of towards the peak of the zero interest rate phenomenon. Lots of VC money, lots of funding, and we were certainly a benefactor of that environment. What we've seen and the organizations that we talked to, there are larger organizations where moving data is a core
Starting point is 00:05:26 piece of what a lot of large data teams are doing. And for them, there's always going to be new challenges in terms of scale, data variety, complexity. And I'd say for them, that is just the hard part of software and data, and they're going to be continuing to evolve there. For a different segment of the market, and it's one that we've been pulled to generally, is folks who, for the most part, data movement is kind of solved from more common applications, whether it's like Stripe or Salesforce or HubSpot. That basically is a solved problem. The meta conversation around that, though, is, and what I've kind of come to believe and realize, is for the vast majority of products, and when you're talking, you're trying to sell to folks, they just don't care. I'll just be honest.
Starting point is 00:06:13 They just don't care about the underlying data movement. They just want it to work, and they want it to know that you can get your data from point A to point B and give you the value on top of that. And I think back in the pure modern data stack era, as all these tools were growing, there was this renewed focus on genuinely better ways to move data. And we saw some great innovation there, including what we can do with Meltano, and people still love Meltano and use it today and reach for it. But the reality in terms of the business and what it cares about,
Starting point is 00:06:42 what's visible to executives, and what is driving the needle on business outcomes, data movement is an essential part of that. And you do need to get it right. But it is so below the line in terms of what's visible to anybody that has control of the purse strings. And so we've kind of evolved our strategy as a business to move to something that is much more visible throughout the organization while still caring very deeply
Starting point is 00:07:05 about how to solve these problems. And that's why Meltano, we didn't rebrand the Meltano project. It's still an open source project that people know and love. It is now built by the team at Arch and it currently powers the data movement within the Arch platform.
Starting point is 00:07:19 So I have a lot of painful memories from early ETL, 10 plus years ago. Start with that. This is going to be a good counseling. This is going to help you work through some of those things behind you, John. This is probably 10 years ago, mainly Microsoft ETL tools specifically. And a lot of them, I just think it's interesting because one of the Microsoft stack from, say, 10 years ago was pretty ubiquitous. Almost every company had that.
Starting point is 00:07:46 I mean, there was, you know, IBM had some options and there's a few other options. But that was one of the most popular tool sets. And there was never a data movement tool. I don't think you could buy one separately. I think it was always packaged with a storage layer of reporting slash analytics layer and then the data movement. So they were packaged together. We unbundled them in the last five, seven years, however many years it's been.
Starting point is 00:08:15 And it's interesting to watch a lot of them pretty much get re-bundled with two of those three or maybe all three of them. So I just think it's just an interesting observation. Yeah. The bundling, unbundling journey is very common those three or maybe all three of them. So I just think it's just an interesting observation from. Yeah. Yeah. The bundling, unbundling journey is, I think, it's very common and it's a natural part of the cycle. And depending on where you sit, I'm sure if you go talk to George at Fivetran, he'll have his own opinions on this as well. I think the reality is when we talk to businesses, people are more cost conscious nowadays and that pushes you towards all-in-one
Starting point is 00:08:45 solutions. I think what's different about this time, though, is the ubiquity of open source availability, and that kind of solves for that long tail. And that's why we're huge fans of open source. Interestingly, though, now, the new world of AI, and I don't know if you're ready to transition to that, but these AI models now can actually help you take advantage of all this unstructured data. And that's kind of a burgeoning interest of, you know, you throw a PDF at it or, you know, can literally scan something, fax it somewhere and then get that data structured.
Starting point is 00:09:13 That is super interesting and something we're addressing as well with our platform. Yeah, yeah. Yeah, one more comment. I mean, so much to talk about on the AI side, but one comment and then one more question on the ETL side. I'll never forget, actually, very early when I joined RutterSack and we had a customer who was just running a bunch of
Starting point is 00:09:38 really basic event stream pipelines, right? We're just getting data from an SDK into a data store, right? And they just getting data from, you know, an SDK into a data store, right? And they just needed to move a bunch of data. And so I get on with this customer, asked him, could you just tell me sort of what, you know, they had given us a bunch of good feedback. Hey, we love this. And so I'd gotten on a call with them just to, you know, connect with the customer and learn more. It's like, well, what do you love about Rudderstack? And they're like, I don't ever think about it. Like, right, which is a little bit jarring, but it really like when you said, people don't care. Like they do care, but not in the way that, you know, you're not like a product that you're in the interface every single day, because you're,
Starting point is 00:10:16 you know, doing a bunch of, you know, it's like a huge part of your job, which is interesting. So that really resonates with me. But one other thing you mentioned was around, you know, sort of EL becoming part of a platform. Do you think we'll increasingly see that, right? Because Fivetran is sort of the big player in the room, right? Like as far as just being their own thing. But in some ways, they're kind of building out or purchasing additional functionalities to actually be more platform-esque as opposed to just a pipeline. Yeah, I think everyone is going to be compressed in interesting ways. And I think this still ties to the AI story of if you believe in what people are saying about the future of AI generally, it basically is going to compress the cost of intelligence and then also I think the cost of compute generally and the value of software. And we're seeing with, you know, you can move petabytes of data using just open source tooling.
Starting point is 00:11:14 You can write it to an open data format. You maybe pay for some like, you know, egress costs or storage costs on S3, but you can do things that you could never do before for so inexpensively. And these players that are, you that are charging rents on different things that basically are going to come down in cost overall, it's going to be a tough spot. And so you have to figure out how to compete and be more valuable to organizations.
Starting point is 00:11:38 And that's part of the same journey that we've been on. When I look at an organization now, what is most valuable to people? And it's usually what is most visible and what is helping them achieve their outcomes. And, you know, when you think about, to use a rough analogy, it's like you go to a car dealership and you don't care really what fuel it uses or even how the fuel gets there. You just want to know, okay, do I have enough energy to actually, can I go buy fuel for this thing? You care about the car and the features and the value that it's bringing to you. And so, you know, EL is some component of a car that people care about, only in that
Starting point is 00:12:12 they can kind of check a box there. But the people who really care about it are figuring out ways to make it more efficient and all that fun stuff. So, yeah, I think we're entering kind of a whole new world and where you sit in the market kind of depends on your maturity as an organization. Yeah, I think another thing, speaking of the cost, I had this really interesting conversation with a non-technical CEO in the last week or two.
Starting point is 00:12:35 And we were just talking about compute costs and it's budgeting season, right? So everybody's trying to get their IT budget. And I think it's interesting where there's basically three major cloud public providers, right? That almost all of this tech runs on. So there's this like leveling of cost. And I was explaining to him, like, there are a lot of reasons to spend more than this, but as a baseline, your storage is like 15 cents a gigabyte that was either azure s3 like one and
Starting point is 00:13:07 there's different tiers right sometimes it's lower and then your compute is around two dollars an hour right like that's kind of a rough starting point and when you see that you're paying multiples above that you could be paying for redundancy you could be paying for you know more advanced use cases or maybe you have thousands of users and but just having a starting point i think for him was like oh so like you know i'm looking at a million or several million dollar it budget yeah like what like how do we start breaking this apart but like well how much data are we storing like well it's only a few terabytes of data okay Okay. Like, if that's the case, then like maybe you're overpaying for storage,
Starting point is 00:13:48 but maybe you're overpaying for compute. And because you're using an arbitrary system that has, you know, has you kind of locked in or maybe it's this or that or other. But that's kind of that like first principles approach that I think has helped people think through. Like there's this commodity layer so we can kind of reverse into how much this could cost.
Starting point is 00:14:06 Yeah, and I love, you know, there's been arguments from some of the Mother Duck folks and the DuckTB folks around how powerful computers are nowadays. Like, I have an M1 Mac, looking at some of the M4s that are going to come out, and the processing power there is insane.
Starting point is 00:14:21 And so, you start thinking about that, like, oh, we have an organization of 1,000 people. We buy them all Macs. That's a huge amount of compute that's just sitting there. It's idle 98% of the time. That's running Chrome. Exactly, yeah. And Electron apps.
Starting point is 00:14:37 And Chrome takes every ounce of it. But it does just make you rethink from like, it is that first principle thinking of, this stuff doesn't have to be expected. Like the raw actions that we're taking to move data, store it, transform it and ask questions of it are incredibly cheap nowadays. And so when you're paying for a vendor, you just have to consider what are you actually paying for? And, you know, there's a lot of things there to pay for, Collaboration aspects, security model, SLAs, things like that. And not discounting those, and those are important,
Starting point is 00:15:12 but you also have to look at the raw numbers. Does it make sense? Has it ever made sense to pay for monthly active rows or something? Not to bash on Fivetran too much, but it just puts downward pressure on it in a good way. And so I feel like we're in a good position for me and for Arch as a company to rethink things from those first principles and oh, if all this is basically free, then what
Starting point is 00:15:31 is the value of a data platform, of a data team, and how do we position ourselves and go to market in that way? Yeah, I love it. Okay. Our listeners have been waiting, so we're going to talk about AI. I love how much time we left for AI.
Starting point is 00:15:47 I know. I think we'll get like 50 minutes in and we won't even bring it up. I'm really proud of myself because we could keep talking about our philosophy on data industry and ETL and EL for a long time. Okay, John, I'm going to give you the first word here because I usually jump right in. But you had some really great questions about this and we were chatting before the show. So Arch and AI. Let's, yeah, let's start with kind of that worldview because we talked about this and you had such a good and interesting take on this. So I've seen two or three models in the, like in this AI agent model say, Hey, and everybody, or a lot of people have seen like what chat gpt has done what you can do with like a data you use it kind of as a data analyst but there's this worldview
Starting point is 00:16:29 of one like i'll call it like the zero shot worldview where people just want to like where they want to like do text to sequel they just want to like throw something in the magic you know box and then get exactly what they want out there's that one one. And there's the other one from raw data. We're just going to raw data and get something magic out. That one, I don't think has gone well for very many people. The other one is we want to very carefully craft and model the data and put metrics layers and give tons of context so the AI can make better decisions.
Starting point is 00:17:01 So I've seen some success with that model. But you, it sounds like, not that you're not kind of pulling best stuff from all the models, but it sounds like you guys have kind of a third view on this. So I'd love to hear more about that. Yeah, so, yeah, lots of dive in on this topic. So generally bullish on AI
Starting point is 00:17:19 and the opportunity it's going to bring, I think I tend to be more on the optimistic side with some of that. I'll be honest, though, like a lot of the newer models, it's each time they get in better, it does generate that anxiety, just like, oh, are they coming for our jobs? But, you know, any automation probably will have a net benefit at some point in society. So how we're thinking about AI is that it can be in the most like optimistic sense of what it can do, it can bring the outcomes of good data teams to more organizations. I see it as this force multiplier, but not in a way where it's just like chat GBT on steroids, where you're constantly having to shove things in the context window.
Starting point is 00:17:57 If you want an AI analyst, and that's how we're pitching ourselves, to actually do well, it's going to have to do a lot of the same things that a human analyst would do. It's going to have to ask follow-up questions. It's going to have to get context from the business. And it's still going to have to do the fundamental aspects of what a data professional has to do. Somebody's going to have to move the data to a central place or, you know, make it available for querying. There needs to be that cleaning step on top of it. And then there needs to be the structuring, the mapping of real business processes to what the data is saying in that. One of the phrases I like that I've heard recently is that data is the shadow of a business process. And so part of modeling the data is, you know, revealing what the form is of the shape that is casting that shadow. And AI still is going to have to do all those things. It can just start
Starting point is 00:18:43 to do it much quicker. And when you can do things quicker and for more cheaply, it opens the world of who has access to these things. And that's been our strategy currently is that by focusing on the AI analyst aspect that has access to a full data platform that it can build its own ETL, write DBT code for transformations, and then also has the assistance and collaboration of human experts in the loop, then that kind of democratizes, you know, I kind of hate to use that word, but it opens the door to more people to benefiting from this great data intelligence. Yeah. So you touched on this, and I think it's really interesting, because I think it's completely overlooked by a lot of people is a lot of the analyst role is doing like collaborative or like even data collection stuff. So I'm even imagining like right now, like most of these analysts, like you're just talking with it in the data.
Starting point is 00:19:36 Right. But if you had, you know, one of these agents that can go ask somebody a question like from the business like hey does this look right hey i need your sales target number for you know for october i think that's a really interesting thing here where the analyst like you get you it's easy to get so focused on the well the analyst job is analyze the data like well sure but a lot of it is there's a data collection component or is that human in the loop component of like asking the right people the right questions to either validate data or, a lot of times, collect data. Yep.
Starting point is 00:20:09 Yeah. I'll jump in there and say the pushing of data experts back out into the different aspects of the business, I think, is one trend that we're going to see. For large organizations, we'll have still a centralized center of excellence for data and analytics, but what matters at the end of the day are the outcomes of the business, you know,
Starting point is 00:20:30 are you growing revenue? Are you increasing leads and prospects and decreasing costs? That's where data teams need to be focused. And I think for a while, you know, we get so wrapped up in the tooling and how are we structuring data teams and, you know, data as a product and all this stuff. And it started to get a little disconnected from how businesses actually work, what are the levers to control your growth as expressed by the different metrics. And so we fundamentally see our approach and our strategy when we're talking to prospects is like, hey, if you don't have a data professional, we'll be that professional. You'll have access to the AI and then you have kind of a human in the loop there,
Starting point is 00:21:05 even though all the conversations are still going to be kind of funneled through this AI chat interface. But if you already have someone on staff and one of our customers that we talk about a lot on the website, they have an analyst on staff already and we are 10x-ing his output.
Starting point is 00:21:20 This is for a private equity firm. They have 20 plus portfolio companies and he's essentially the data team for 21 companies, the VP of sales, to just ask questions of like, hey, can you give me these latest numbers? I actually think with AI, we'll see an increase in the number of ad hoc analyst requests. And if you
Starting point is 00:21:55 have good model data, go ham. Ask all the questions you want and then the analyst behind the scenes can be kind of optimizing the system. So I don't know. I get excited about that because it does, I feel like for a while as data professionals, we fought against the tide of what people just really want. And what people want is to ask questions.
Starting point is 00:22:12 They want to get data. They want sometimes the data to make them feel good. And data teams are like, well, you got to push back and say, well, what do you want to measure this, do with this data? What's your decision going to be? And it's like, sometimes people just want to look at a chart and see it go up. That's okay. AI can make that much easier. One question I'd love to dig into there,
Starting point is 00:22:32 I think there are probably listeners, and I'm even thinking about an occasional guest we have on the show named the cynical data guy, who their gut response is, man, it's so, it's hearing you say, okay, the CEO can ask the AI a question and sort of get numbers or whatever. They're like, oh my gosh, that's so dangerous. That's like really scary. And I think, you know, I think that, I think part of that is because, and I mean, well, John, I mean, you were, you know, you've been on both
Starting point is 00:23:04 sides of that and you have too, Taylor. So keep me honest. But I think part of that is because, and I mean, well, John, I mean, you were, you know, you've been on both sides of that and you have to Taylor. So keep me honest. But I think part of that is there's sort of this like contextual, you almost, you need to wrap the numbers in context, right? Because they can be misinterpreted or, you know, it's like, okay, well, if you don't have some level of healthy control over the narrative, it can actually create more work for you, right? Because it's like, what does this mean or whatever? How do you think about that challenge at Arch? Yeah, I think where I've been coming down on this and where we kind of structure our thesis and how we talk to prospects is like, all of the questions you're going to want to ask an analyst or an AI analyst need to be anchored
Starting point is 00:23:46 in the metrics of the business. And I keep coming back to that because it is the forcing function to help solve for some of these problems. When you have a CEO asking, our sales are down for last month. What's happened? That is an opportunity to have a back and forth with the CEO. And an AI can do that just as well, I think, to say like, great, are you asking about, you know, metrics X, Y, and Z? Are you looking at churn? Are you looking at, you know, leads dropping down on the marketing funnel? That you can have this kind of like back and forth conversation.
Starting point is 00:24:18 And then actually have the AI say, hey, we're working on measuring this, or we don't have this metric. And so the explainability of why this top level number went down doesn't exist yet. This is kind of tied in with, and it's a little too nerdy on the side, but statistical process control. And to anchor it more concretely for folks, Amazon did this with like a weekly business review where they have hundreds of different metrics. And they're just looking at variation across all of these things. And so that's our thesis where it's like, we help you build the model of the business based on the metrics that are measured and expressed in these different processes.
Starting point is 00:24:52 And then that's what the AI can really do. I'm with the cynical data guy, though. I think there's a lot that you can be cynical about, but the AI does give a new optimistic lens on this stuff in a way that we haven't seen before that scales non-linearly outside of human labor. Part of me also is really interested in what you can do with the questions themselves.
Starting point is 00:25:20 There's a lot of debate over self-serve analytics. Is that even a thing? Is it even possible? Democratizing data and all these things, right? But, you know, no matter your opinions on that, actually, if a bunch of people in the organization started asking a bunch of questions, and you could actually see the lineage of those conversations and even have the, right, it's an AI bot, right? And so one of its core strengths will be actually summarizing all of the questions that it's getting, right? I mean, that in itself, I think as an analyst would be hugely helpful, right? Because that's actually pretty hard to scale, even though you may be able to like, you know, understand nuance that an AI bot might not, because you're human and you went to the CEO's pool party at Christmas and had conversations that the AI wasn't privy to.
Starting point is 00:26:17 Even still, trying to collect intelligence around the questions that are being asked is really difficult within an organization. Yeah. And that's, it's interesting because that is the promise of some of these like enterprise knowledge search tools where it is like, you know, connect your Google Docs, connect your Notion, and it wants to be this centralized brain for task automation and asking questions. But I do think all of these tools are disconnected from the metrics, the fundamental measurement of these different processes. And to your point, when someone asks a question, that question exists within the context of the rest of the business.
Starting point is 00:26:52 And so the AI has infinite patience with this stuff and can search a lot of information and appropriately summarize like, hey, actually, your VP of sales asked this similar question. Here's the answer that we gave them. It's been updated since then. You know, here's the, and what I want to like kind of build into our product in particular is how do you really trust what the AI is saying? And I think for that to be true, it needs to cite its sources, show its work, and be connected to the underlying systems.
Starting point is 00:27:17 Not just, you know, not just like ChatGPT where it gives you an answer and that's the end of it. You're going to want these things to be auditable. And just, you know, good AI systems that are plugged in can do this at a very large scale. And again, that infinite patience point I think is important because having been the sole data professional for a while at a large organization, you're inundated with requests. And if you can have support in that in a way that doesn't require you to tire
Starting point is 00:27:43 people every single time, I think that's huge. It's a huge unlock. Yeah. One thing you mentioned a minute ago with Amazon and the weekly metrics reviews, I think that makes a ton of sense in their context. How would you approach that in more of a mid-market? Because two things I see with mid-market companies, one, a lot of them don't really have metrics, honestly, other than like financial metrics.
Starting point is 00:28:12 They typically understand at least roughly how much money they're making, and some high-level, you know, maybe customer metrics. Some of them have more sophisticated ones, like customer lifetime value, but for some of them it's just financials, those are their metrics. So there's that component of it. And then two, they do not have the luxury, a lot of times, of statistically significant data, because at the Amazon scale, like, you can quickly get to enough data to really point,
Starting point is 00:28:39 you know, directions if you're A/B testing, or if you're doing whatever. So, like, say somebody wants to implement a full data stack for one of these mid-market companies that maybe has just the financial metrics. What would you say to them? And then they have the sparse data problem as well. Yeah, so that's a fun challenge, I think.
Starting point is 00:29:00 My approach is two ways. One is if I was literally stepping into this organization, what would I actually do? You'd have to make some assumptions about the size of the business. I think the particular kind of weekly business review, statistical process control, is definitely for companies that are kind of post-product market fit and have some scale and set of processes going. But the whole idea with this is trying to get an understanding of the causal model of your business.
Starting point is 00:29:23 When I do X, I expect Y to happen. When I increase ad spend, I expect leads to come in and revenue to jump up appropriately. And so what a good analyst would do, if you got hired as a head of data or CDO or whatever, is you would have conversations with the executives and the business unit leaders, for sales and so on, of like, what do you think is happening in this business? Where are you confused? What tools are you using? What are you trying to measure? And it is part of a larger conversation to figure out and hypothesize what's going on. So some of it just falls back to: what do you think is happening? Cool. Is there a way we can measure
Starting point is 00:29:57 that? Awesome. Let's go measure that and then track it for a few weeks and see if it has any correlation with that. And if we think something else is missing, it's, you know, another conversation to figure out the metric. One thing that Cedric from Common Cog talks about, and I reference his stuff in other places, is the data problem is actually less challenging than the social problem within the organization. It is getting business leaders and business unit leaders to think in this way, to think about what are the causal levers of their particular unit, whether it's marketing or sales. How do you measure that? The actual fundamentals of measuring a metric sometimes can actually be quite easy. And then it's the, do they have low enough ego to look at these numbers and actually respond in a way that moves the business forward?
Starting point is 00:30:43 So it comes back to the classic, this is a human problem. There's that meme of all my data problems underneath are human problems. And that's kind of the reality. And so that's why I think there's a world where we can have 100% AI data analysts, but there's still going to need to be some human in the loop there, somebody you can talk to, to help drive that kind of narrative forward. So I don't know if that answers your question. Yeah, I think so. And I think this got me thinking, too, there's this other component, too, where there's no reason the AI can't help with the metric creation process.
Starting point is 00:31:19 And, you know, at least from an ML standpoint, like, ML's pretty good at, like, helping with sparse data problems. In my past, I had this problem around pricing. And that's a really good application if you have sparse data is using machine learning to fill in gaps so you can get to better recommendations for pricing or for a lot of things, really. So I think maybe it's use the tools for those things and that data
Starting point is 00:31:46 gets, you know, fed back into the tool. So there can be like kind of, I think there can be a positive there. One question I have on the metrics, I'd love just to know from a product standpoint, you know, what does that look like in terms of the product experience? And then what does that look like under the hood? Because, I mean, it makes complete sense, right? If you anchor this in metrics, that actually anchors the conversation, which means that, you know, the AI isn't, you know, you create boundaries within which the AI is going to have a conversation with the CEO, which, you know, sort of solves part of that problem.
Starting point is 00:32:20 But how do you define that? If I'm an Arch customer, how do I define those metrics? Do those eventually materialize in the DBT semantic layer? How does that start for me as a user defining the metric and then walk us down into the stack and tell us what's going on under the hood? Yeah, absolutely. So the early days of the conversation are about when onboarding is figuring out what is the current state of the business and what do you want to understand and what do you want to measure.
Starting point is 00:32:50 Within the platform, there are ways to document the metrics themselves. And that is literally just a list of here's the metrics, here's the owner, here's what the definition means in plain text. And those are going to be connected to the semantic model behind the scenes. We do use dbt Core under the hood for the transformations. We're using Cube.dev, the open source version, for the actual semantic layer itself. And then, you know, on the AI conversational front, that stuff is put into the context of the chat interface. And we're also working closely with our design partners on this to help them understand where this stuff starts to fall over and where they're not seeing it.
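A minimal sketch of what that documented list of metrics could look like in code. This is an illustration only, not Arch's actual schema; the field names and the `semantic_ref` pointer into the semantic layer are assumptions:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str          # short identifier, e.g. "mrr"
    owner: str         # who is accountable for the number
    definition: str    # plain-text meaning, readable by humans and by the AI
    semantic_ref: str  # hypothetical pointer to a measure in the semantic layer

# The registry is just a lookup that the chat interface can pull into context.
REGISTRY = {
    m.name: m
    for m in [
        Metric("mrr", "finance", "Monthly recurring revenue, end of month", "revenue.mrr"),
        Metric("leads", "marketing", "New qualified leads per week", "funnel.leads"),
    ]
}

def context_for(metric_names):
    """Render the relevant definitions as plain text for the AI's prompt."""
    return "\n".join(
        f"{m.name} (owner: {m.owner}): {m.definition}"
        for name in metric_names
        if (m := REGISTRY.get(name)) is not None
    )

print(context_for(["mrr", "leads"]))
```

Keeping the definitions as plain text is the point: the same list serves the human reading the docs and the model answering questions against them.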
Starting point is 00:33:27 When you ask a general AI, help me come up with metrics for this part of the business, it can come up with some pretty good things. But then there's also just unique nuances to every single business that aren't going to be captured by these general foundational models. And that's where you need to have that sociological component of having someone to talk to to figure this stuff out. I always like to point back, like for real examples of how people can think about this too,
Starting point is 00:33:52 like I go back to GitLab, you know, you go to the GitLab handbook and they list all of their KPIs or at least the top KPIs for each department. And for me, that is still kind of the gold standard of here's what we're measuring,
Starting point is 00:34:02 here's the definition, and here's the link to kind of the history of that data. So another aspect of this kind of metrics layer discussion that I think is interesting is there are several methods like we've already talked about where you've got like the agent method versus like a human in the loop. Tell us more about with human in the loop, and I'm specifically interested in like long running processes, something we talked about before, where if you and I were interacting, I would be, and I'm the customer, you're the analyst,
Starting point is 00:34:35 and I'm talking about something, and it would be very normal for you to say, hey, like I'll get back to you. And then a couple hours later, you'd get back to me with some kind of information. But if it's some kind of technology, like that would maybe feel weird, right? Where it's like, well, shouldn't you just be able to get that for me instantly? So walk me through that, like just from maybe a user experience.
Starting point is 00:34:54 Yeah. So this is something that we are actively working with design partners on. And I'm really excited to try and dive in and solve it for them and more broadly, because you're right, it's not something that we've seen in the larger market. Like, you know, I think the state of the art for the most part is, like, OpenAI's o1, where it shows kind of that full chain-of-thought process. It says, you know, here's what I'm doing. Here's what I'm thinking about. And that is kind of a longer-running process. I think you've seen examples of it, you know, taking several minutes to answer the question. But then you hear folks like Sam Altman saying, oh, you're going to have agents running for a month at a time. But he doesn't talk about what that experience is like for the end user.
Starting point is 00:35:30 To your point, am I just staring at a spinning wheel for a month? Is there something going on? So when we take a step back and think about this, there needs to be some way. I mean, it comes back to the classic data challenge of tracking state. What is the state of this overall process? The way we're thinking about solving it initially is using kind of what has been learned over the past decades of task management and, you know, managing humans, literally like a Kanban-style flow of different tasks. So what we're working on with our AI is it'll say, okay, great.
Starting point is 00:36:06 Here's the process that needs to happen. We need to connect this data. We're going to bring it in and transform it. Here's the metrics that you've defined and here's the report that you've asked for. I've created these set of tasks that are going to happen within the system. And you give just an upfront estimate of like,
Starting point is 00:36:22 we expect this will take several hours for each of these different things. And then whoever requests the information can go off and do something else and say, we'll notify you when your results are ready. And that can be via text, ping on Teams, email, whatever you want. And then it just kind of goes into a classic mode of, you know, an AI agent can be churning on something and building code, or it can be a human in the loop actually doing some of the work, or it can be a specific task for that person to do. If they need to go click a button to authenticate Google or Salesforce to actually allow the data to move, that's a task for them to go do. So I think we're just seeing this convergence between
Starting point is 00:36:59 how humans interact, where it can be over email, it can take time, and then how we expect computers to act, which is usually instantaneous, and trying to bridge that gap in between. There's an interesting example with Repl.it where, I don't know if you paid for their new agent thing, where you just ask it for a program and then it has a pretty long running process, but it is more a chain of thought where it's like, here's what we're doing now, creating the project plan, getting your approval, and then it's showing all the work that it's actually doing. And I think that's still interesting, but
Starting point is 00:37:29 you're expected to sit there for five minutes while it churns on this stuff. And I think we're going to have a world where you can fire off a bunch of different requests and then go do something else and then just be notified when it's done. I don't think there's anything going to be too fancy for it than what a normal human would do.
Starting point is 00:37:46 Just like, hey, yeah, I'm tracking this. I've put it in Notion or I've put it in Trello or whatever and I'll keep the status updated and we would expect computers to do the same thing. Yeah, I mean, that's a fascinating workflow. Like you can imagine, like you said,
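The pattern being described, plan the tasks up front, quote an estimate, let the requester walk away, and only notify them when everything is done, is straightforward to sketch. All of the names below (the classes, the notify callback) are invented for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Task:
    name: str
    estimate_hours: float
    done: bool = False

@dataclass
class Request:
    asker: str
    notify_via: str               # e.g. "email", "teams", "sms"
    tasks: list = field(default_factory=list)

    def plan_summary(self) -> str:
        # The up-front estimate the requester sees before walking away.
        total = sum(t.estimate_hours for t in self.tasks)
        return f"{len(self.tasks)} tasks, roughly {total:g}h; we'll notify you when results are ready."

    def complete(self, task_name: str, notify) -> None:
        for t in self.tasks:
            if t.name == task_name:
                t.done = True
        # Fire the notification only once everything is finished.
        if self.tasks and all(t.done for t in self.tasks):
            notify(self.asker, self.notify_via, "Your report is ready.")

sent = []
req = Request("ceo", "email",
              [Task("connect data", 1), Task("transform", 2), Task("build report", 0.5)])
print(req.plan_summary())
for name in ("connect data", "transform", "build report"):
    req.complete(name, lambda *args: sent.append(args))
# Only the final completion triggers the notification.
```

Whether the work behind each task is done by an agent or a human is invisible to the requester; the state tracking and the notification contract are the same either way.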
Starting point is 00:37:59 like a Trello board or a Linear board or Jira or whatever it is. And, like, two or three things here. One, could you assign tickets or tasks to an AI agent, and then they come back and update the ticket? Potentially, yeah. Or eventually they could interact with each other, and you could actually imagine a board of, like, this is in progress, and then some kind of AI agent comments on the ticket and says this step has been done, and moves it to another status for review. The human reviews it and then it, you know, maybe goes off into another flow of getting assigned to another AI agent or human or whatever.
Starting point is 00:38:36 Like, you know, that would be fascinating. And it's like taking that data team workflow from a project management tool and then integrating AI into that. Well, that's exactly how I would think of it. I've worked remotely since 2018. And in the most, I guess, cynical worldview, for some people, you're only interacting with them via Slack or via a task management system.
Starting point is 00:39:05 And so as long as the outputs are something that I can consume and collaborate on, it doesn't matter if it's a human or if it's a really good AI agent. For certain businesses, they're just not going to care one way or the other. And our approach right now is to have these longer running tasks that an AI can churn on or a human can actually do. And then we have that kind of human in the loop check of like, hey, here's the PRs that I was going to make for the code changes. Or then we can stand up and say, yep, that's good. Or no, actually go back and fix that so that we can earn the trust to automate more and more of this for these organizations.
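That review gate, where an agent can move work forward but only a human can approve the PR and close it out, is essentially a small state machine. A hedged sketch; the statuses and transition rules here are invented for illustration:

```python
from enum import Enum

class Status(Enum):
    TODO = "todo"
    IN_PROGRESS = "in_progress"
    REVIEW = "review"   # agent finished; waiting on a human
    DONE = "done"

# Which actor may advance a ticket from which status. Note there is no
# ("agent", REVIEW) entry: an agent can never approve its own work.
TRANSITIONS = {
    ("agent", Status.TODO): Status.IN_PROGRESS,
    ("agent", Status.IN_PROGRESS): Status.REVIEW,
    ("human", Status.REVIEW): Status.DONE,
}

class Ticket:
    def __init__(self, title: str):
        self.title = title
        self.status = Status.TODO
        self.comments = []

    def advance(self, actor: str, comment: str = "") -> None:
        nxt = TRANSITIONS.get((actor, self.status))
        if nxt is None:
            raise ValueError(f"{actor} may not advance a ticket from {self.status.value}")
        self.status = nxt
        if comment:
            self.comments.append((actor, comment))

t = Ticket("Backfill Salesforce opportunities")
t.advance("agent")
t.advance("agent", "Transform complete, PR opened")  # lands in REVIEW
t.advance("human", "LGTM")                           # only a human closes it
```

The transition table is where trust gets earned gradually: as confidence grows, an organization could widen what the agent is allowed to do without touching the rest of the workflow.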
Starting point is 00:39:38 Yeah. And that's interesting too, right? Because depending on how far that went, it basically elevates all humans to being some sort of a manager, a project manager or the manager of, you know, AI agents and humans, some combined team. Like it's, yeah. Yeah, I mean, I see that already. Like our head of growth and marketing manages,
Starting point is 00:40:02 in one frame, he's managing 10 different AI agents with these different tools. And he's way more productive than he would have been, you know, five years ago. So I think that's happening. And that's going to happen for data professionals. Like, I just feel incredibly confident about that. But what is the UX? What are the affordances? And like, that's what we're trying to figure out, like, right now. Yeah. Interesting. Well, one question to bring the conversation full circle, close to the end here. So you've learned an extreme amount about loading just from the Meltano experience, and about what happens downstream of Meltano loading. In terms of loading this structured data, you're pulling Salesforce or HubSpot or whatever in, you have developed some level of expertise around,
Starting point is 00:40:58 okay, here's how these jobs run, here's how the data is structured, here are the transformations generally that are run, though of course each business has its own specific adjustments to the transformations that they want to run. How critical is that for building the platform that you're building? Because one of the big challenges is you have this really long tail of
Starting point is 00:41:17 everyone's data is a little bit weird and schema changes and all that sort of stuff, but you actually have a huge running start from Meltano. Do you view that as a critical advantage? Yeah, so I do. And where it manifests is in a lot of the sales conversations where part of sales, and that's been one of the big parts of my journey,
Starting point is 00:41:41 is moving into a much more forward sales role and having conversations and trying to ask people for money. That's where the rubber meets the road. But one of the objections is like, oh, well, can you get my data? And with Meltano, we're able to say, absolutely. If we don't have a connector for it, we can build one incredibly quickly. And to the earlier conversation, it's like, it checks that box for them, but we do it in a way that is authentic and has deep credibility because we've done this for years. We have the open source platform. And now the new frontier for folks
Starting point is 00:42:11 as well in the AI angle is saying, hey, can you get my data in all this unstructured mess that I have? And we're bringing those capabilities onto Arch, and figuring out how they'll relate to the Meltano space. But we can absolutely do that. And then there's just the opportunity to bring even more with kind of the state of the art in AI models today. Yeah, that makes total sense. In the conversations that you've had, I mean, I know you're still early
Starting point is 00:42:35 on the unstructured data side, but what are the main types of data that you're, you know, in the conversations you're having that people want to include as part of Arch as a data platform? I think we're still trying to figure out a lot of that because for the businesses that we talk to, you know, they get excited by the, oh, we can get your data from anywhere. But then the real meat of the data is in the systems that they kind of already have, like the systems of record, or if they're using a CRM. And so it captures people's imagination of like, oh, yeah, all this data that's lying around,
Starting point is 00:43:10 whether it's actually, you know, useful or not in this kind of metrics view of the world is very dependent on the individual organizations. You know, we're talking to some folks that have old paper processes that they're working to digitize. But for them, that's also part of a larger conversation about moving to different tooling and things like that. So we keep the conversation focused on, what is happening in your business? How can you measure that? And then how can we help you find the levers
Starting point is 00:43:33 to pull with this AI analyst? But it does open the door to getting them to think more deeply about this. Because part of the journey in sales too is like, how much do you have to educate your customers, your prospects on what's possible, where they can move things. But some of the stuff is just super cool.
Starting point is 00:43:49 You upload a PDF, you get structured data out of it, you can chat with that data, and then you can put it in process for defining some of these metrics and the downstream analytics. Yeah, one example just from what we've done at RudderStack, and John and I actually worked on this project together, which is kind of fun, is we've done a lot of transcript analysis. Yeah. And so if you think about, I mean, you can do mining individually, which is super helpful, right? Like, okay, as a product manager, I want to go into all these customer calls and I want to ask a bunch of questions, which is great.
Starting point is 00:44:20 But you can also, for example, prep for a QBR by standardizing the questions that you ask of transcripts, right? From a bunch of calls, right? So, I mean, whatever it is, sentiment or, I mean, there's all sorts of things that you can truly standardize, right? And ironically, of course, a lot of those run as part of the data platform, because, you know, you can do a lot more than just materialize it as a custom field. But stuff like that is just. I'm excited to see what you guys build with that, because it's something that would take weeks of just brutal work. And it's like, wow, you literally can just do it, and it's so good. My first job out of grad school was, we were scraping a lot of websites and I wrote a ton of regular expressions to parse data from websites, from PDFs. And that would just be automated today. You dump it in there, you get it, it's accurate, you check it, and you're off to the races. It's amazing what you can do today. Yeah, yeah. I'm also fascinated by paper. Like we had a guest on, this has been several
Starting point is 00:45:25 months now, talking about an internship where he spent months and months looking through old records prior to whatever year and creating a manual database of, I think it was economic events. Yes. Right. Yeah. Because looking through paper, I mean, it is feasible now, right? To parse it, you know, to scan those in and actually get information. I think, like, one, I would love to work
Starting point is 00:45:49 with organizations that have a ton of paper because I think it'd actually be pretty easy. You set them up with a fax machine, just fax us a ton of stuff. It's then digitized
Starting point is 00:45:57 and then you can pull a ton of stuff from there. And they all still have fax machines for sure. So that's part of it. Especially if you're in healthcare. Yeah. Yeah, healthcare for sure.
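For a concrete sense of the hand-written parsing mentioned a moment ago, regexes over scraped pages and PDFs, here is the kind of brittle pattern that used to be routine and that model-based extraction now largely replaces. The pattern and sample text are made up:

```python
import re

# One hand-maintained pattern per document layout: item name, separator, price.
PRICE_RE = re.compile(r"(?P<item>[A-Za-z ]+?)\s*[:\-]\s*\$(?P<price>\d+(?:\.\d{2})?)")

page = "Widget A: $19.99  Gadget B - $5  Doohickey C: $120.00"

rows = [(m["item"].strip(), float(m["price"])) for m in PRICE_RE.finditer(page)]
print(rows)
```

Every new layout meant a new pattern and a new set of edge cases, which is exactly the maintenance burden that dumping the document into a model sidesteps.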
Starting point is 00:46:04 That is like the absolute best story of like okay what's an amazing use for ai right facts to ai yeah facts to ai yes absolutely that's great awesome taylor well we're at the buzzer here as we like to say this has been an incredible conversation super excited about what you're building at Arch and what a journey. So keep us posted and we'll have you back on. Absolutely. Looking forward to it. The Data Stack Show is brought to you by Rudderstack, the warehouse-native customer data platform.
Starting point is 00:46:36 Rudderstack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at ruddersack.com
