The Data Stack Show - 260: Return of the Dodds: APIs, Automation, and the Art of Data Team Survival
Episode Date: September 3, 2025

This week on The Data Stack Show, the crew welcomes Eric Dodds back to the show as they dive into the realities of integrating AI and large language models into data team workflows. Eric, Matt, and John discuss the promise and pitfalls of AI-driven automation, the persistent challenges of working with APIs, and the evolution from big data tools to AI-powered solutions. The conversation also highlights the risks of over-reliance on single experts, the critical importance of documentation and context, and the gap between AI marketing hype and practical implementation. Key takeaways for listeners include the necessity of strong data fundamentals, the hidden costs and risks of AI adoption, the importance of balancing efficiency gains with long-term team resilience, and so much more.

Highlights from this week's conversation include:

Eric is Back from Europe (0:37)
AI and Data: Jurisdiction and Comfort Level (4:00)
APIs, Tool Calls, and Practical AI Limitations (5:08)
Scaling, Big Data, and AI's Current Constraints (9:16)
Stakeholder-Facing AI and Data Team Risks (13:20)
Self-Service Analytics and AI's Real Impact (16:04)
AI Hype vs. Reality and Uneven Impact (20:27)
Cost, Context, and AI's Practical Barriers (25:25)
AI for Admin Tasks and Business Logic Complexity (29:13)
Tribal Knowledge, Documentation, and Context Engineering (32:07)
AI as a Productivity Accelerator and the "Gary Problem" (35:10)
Healthy Conflict, Team Dynamics, and AI's Limits (39:15)
Back to Fundamentals: Good Practices Enable AI (41:47)
Lightning Round: Favorite AI Tools and Workflow Integration (45:56)
AI in Everyday Life and Closing Thoughts (48:14)

The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it's needed to power smarter decisions and better customer experiences. Each week, we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to The Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.
Before we dig into today's episode, we want to give a huge thanks to our presenting sponsor, RudderStack. They give us the equipment and time to do this show week in, week out, and provide you the valuable content. RudderStack provides customer data infrastructure and is used by the world's most innovative companies to collect, transform, and deliver their event data wherever it's needed, all in real time. You can learn more at rudderstack.com.
Welcome back to The Data Stack Show.
This is Eric Dodds reporting from the United States. I've been in Europe for the entire summer and, because of the time zone offset, missed a bunch of shows, but I am here IRL with John and the Cynical Data Guy. We thought a little reunion episode would be great now that I'm back in the States. So, guys, good to be back. Thank you for just granting your presence to us.
You're welcome. You know, we were just lost for two months without you here to just guide us through this wilderness.
Sometimes, to understand the value of something, it needs to be taken away for a time.
We're glad to have you back, Eric.
It's good to be back.
It's good to be back.
Okay, I was in on a couple episodes,
but, okay, I have an AI question
that I want to lead the show off with,
but what did you guys talk about?
Like, what were the main topics that emerged over the summer?
I don't know if I remember anything after we stopped talking about it.
I said, John, did you start? You just, your brain, it's vacuumed. You're here, you do it, you walk out the door.
It's like severance, you know.
Yeah, it's like severance, yeah.
Shout out to severance.
A lot about data and AI and a lot of AI.
A lot of data and AI, a lot of AI.
That's the summary.
I don't know, we've got a lot of good topics for today
that I think are at least tangentially
related. But one of the ones I'm excited about is, well, you know what? This actually perfectly
relates. We talked about AI and guardrails. That came up several times. And we're going to talk
more about that today, I think. Yeah. We also, we did some LinkedIn stuff as we always do. And then
I think we got desperate and started messing with the format just to make it different for ourselves.
Yeah, a little bit of commiserating, right? We spent some time doing that. Okay. All right. So good
summer. Yeah. Good summer. Okay. Well, here.
Here's something that I've been thinking about.
It's, of course, data and AI.
But one thing that I think we're seeing emerge now,
and this hopefully transcends a lot of the marketing promises
and sort of the projections about where things will be,
which, of course, the models are going to get better.
But generally we know there are areas
where LLMs create dramatic and unprecedented
improvement in a process, a task, you name it. And then other areas where, okay, maybe there's a long
way to go, right? And so a good example would be math, right? That's sort of one of the
notorious things, you know, LLMs and math and, you know, that whole side of things. And so
when it comes to data, one thing I'm interested in, you know, especially to ask both of you
because you will bring the mindset of someone who's, you know, run a department,
run an entire, like, technical organization in a company.
Where is the jurisdiction of AI and where is your personal comfort level
in terms of incorporating it into processes that generally were almost exclusively deterministic,
right?
You know, let's say with the exclusion of machine learning,
where you're intentionally introducing
like a probabilistic practice
and this is layered, and so there are a number of facets to this question, but
one thing that's interesting is like
if LLMs are not good at math
they are getting better and better at writing code
code can generate math
and so you can sort of back into it that way
but you're also piling in
a lot of you know sort of process lineage
there that can be difficult to trace down
So when you think about what you were delivering, you know, owning a data organization
and where you would be comfortable with layering non-deterministic elements into that,
how do you think about the jurisdiction there?
I'll go first.
This is actually, this is going to get really specific really quick.
I think one of the things that we were talking about before this show was the power of these models,
with tool calls, right?
MCP, servers, tool calls,
whatever you want to call it.
And the, which is interesting,
and we were talking about
some of the limits and frustrations
around context length and stuff
before the show.
But there's another component here
that I think people failed to realize
for sure in the broader business context
and maybe even somewhat in the data team,
which is typically more dealing
with databases versus APIs.
Because practically like the MCP layer,
tool call layer is just on top of an API.
Right. And it's how bad most APIs are. They're bad. They're bad in, like, two or three specific ways. One, if you want a bunch of data from an API, that's usually a problem.
Yeah, yeah. Number one.
Number two, if you want to send a bunch of data.
Yeah, yeah.
Number three, if you want webhooks that work all the time, every time. Like, that's actually a problem too. So there's, like, three major problems. But with MCP or tool calling in front, using AI, you can do a ton of things and a ton of cool workflows.
But then there's a practical of like,
but most of these APIs are not very good.
And this doesn't fix that problem.
Yep.
And now you've got two layers.
One, the AI, like, hallucinates or screws something up.
But two, the more practical layer of like timeouts, issues.
Sure.
Like whatever various like, you know, errors from the API itself.
Yep.
So I don't know if that answers your question, but I think that's, like, the first practical thing: I've got to get past that to fully answer that question, if that makes sense.
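For illustration, here is a minimal sketch of the two failure layers being described when an agent's tool call sits on top of a real API. The endpoint, payload shape, and retry policy are hypothetical; the point is that transient API failures (timeouts, rate limits, server errors) need handling that is entirely separate from validating what the model asked for.

```python
import time
import requests

# Hypothetical tool-call wrapper; the URL is a placeholder.
def call_tool_api(payload, url="https://api.example.com/v1/orders",
                  max_retries=3, timeout=10):
    for attempt in range(max_retries):
        try:
            resp = requests.post(url, json=payload, timeout=timeout)
            if resp.status_code == 429 or resp.status_code >= 500:
                time.sleep(2 ** attempt)  # rate limit / server error: back off, retry
                continue
            resp.raise_for_status()
            return resp.json()
        except requests.Timeout:
            time.sleep(2 ** attempt)  # plain old API flakiness
    raise RuntimeError("API unavailable after retries")

# The other layer is separate: even a 200 can be a "successful" wrong answer
# if the model hallucinated a filter or field name, so validate the request too.
```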
Because, like, what I hear you saying,
if I had to summarize it, would be,
I need to actually put some use cases into production
in order to start to answer that question,
but it's really hard to do that.
Right.
With APIs that are not.
Because there's a bunch of really neat,
let me just give one example,
like Shopify has an MCP server now.
If you're an e-commerce company, like,
neat, let's hook that up, let's do all this cool analysis.
this would be great.
And Shopify actually has a pretty decent API,
so maybe this is a bad example.
But then it's like, okay, do I really just want to give somebody, like, Claude and this thing? And then, like, they get just right before the answer, and then Claude runs out of, like, context, and you have to start over.
Like, is that what I want to give to people?
Even if the Shopify MCP is great
because their API layer is great,
you're introducing another set of dependencies
that actually,
interestingly enough is blurry isn't the right word, but like it's hard for Shopify to control
that, depending on the user experience they want.
And I said it was pretty good. I didn't say it was great.
And it's, okay. Anyone from Shopify listening, please come on the show.
It really is good. And because of the MCP stuff, they have all these like extra scale problems,
extra demand on the API that like. Totally. But then the question is, what do you hand off to the user?
Right. And so like, are they going to handle it for you?
Or do you sort of bring your own, you know? It's like, that's tricky.
Well, it's also, I think what you're talking about is one of those,
that it's not like you can, well, we'll POC this and then if that goes well,
because it's this kind of, it's not that it's going to go wrong
or something's going to, you're going to have issues like the first time you use it,
it's that it's going to be, it's good, it's good, wait, it's broken,
wait, it's not, it's returning a bad call or something like that.
And those are much harder.
I mean, that's like an overall data problem, I feel like in a lot of these,
is that scaling
and that kind of like
long-term usage of it.
Yep.
The idea of like,
I can make it work
in a condensed time frame
with a limited amount of data
or like a limited number of uses.
But once you start putting it out,
we don't really do this like gradual thing.
We don't go 10, 20, 30.
We tend to go like
1, 5, 10, a thousand.
Yeah.
Well, that causes these problems.
That's where you find a lot of these problems.
That's the perfect illustration. Because what I'm finding, and it's such an interesting trend in data, is, like, I don't know when you would say the spreadsheet moment is, but, you know, years ago when spreadsheets became ubiquitous, there's a moment there. And then we've got this arc in data for, like, 10-plus years of, like, big data, like Hadoop and Snowflake and things like that. And then when you introduce the AI stuff, it's like, that does not work well with big data. The context is super limited, the compute... It's super expensive.
But when we've been on this big data trend
and people are like, well, why doesn't it work?
Like, I've got millions of records here.
And I want to do like this.
And you're like, it doesn't work.
I just, I want to take a moment to pause here.
And maybe this is just because I've been away for a little bit.
But it really struck me that you were just like, you know,
Hadoop and Snowflake.
You just lumped them together.
Basically the same thing.
Lumped them into this.
Not at all.
That's all this is.
Snowflake.
Yeah.
Snowflake team, I apologize.
No, my humblest apologies.
Like 10, 15 years of history.
Of course, you know, we know
that people at Snowflake and they're great
and hopefully they get a chuckle out of that.
But I think in the best way possible,
that's kind of a good thing
in that they are like a massive
data, you know, enterprise data cloud, right?
Which is, so maybe that's a sign
of their success.
Anecdotal examples of, like, the big data movement, which we probably wouldn't call big data anymore.
But the point being that, like, we're trending toward more and more data, more and more compute-optimized, working on a lot of data. And then, like, we throw in AI, and it's like, this is not compute-optimized or able to work on a lot of data.
Right. But would you, so the interesting thing is, let's think about Hadoop and Snowflake, right? The things that you're talking about around API reliability and scale and all those sorts of things, like, what's interesting is, if you think about the Hadoop and Snowflake ecosystem,
you know, and if you, let's talk about Hadoop first, right, is that, yeah, you still face, like,
similar issues. It's not like anything you mentioned as new is a new problem, right? But in the Hadoop
era, you have all the knobs at your disposal, right? And so you can actually, like, deep, you know,
you can sort of do whatever you want with the, you know, on top of the core system, right? I mean, you know,
you have, like, large providers, you know, Hortonworks and others that are, like, productionizing
a lot of that and sort of making things easier, right?
Then you go to the Snowflake era, and the promise of the cloud is that they're doing a lot of that for you, but then they also give you APIs and other things. But then they have these, you know, they are, they're actively solving those problems, right?
And so it's
not a new problem, but sort of
the, they are
contained, right? And I think
that's what's interesting about
what you're talking about is we're introducing
problems from different sources,
right? It's not like the
system is contained, and you can sort of address those problems.
Well, I'd say, I think I can speak for most data teams. Most data teams are like, yeah, we have to interact with APIs, but we do it way upstream, and we don't want to do any of our work through direct interaction. We want to, like, abstract that, like, put this here and, like, hire Fivetran. If we do it ourselves, we want to still, like, highly abstract that out.
Yep.
Get it into our data cloud as fast as possible and then not mess with APIs. But then if you're going to introduce, like, tool calling and MCP and all that other stuff, then you're like, oh, this is a lot more messy.
Yep. Right.
I don't know. What do you think, Matt?
I think you got some good points there. Like, it was already not clean in a lot of ways, and so I think it's just kind of dealing with that.
And, you know, I don't know, APIs have always been one that like you can see the power of
APIs.
But from a data standpoint, they're a pain a lot of the time.
They just are.
And so you're going to run into those problems.
I mean, I think if we go back to the original question and kind of like, where do we see it?
I think I looked at it slightly differently from the way you did. I was thinking of, where would you put AI in, kind of, like, a data team? And my distinguishing line at the moment is probably, like, is it stakeholder-facing or is it in the back? And I would not make it stakeholder-facing for a data team. So I think you're going to run into problems with that.
And I think it's also, like, it's going to sound a little weird, but you're feeding into, like, bad assumptions if you start, because everyone's going to want to see it.
Yep.
But it's like, if you're going to have it there and it's going to be like, you know, oh, we could have it, like, present to the user or something like that. It's like, no, I don't really want to do that. I don't know if it needs to be there in that part of the stack.
I totally agree with you in principle because you are, you're introducing a huge amount
of risk for the data team itself.
Right.
Right.
which is problematic
but at the same time
it almost seems
inevitable
which is a very strong word
but there's so much
AI being crammed into
analytics tools
where it's like well does the
stakeholder just have this expectation
that they can ask
an anonymous, you know, chatbot or agent a question
and they get an answer
you know which a lot of those are
self-contained
user-facing, you know,
analytics tools where they control the data model.
I mean, right, there's a very large amount of
control there, which is way harder
to achieve, you know, when you're
just producing
analytics, right?
Right. Running sort of the whole end-to-end stack.
Well, I think there's also that difference
between, are we talking about, like, a data team where it's for, like, a product that is going to go out into the wild to customers? Like, most of the work that I did with teams, we were pretty internal.
Right.
Your customers are other teams, and the customers are, like, the marketing director or the product team or something like that.
And so I think even there, you know, it kind of has that same pull to like, oh, well, we'll make BI self-service and that'll make our jobs easier.
And it's like it doesn't really work.
And I think when you try to do it, you end up turning the AI into kind of this little parrot, a very expensive parrot at the end of it.
It's like you're doing all the work in the back anyways.
And then it's just there to summarize what you've done nicely for them.
But it's not going to be the one that's doing the analysis a lot of the time.
And it's not going to be the one that really is like, oh, it found the insight.
Right.
No, it's probably not going to.
Okay, so I have a question for both of you, because we've talked about self-service, you know, self-service analytics before.
A couple times.
A couple times. How do you see it? Is AI going to be a big step forward in that, or not?
Here's the funny thing
that came to mind.
So I was fairly involved
in DevOps stuff
four or five years ago.
And one of the things
that is coming to mind is that there's this chatbot era.
Right.
Did you ever use Hubot, Matt?
No.
So like the idea here
is it's the same concept
as a lot of these AI bots
but it is
but it's like deterministic.
Right.
You could give the thing
a command, and we'd do this all the time
at a previous company like, hey,
well, like, log me out of
this thing. It was like a thing that
like your session would get stuck
and you'd like tell it to log you out or tell it
to like reset this thing or do this thing.
So it was a little bot, and it was deterministic, with commands.
So like that version, I mean, it's interesting.
Like a lot of what people want
out of the AI bots, like you could
probably index
10 queries.
Yeah. And do like a little
search match based off of the words and like get the query to run and like almost simulate like
what the AI would be doing. The AI still provides value in the like, you know, in the parsing.
But like that version, I think a lot of companies would actually really like and be impressed with
just that, which therefore is actually eminently possible with AI. And, you know, it's a little more
sophisticated and more accurate than doing just, like, what I described.
Yeah, I mean, I think it's kind of one of those, is it going to make a big step forward? I don't know. Is everyone going to try to make it make a big step forward? Yeah. I think this self-service, it has this, like, siren's call to it in a lot of ways, because in theory it should add a bunch of value. It should make life easier for everyone involved. But reality never really quite gets there.
And so I think this is another one.
It's the same type of thing as trying to go, like, text-to-SQL and stuff like that, where there's this idea that, if we could just figure it out, think of how great this would be. But the practicalities of it are always kind of off a little bit. And a lot of times you're still not dealing with the problem of, like, the zero-to-one step. And if you're not getting that zero-to-one step, because they don't understand what the data is, or it's like, you know, you're just giving them a blank screen
and you're like, ask, like, what do you want to know?
Well, I don't know.
You know, maybe if you're looking at a dashboard,
they can get a little further along that,
but it's still one of those where, like, you know, even for a human analyst, this is hard sometimes.
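As a concrete sketch of the deterministic, Hubot-style version John described, here is roughly what "index ten queries and do a search match on the words" could look like. The query names, keywords, and SQL are invented for illustration.

```python
import re

# Hypothetical index of a team's ~10 standardized, pre-approved queries.
CANNED_QUERIES = {
    "weekly_revenue": {"keywords": {"revenue", "sales", "week"},
                       "sql": "SELECT ... FROM revenue_weekly"},
    "churned_customers": {"keywords": {"churn", "customers", "lost"},
                          "sql": "SELECT ... FROM churn_monthly"},
}

def match_query(question):
    """Pick the canned query whose keywords best overlap the question."""
    words = set(re.findall(r"\w+", question.lower()))
    best, best_score = None, 0
    for name, entry in CANNED_QUERIES.items():
        score = len(words & entry["keywords"])
        if score > best_score:
            best, best_score = name, score
    return CANNED_QUERIES[best]["sql"] if best else None

print(match_query("what was revenue last week?"))  # -> the weekly_revenue SQL
```

An LLM can replace the keyword match with better parsing, but the answer set stays finite and deterministic, which is the appeal.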
Well, in my two, like, data roles where the teams were most successful, in both of them we had standardized reports, and there were a few of them, like, less than 10, seven or eight. People knew what they were, and they knew what the columns meant.
Yeah.
Like, if you can get there and, like, everybody's on the same page, then the AI stuff would be amazing, because it's, like, finite. Like, we know exactly what it is, we know what everything means, we can train the AI on what it means.
Totally.
Super cool stuff with that.
I mean, on one of the most successful teams I worked on, we had one query that did 80% of all the work.
Sure.
And so onboarding wasn't as hard either, because it was literally just, this is your query. Yes, it's a very gnarly 14-join query and all of that. But like, this is
what you're working from. And we've already renamed everything. And we've already done all that work
for you. And so you're just starting from there. And if you need to make changes around the
edges, you can go look at it or you can go ask someone. And so you could do things a lot
faster because of that.
Right. And, I mean, there are some businesses where that doesn't work. Like, it is more complicated than that, and that's valid. But even in that scenario, like, still getting to, these are the five for this team, or these are the five standards and we're always going to derive from the standard, stuff like that is still the hard work. And then after you have that done, and everybody agrees on what the definitions of things are, the AI stuff, like, I don't say it's trivial. There's still some effort, and there's tuning and stuff, but it's possible.
marketing sort of outpaces the reality,
which is not a new topic on this show.
We've been talking about this for years.
But in particular, I think,
because of the pace of change
and because of how magical it can feel,
you know, it's easier to believe that, right?
But it's not a cure-all for the foundational stuff,
especially when it comes to data, right?
You actually have to get the underlying data model.
Well, and I think especially
early on, there was a lot of thought of almost
like we can sell this because
in six months or eight months or
12 months, it'll be... And I don't really feel like that's happened, you know? Like, there's progress that's been made, but it has not caught up to what the claims were, you know, 12 months previous to it.
Well, yes.
I mean, I think, I mean, you can make an argument for, you know, the need to tell a story for the valuation of a, how do I say it diplomatically? But I think this is a tricky thing to navigate, mainly for people who are trying to figure out how to productionize this stuff, right? There's a lot of promise out there, and in some specific areas, what is possible is extraordinary, right? I mean,
Think about translation.
Yeah.
Think about, you know, there are areas where it's like, okay, this is clearly such a fundamental leap forward that translation will never be done the same again.
Right.
And it's actually already having really outsized impact on, you know, the labor force, like within translation, right?
Right.
Because it is that transformative.
Yeah.
But that is not evenly distributed, right?
There are a number of areas where the technology is so dramatic, completely changing something,
but that's not evenly distributed, right?
And I think that's part of what makes that very difficult, right?
Is that, okay, well, this could transform absolutely everything, every part of every domain, right?
And that's just not true.
Well, and I do think, because we've been mainly focused on the consumer, like, customer-facing, whatever you want to call it, piece. There's some pretty transformational stuff, I think, on the back end. Like you just mentioned translation, migrations between systems or, like, translations between languages, like, you know, Python to whatever, or R to Python or SAS to Python or whatever.
Yep.
Like, I think that's pretty transformational.
I imagine there's like a number of companies that may decide to take on some efforts that they wouldn't have if that weren't available.
Yep.
So I do think it affects data people's workflow and people that are writing semantic layers and SQL and things like that for sure.
And then the other one that I don't hear a lot about, that's possible now, is using these tools inside databases, like Snowflake, for example, and many of the others. Like, there are these foundational models inside the database. So, contrary to what I'm talking about with tool calling and all that other stuff, we are seeing, I'll say, early implementations of the foundational tools in the database, where it's like, hey, I want to categorize all these products or something. You can make a tool call or use Claude or whatever to do that. And I think there's some neat applications there with forecasting, things like that. I do think it makes an impact, but it's a little bit more hidden, and there's less, like, marketing push behind that stuff, because we kind of already had a wave of that with ML. And, like, yeah, I don't know. But maybe a little less hype.
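As one concrete flavor of what John is describing: Snowflake exposes hosted models through SQL functions under Cortex, so a categorization task can run where the data lives. A rough sketch, assuming a Snowflake account with Cortex enabled; the connection parameters and the products table are placeholders, and available model names vary by account and region.

```python
import snowflake.connector  # pip install snowflake-connector-python

# Placeholder credentials; assumes Cortex is enabled on the account.
conn = snowflake.connector.connect(
    account="my_account", user="me", password="...", warehouse="ANALYTICS_WH"
)

# SNOWFLAKE.CORTEX.COMPLETE runs a hosted model inside the warehouse.
# 'products' is a hypothetical table; the model name is illustrative.
sql = """
SELECT product_name,
       SNOWFLAKE.CORTEX.COMPLETE(
         'mistral-large',
         'Reply with a single category word for this product: ' || product_name
       ) AS category
FROM products
LIMIT 10
"""
for row in conn.cursor().execute(sql):
    print(row)
```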
I have an interesting view on that, actually. Okay. Because let's talk about AI, like, within a database. Okay, there are tools
out there that will, you know, they're like listeners, you know. And so you, like, put it on your database, and it, like, sort of runs an algorithm that says, great, there's something that seems like it changed. You know, let's call it observability.
That's been around, but let's sort of categorize
that under ML
that had the
great fortune of good market timing
so that they can just market it as
AI when it's all ML under the hood.
Right, right. So great, good, that's awesome.
Like, that is wonderful.
Okay, that's, you know, that's been around.
A chatbot inside of a database,
like, is that useful?
Okay, that's a hard question to answer
because there are probably some
instances in which it makes certain things way easier, but it's not the interface through which
you interact with the database.
It's just not, right?
And I think part of the challenge there is that what you actually want is something more
akin to the observability tool that is like constantly curating the things that would be
important or interesting or helpful to you, right? But the reason that isn't happening on a large
scale in production is because it's way too expensive. It only makes sense to incur costs when
the user explicitly says, I'm submitting a prompt, right? But what you really want, you don't
want a blank page. No. What you want is something that is presented to you that, you know, pre-
curates, like, a bunch of different stuff, right? And I mean, I'm not saying that people aren't trying that or that it doesn't exist. But that is generally not happening, and in my view, that's actually primarily a cost thing, right? I mean, the loss-leader mentality for most of these products is mind-boggling, on the scale of billions, right? And so there's a huge amount of subsidization happening.
Right. Right. With the expectation that the cost will come down over time.
Exactly. Essentially. Exactly. Right. But because of that, like, because it's already somewhat upside down, pre-incurring a bunch of that cost before you know exactly what the user wants, or whatever, trying to do that is just very expensive, right? And so it's like, okay, well, let's just use a chatbot, and we're going to incur cost when the user says, you know, we're going to incur cost.
Right. Well, I mean, there's, like,
there's opportunities in places where
because we always think of it a lot of times
as you know real time with the user
but there are opportunities where it's doing stuff
in the background and it's like you submit
something and you have to, you know, you wait and then it'll come back. But that cost part of
it does come into it. I mean, because the way you generally have to work with this stuff
because of context windows is you're essentially batching it in a lot of ways. You're summarizing
chunks of it, whether it's text or whatever, and then you're summarizing the summaries. And you
keep doing that until you get it small enough to fit in one context window. But each one of those
is incurring input token,
output token costs.
So you can look at it and say like,
oh, well, my total tokens is, you know, whatever,
and that should cost me a dollar.
Once you've gone through 16 iterations of this,
now you're spending $30, $40 on this.
And that's kind of where that all adds up.
Because, you know, it's one thing of like,
if you get it to a point where you're like,
I don't care how long it takes to do certain stuff.
Like I think that's something that, like,
legitimately could be useful with this is like,
I don't care how long it takes
as long as it's not people doing it, right? Having a computer work for 20 hours on something, not a big deal. But it's going to be that token cost that you've got to be careful of.
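To put rough numbers on the summarize-the-summaries pattern Matt described: each level re-reads the previous level's output as input tokens, so the billed total is a multiple of the raw corpus size. Every price and ratio below is a made-up assumption; real figures vary by model.

```python
# Illustrative numbers only; real prices and compression ratios vary by model.
PRICE_PER_1K_IN = 0.003    # dollars per 1K input tokens (assumed)
PRICE_PER_1K_OUT = 0.015   # dollars per 1K output tokens (assumed)
COMPRESSION = 0.25         # each summary is ~25% the size of its input (assumed)

def hierarchical_summary_cost(total_tokens, context_window=8_000):
    """Estimate the cost of summarizing chunks, then summarizing the summaries."""
    cost, level_tokens = 0.0, float(total_tokens)
    while level_tokens > context_window:
        out_tokens = level_tokens * COMPRESSION
        cost += (level_tokens / 1000) * PRICE_PER_1K_IN   # read this level
        cost += (out_tokens / 1000) * PRICE_PER_1K_OUT    # write its summaries
        level_tokens = out_tokens  # next level reads this level's summaries
    return round(cost, 2)

# A 2M-token corpus: a naive single-pass estimate is ~$6 of input,
# but under these assumptions the levels compound to roughly three times that.
print(hierarchical_summary_cost(2_000_000))
```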
This is tangentially related, and this is a business idea, legitimately. So if any of our listeners want to pursue this, it would be awesome.
I already call it. All right, Matt calls it.
I'm buying a domain right now.
That's practically a patent.
So, but you're mentioning the token cost and like the things like running in the background.
This does exist.
It's just upside down.
It does.
But like here's a really interesting one that I have not seen nearly enough products in this space is things with a GUI where you pay an admin to do things.
Salesforce admin, for example, or, like, Marketo admin.
I don't know.
Like all these like complex enterprisey tools that have APIs.
Like, why do we not have, like, a chat interface with some kind of, like, MCP thing to do that? And there's some things out there, but, like, that feels like...
Hold on, explain that more.
So there's a lot of companies that pay, like, a full-time Salesforce admin.
Oh, right, right, right.
Like, and it's because, I don't know where the stuff is in the menus. Like, normal people could do it, you just can't find it. And AI seems to...
Yeah. Or it's, like, so obscure. And Salesforce admin is one that's, like, pretty complicated, but there's others that, like, are not hard to do once you can find it, but you can't find how to do the thing. So it's almost like a search problem.
Yeah.
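A tiny sketch of what treating this as a search problem could look like: index the menu paths you know about and fuzzy-match a plain-English request against them. The entries are invented; a real version would index the tool's actual documentation, with or without an LLM doing the matching.

```python
import difflib

# Hypothetical index mapping tasks to where they live in a complex admin UI.
MENU_INDEX = {
    "reset a user's password": "Setup > Users > [user] > Reset Password",
    "add a custom field to leads": "Setup > Object Manager > Lead > Fields > New",
    "export a report to csv": "Reports > [report] > Export > CSV",
}

def find_menu_path(request):
    """Fuzzy-match a plain-English request to a known menu path."""
    hits = difflib.get_close_matches(request.lower(), list(MENU_INDEX),
                                     n=1, cutoff=0.3)
    return MENU_INDEX[hits[0]] if hits else "No match; ask the admin (or the LLM)."

print(find_menu_path("how do I add a custom field on leads?"))
```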
Yeah, I don't know, though, because
I will tell you my experience with that, okay?
Is that, no, I think this will change, but the problem is, okay, Salesforce and Marketo are outstanding examples, right? Because their APIs are horrible. Their user interface is a Frankenstein that's been changed over multiple decades, many years. And so, of course, that makes it more complicated, right? But they're very powerful tools. But those, to me, are not even close to the real issue. The real issue is that there is business logic that has been represented within, like, the customization options within these tools, that's very difficult to, that's probably undocumented, right?
Right.
That is a mixture of three things: like, some sort of custom field names, some sort of workflow in whatever workflow builder. Right, four things: custom field names, some sort of custom workflow in the workflow builder, some sort of custom code in the tool, so in Salesforce that would be Apex, and then some sort of integration, whether that's a custom integration or, like, a quote-unquote native, like, daisy-chained integration, whatever, right?
And making changes to a system like that
is not easy.
Yeah, for sure.
That's why you have these people in these admin roles.
But I think the challenge
is that,
okay, so I fully believe
that if you gave
an LLM
all of the needed context
it could certainly navigate through that, right? But that's the big challenge, is that all of those intricacies of the business process, the history of, like, oh, well... Because an LLM would run into a situation, and it would conclude, this doesn't really make sense, because three years ago someone made this change, and, like, blah, blah, blah. And it's like, well, you can't change that, right? We've just accepted that this is debt within the system that we live with in perpetuity.
Well, yeah, I mean, that would be the interesting thing, right? If you had the full change log of everything that's ever happened, full context, like, maybe you had to do post-training or something to really, like, cram everything in there.
It seems like that would be an application that would be possible.
I totally agree. But, I mean, actually maybe the business idea is that we will come in and document everything as context. I mean, that's the funny thing. It's like, context engineering is the actual, like...
Yeah, that's the hard problem with this stuff.
Yeah, yeah, yeah.
I mean, you just got me thinking, and it's like, well, think of the amount of money that gets spent on companies whose whole purpose of existing is, like, we just help you with implementing Salesforce or other products. If no one else, I would think they would be the ones looking at this.
And that's probably the answer, right? It is, like, a lot of these implementation companies will figure out how to leverage AI for their teams, to, like, be effective.
You know, it's a situation where enough tools are exposed to where.
you can use an LLM to actually generate
the documentation.
Yeah.
That's going to be another summary of summaries, though.
That's going to be your big thing.
Yeah, totally, totally.
Totally.
But it is, the tribal knowledge
is really hard to replace.
It's really hard to replace.
Yeah.
Well, this means, I mean, I could see some people thinking, like, okay, now is our chance to get off of, pick your tool. Because we're going to throw an LLM at it to try to, like, figure it out and pull it out, so we can then go to this tool that actually does what we want it to do, and we can clean-start it, kind of, without completely erasing everything and building from scratch.
Yeah, I mean...
Someone will try it.
I don't know if it'll work,
but someone will try it.
I think that's a totally legitimate pathway.
Just to say like, you know,
because I think the challenge
and we go back to like the promise
versus the reality.
The challenge is in a lot of those situations,
especially in an enterprise,
like you have to have someone
who can make meaning of the tribal knowledge and like all the whatever, right? But if you just
say, great, we're just going to keep that as a baseline assumption and just start building
stuff on top of it, then...
Yeah, well, and that's the funny thing, right? Like, we're talking about all these, you know, progressions toward AGI or whatever. But, to contradict my idea, honestly, there's a context in some of these where, like, you could throw the smartest person on this, with everything around it, and they would have no clue, no clue what to do, because they lack the, like...
Yeah, the tribal knowledge.
But I would say like
okay, this is something
that I've thought about a lot
and especially, you know,
it was a slightly leading question
in the beginning with saying, you know,
what's the jurisdiction of LLMs
within, you know, the data domain?
But the person
who was already
predisposed to be,
to like use the advanced technology
to be like a great Salesforce admin.
Like AI is going to be like crack cocaine
for their productivity.
It's just going to be awesome, right?
Yeah.
Well, and I think that's the right place
to like where it's going to get implemented.
And then the question is like,
does it stay there?
Like maybe it does.
Totally.
Yeah.
Totally.
But, like, it's, in the current state, a dramatic accelerant to that person, which probably makes other people on the team much less necessary,
right? A team of four becomes a team of one, like whatever. Sure. Right? Because if I have the tribal
knowledge, like I can use LLMs to do all these different interesting things like integrations. I mean,
those are domains that are extremely well documented. Even the tooling within, like, iPaaS tools or integration tools like Zapier, they're all getting better. These are all, like, publicly documented APIs where there's, like, not weird stuff, right? But at that point, I'm using a set of tools to, like, just
accelerate things that would have been very
manual, like very difficult to build
in the past. So I think
you're right on that people will
be doing that. I'm going to
give you the shadow side of that though.
That's why I had to. Which is, on the one hand, in the short term, you're going to be like, man, we cut the team down, we're doing it. It's so much more efficient. You have just made your Gary problem 100x worse.
Totally.
When Gary leaves, you are going to be more screwed than you ever were in the past.
Because now it's going to be...
Reading all of Gary's chat logs. That'll be the crucial thing.
And now you're going to have people who are... You hit the context window when trying to, like... You're going to have them manually searching through catalogs and trying to figure it out. It'll be all this thing where he's developed all of these LLM shortcuts to get the things that he wants done. You're going to be having to read, right, what was the, you know, what was the system prompt he used when he was doing this? It's going to be like, yeah, short term it's going to be lovely, and long term you're just...
It's a great point, because that's true of a lot of AI things. Like, there's a practical low threshold that makes sense, where, like, let's say we had a team of three and now it's enough work for one person. But now it's just one person, and, like, we really know that we should probably have more than one person that can do this, like, really core thing.
Yeah, yeah.
Have you solved that problem?
Yeah, it's tricky.
You know, one other thing is, as we think about that, especially on data teams, right? And so if we talk about systems integration or, you know, of course, like, a huge thing that data teams deal with is, you know, it's like the operational side, where it's like, okay, we're producing data products, but those need to get into other systems.
Right.
You know, whether that's for an internal customer or whether we're delivering a data product to a customer that actually needs to be in a client side, you know, part of the application.
Right.
It's, you know, maybe an analytics product or whatever that is.
So, yeah, certainly agree there where it's like, okay, well, the attraction of like 75% headcount savings there, you know, even if it's possible, is that wise?
I think that's a really interesting question.
But the other really interesting thing is if you think about, okay, let's go back to something that both of you said, which I've had the same experience.
Like, the most effective teams are like, okay, you have less than 10 reports, you're driving the business, the department, whatever
the scope is, off of a limited number of reports, they're really tightly scoped. There's
generally, like, good documentation around that, right? You didn't get there by everyone agreeing
with you, right? Like, what actually happened was there were a lot of disagreements that
forged, like, really strong opinions that, like, forced everyone to figure out, like, what is
our actual goal? What are the compromises that we need to make? What are we willing to give up, right?
Like, what are, you know, you can't have your cake and eat it too, right, in those
situations, right?
And so there tends to be a lot of healthy conflict that leads to an outcome like that
where you say, okay, the most effective team I was on, like, blah, blah, blah.
But one individual with an LLM cannot, in the current state, reproduce that level of
healthy conflict, right?
Because they're generally very agreeable, right?
And so, like, that's another interesting dynamic
of, you know, even if you think about founding a company
and there's the whole movement around, you know,
the first, you know, what is it,
the trillion dollar single founder, you know, whatever.
I think the number goes up every year.
Yeah, for sure.
Like billion trillion, I don't know.
With the electric bill.
Yeah, right.
But it is interesting to think about that.
The conflict often leads to the best possible outcome,
and that's actually way harder to achieve.
It's so much work to achieve that with an LLM.
Conflict sometimes, because I can think of times
where I was like, okay, I want to do
something, right? And I'm like, all right, let me go to the LLM and let me ask it. How would I do this?
And how would I structure it? Or whatever it is, right? I'm going to try something. And then I go and
talk to someone else about it. And they're like, why don't you just do like this other thing
over here? And it's like, look, oh, okay, yeah, that is the way I should do it. I shouldn't even
spend the time. The LLM's never going to do that. I agree with that. Like, I think the conflict is one thing, but you can tell it to be adversarial. And if you pick the right LLM, it will be. The other thing
is what you said, that, like, it tends to struggle with simple solutions. Like, just do this very easy, simple thing. It tends to overcomplicate things. And, again, maybe you can tell it, like, simplify, you know.
Totally.
Whatever. If you give it a complicated thing and you're like, I have a great idea, and we're going to set up a server that does this thing that has a webhook, it's like, that's great, let's get going.
Yeah, yeah.
Right. Talk to someone else and they'll be like, why are you doing that? It's just built into the product.
Yeah, it's already there.
Totally.
Yeah.
No, yeah, I agree.
Although, okay, yes, you can set it up to be adversarial, but then you're embedding your own...
Your bias is still embedded.
The reason that conflict is good is because, in healthy situations, it's generally rooted in deep conviction about something.
Sure.
Right.
You know, and that's like where it emerges.
And like from a technical standpoint, by the time you've put in, I want
to do x what's happening underneath the hood is that entire network and knowledge web has now
rearranged itself to be those things that are most probable around what you've just said it's
stuck in that local minimum a lot of times it's not going to pull itself out yeah totally right totally
One other thing. I think we probably have time for one more question.
Okay. Here's another thing that's really interesting. So if we think about a lot of the things that we've talked about, it goes back to, and this is, like, a really great
irony, I think, of this age that we live in, which is amazing. I mean, how fun is this?
Like, this is amazing to be living in the age that we are. But there's sort of this dynamic of
forcing people to go back to the fundamentals and do them well. So I'll give you one example.
So Linear has done such cool things with their MCP server, their agent technology, all those things, right? Where it's like, okay, if you, you know, and then if you think about tools like ChatPRD from the product side, right, where you can generate product requirements, et cetera, and then you can actually scope those into, like, different tasks, you know, combine those into a sprint. And then now, even with, you know, with Linear agents, like, you could actually automate some of the feature development just based on, you know,
what you've outlined as the tasks that need to be done, you know, in the project in Linear.
And which is crazy. I mean, it's pretty wild, right? But what is the prerequisite to that?
It's like writing really good, detailed, like, tickets, you know, or issues in Linear format, you know, breaking a project out into, like, logical sequences.
Like, you know, all of that stuff where it's like, okay, well, that is just generally good
practice anyways. And so one of my friends who's an engineer is like, oh, like the best
possible thing that happened for forcing our engineering team to like start writing really
good issues was AI, right? Because now they realize like, oh, well, if I want to like leverage this
to help, you know, increase my productivity, I actually have to go back and do the thing that I
should have been doing before. Right. But there was just less, you know, it didn't create as much...
And zoomed out even further, it also, I hope, puts more pressure on doing the right things to begin with. Because doing the right things, like, assuming we're working on the right things from, like, a feature roadmap or whatever, then there's a cascade down, which is really good. But hopefully that frees up cycles to work on, like, hey, are we doing the right things, all the way up at the beginning. And then you've got, like, less friction to do the right thing as far as, like, implementation with PRDs and requirements and issues, and, you know...
Yeah, for sure. And even if you think about documentation, that could, yeah, there's another one, internal or external, right? Or, like, in the context of a data team, like, you know, you have stuff that's really well defined. Is your internal documentation really good? Well, great, like, AI can be really helpful for that. If it doesn't exist, like, you're going to have to write it, and, right, sometimes AI can help with that. But, you know, sometimes, full circle, maybe, too, for teams it provides, like, more robust tooling. Like, APIs actually get better, because, like, we go through this process and, like, we go ahead and, like, build out the edge cases that we would have maybe skipped over before.
Totally.
Okay, lightning round, really quickly here, because I think we're close to the buzzer. Best AI tool you, like, used this summer, new AI or, like, feature within an existing tool. Hopefully within the data space, but if not...
here's a hack
that is pretty obvious, but
I think it's super interesting to look
at the top tools.
Like you mentioned, Linear. It's a great tool. And then go trace out their integrations, and then go look at those tools and trace out their integrations. You can have some great combined experiences. It's like we were just talking about Linear, like, with ChatPRD. Cool, like, neat tool. That's another one. Like, Linear's got a neat, like, Cursor thing where you can assign things in Cursor.
Yep. Yep.
So, like, just doing that, I think is interesting.
And watching these, like, even, like, who does OpenAI have on the stage with them? I'm like, go look up those people.
Like, for me, this is more on the discovery side, not, like, specific feedback on a tool.
But that's been a good thing for me trying to, like, keep up with this stuff
because it's so hard to even know, like, what to pay attention to.
Yep.
Matt?
Matt doesn't use AI tools.
Yeah.
I'm just kidding.
It's not like I work for an AI company or anything.
That actually makes it hard, because, like, literally I'm dealing with our internal stuff, you know, and, like, trying to help build these things. So it's like, I don't know, I don't have a good answer. I mean, in my current role we've done some cool stuff, but, like, you know, kind of consumer-based things. Trying to think if there's anything I can really come up with, because, I mean, I use stuff every day. I don't know how much newish stuff I've used, though. I've tried a couple different things and kind of gone back to the old stuff.
Yeah, it is interesting.
What about you?
Well, I mean, of course I'm a Raycast fanboy,
and so I feel like they have just continually
made the general experience
so integrated into the core workflow.
You know, integrating with, you know,
apps on your computer and tools.
So I would say that's a little bit of a cop-out
on my own question because it's not like a brand-new thing,
but it's such a good example.
I think of just a tool where like
I just generally use Raycast
because it's such a good experience
even you know
even before like you know whatever I have
GBT and cloud and all the you know
all the individual stuff but just their
attention to detail on integrating it into the core
workflow that feels like an operating system level
thing is incredible.
You know, I do wish there was more stuff that had that in there. Because, I mean, that's to me one of the big things that I would like to do, especially, like, just even in my current role, where it's like, it's me, there's no team, right?
Yeah.
It's like, there's stuff I'd like to be able to do where I'd like to be able to set it up, to be able to say, hey, go do this, go do that. You know, and a lot of the tools, it's still really hard to, like, string it together. But I feel like Raycast knits it together in a nice way. But a lot of them don't really, even, they'll say, like, oh yeah, we can do tasks and stuff, and then you'll get into it and it's like, oh, we can do this one task. Yeah, only...
It's still hard, yeah. It's still hard, yeah. And then, like, you know,
it's still not accessible to sort of non-power users. Okay, one anecdote that's really funny.
So my dad, he owns an automotive shop, and they do general stuff, but, like, he's worked on transmissions his whole life. And so, like, you know, we rebuilt a car together. So he's very mechanical. And so my lawnmower died before the summer. And now that I'm back, it's like,
okay, well, I need to buy a lawnmower, you know, because, of course, I'm
one of those people, I have to mow my own lawn. I've hated paying someone to do it.
I bought a robot to mow mine over the summer.
Like a hustle.
Well, in my backyard, you have to go up stairs. And so if the robot can go up stairs, then I'm in.
Otherwise, I have to buy two robots.
Yeah.
But either way, I was researching, like, researching lawnmowers and all my other equipment's
gas powered.
And so it was just easier to, whatever, buy a gas-powered lawnmower. Although, you know, people are trying to convince me to go electric, but maybe I'm just a traditional guy. Either way, I'm looking at the gas-powered lawnmowers
and, you know, trying to get the Labor Day sale and blah, blah, blah. And so I asked my dad,
I was like, hey, well, like, help me, you know, who makes, like, the best motor? I haven't shopped for a lawnmower in a long time. And he just immediately said, did you ask ChatGPT? I'm like, come on. You're supposed to know. Like, this is your domain. That's
like, internal combustion. And he was just like, why didn't you just ask ChatGPT?
Raycast, come on.
Exactly, yeah.
Well, my wife has gone from where she would look at me like, are you just going to ask ChatGPT that? To now she'll be like, I don't know the answer to this. Can you go ask chat?
Nice.
Yes.
Awesome.
I love it.
All right.
Well, thanks for having me back.
Lots of fun shows.
Maybe we'll have you back on our show.
Yeah.
Great. That would be a great privilege.
All right. Thanks for listening, and we will catch you on the next one.
Stay cynical.
The Data Stack Show is brought to you by RudderStack. Learn more at rudderstack.com.