The Data Stack Show - 201: AI Real-Talk: Uncovering the Good, Bad and Ugly Through Prototyping with Eric, John, and Matt

Episode Date: August 7, 2024

Highlights from this week’s conversation include:
Current State of LLMs (1:12)
Historical Analogy to the iPhone (3:32)
Limitations of Early iPhones (5:02)
Comparing LLMs to Historical Technologies (6:08)
Skepticism About LLM Capabilities (9:11)
Broad Nature of AI Innovations (10:12)
User Input Challenges (14:32)
Transcription and Unstructured Data (16:19)
Single Player vs. Multiplayer Experiences with LLMs (18:50)
Revenue Insights from ChatGPT (20:27)
Contextual Use of LLMs in Development (23:43)
Implications of Human Involvement (26:15)
The Role of Human Feedback (29:19)
Customer Data Management and LLMs (31:25)
Streamlining Data Engineering Processes (34:24)
Prototyping Content Recommendations (37:42)
Summarizing Content for LLMs (39:51)
Challenges with Output Quality (41:18)
Data Formatting for Marketing Use (43:20)
Efficient Workflow Integration (46:20)
Exploring New Prototyping Techniques (50:56)
Distance Metrics for Improved Relevance (53:00)
Improving Search Techniques (56:46)
Utilizing LLMs in Customer Data (59:15)
Challenges in Customer Data Processing (1:01:10)
Final thoughts and takeaways (1:02:12)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show. The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the Data Stack Show. We have a special episode for you today with just the Data Stack Show team. We actually invited Matt Kelleher-Gibson,
Starting point is 00:00:38 also known by his stage name, the Cynical Data Guy, onto the show, in large part because he and I have been working on a ton of LLM projects internally at RudderStack. And of course, we want to make the world a better place by putting another podcast out on the air that discusses AI and LLMs. So we're doing our part. We're doing our part. Yes. On that note, lots of interesting things to discuss. Matt, what's on your mind? We have some topics to discuss, but what's on your mind? I think just talking about where we are with LLMs in general. You know, there's a lot of talk about the hype of it, but also, what's the reality? Where is it still difficult? What are the actual use cases for it? Yep. John? Yeah, I think, in addition to that, we talked on a previous episode about what I like to call single player mode versus multiplayer mode.
Starting point is 00:01:33 And it'd be interesting to explore again. What are we seeing in single player mode, which is like, oh, look at this cool thing I can do. Yep. And then like, what is it truly to have something in production in an app with hundreds or thousands of people using it simultaneously? They're not the same problem, right? Yep.
Starting point is 00:01:50 I love it. All right. Well, let's dig in. All right. This is great. We can talk about AI with basically no agenda, you know, which is really what the people want. The AI may train on this one day. Think about that. Whoa.
Starting point is 00:02:08 So I'm just going to say cowboys are bananas. That's the right way. Yeah, exactly. Right. Sure. Right. Perfect.
Starting point is 00:02:15 Appreciate that. Let's start. We have some specific stuff to talk about, Matt, and I really want to dig in with you. You've been building a ton of prototypes internally here at RudderStack, so I want to dig into those and talk about some practical stuff. But I realized that we've gotten a lot of perspectives from different guests on the show around AI, but I don't think that we've discussed
Starting point is 00:02:38 that as a podcast team. And so part of the reason I wanted to do this episode was just for us to have a discussion about it, because I really value both of your opinions. We've talked a lot about an analogy for where to place this in terms of historical reference, right? Like, okay, we've kind of seen something similar before when this happened or that happened. And there's been a ton of ink spilled on the internet about this, and probably podcasts too. But the one that really came to mind in terms of my own personal experience was the iPhone. And one thing in particular really stuck out to me. So I thought, okay, let's think about the iPhone. Rewind to 2007, when it came out. And I think I got one that year, I want to say. Or at a minimum, the place where I worked, which was a marketing agency, like there were multiple iPhones around, right?
Starting point is 00:03:56 And so we had this early exposure, like, very early on. And that was when, do you remember, it's hard to remember this, but the app icons, it had a very physical design built into it, right? So there was wood grain as a shelf for your apps and all this really interesting design. Which is very different than your BlackBerry apps. Oh, I remember that. Okay, so you rewind back then, and there was one missing feature that, for a ton of people, made it really hard to believe that someone could run an entire business off of an iPhone. Do you remember what that was? Was it copy and paste? Copy and paste. Yeah.
Starting point is 00:05:06 And it was really painful, right? Yeah. Like, you just thought, man, it's going to be so hard to do anything. Any sort of complex cross-app workflow was painful. It was, you know, interacting with a piece of technology where you could both sort of sense the immense change in the way that you were going to interact with stuff, but then also these severe limitations that made it hard to imagine how ubiquitous it would be. That feels very similar to LLMs at this point, right? Because it's like, holy cow,
Starting point is 00:05:53 these things are capable of doing some completely wild things, but it's like, ooh, they can't, you know, it's like that copy-paste functionality. There are some limitations that feel similar. So John, thoughts? Is that accurate? Am I off? Do you agree? I think so. I think the interesting thing will be, because it's easy to look back and say that, but you take a lot of things for granted. So I'm interested in, we so happened,
Starting point is 00:06:20 like with iPhones, to not face limitations with, like, 4G and 5G. Like, we were able to keep up the speed, because, you know, your original iPhone, I mean, 3G, like, if networks had stayed at 3G, you would not have had the same adoption, right? Yep. But we got 4G and 5G. So with LLMs, what are those companion things, your 3G, 4G, 5G iterations, that also have to happen for it to hit that same trajectory? Yep. And then even those kind of stellar apps, whether it's app categories, like whatever the first big game was that blew the category open, or the first, like, YouTube-type thing that was a perfect
Starting point is 00:07:05 fit like i don't even think i have a good idea for what those things are yet yeah yeah yeah for sure that that was yeah i mean yeah there i'm sure there was one way earlier in this but it was very the pokemon go phenomenon made the whole thing so visible all around you because all of a sudden there's these people walking around holding right and you're like whoa this is a totally different experience than what was possible before right on a number of levels yeah i mean i think we're still kind of trying to figure out a lot of that especially on the use case side where i mean and i think the fact that it was so successful in that consumer side with the chat interface has really locked a lot of people into they just think about it in that chat interface.
Starting point is 00:07:53 Yep. And I think about it as being very visible and very like right in the customer space. And I think that's pushed a lot of this towards that idea of like you know the whole joke like ai power type like it's got to be out there it's got to be front and center um which then just makes me wonder a little bit like is this you know are these use cases real or is this like saying bluetooth enabled it's bluetooth enabled okay but i don't need bluetooth in my microwave yeah that is actually another technology that's a perfect like antithesis to the iphone where it's like bluetooth we're on version five point something now i think what happened bluetooth like like who has a
Starting point is 00:08:39 seamless headphone experience here like every day it's always perfect like yeah so it is interesting like it doesn't follow that curve like more than i and like the range has gotten better on that on the bluetooth side but it's well well i'm kind of yeah one of the more recent ones has gotten really good but you're still dealing with the idea of like it'll still drop sometimes won't connect you know those types of things and so i just with a lot of the generative AI stuff, I'm still kind of, you know, we're still in the state of it can do everything or it can, it'll change everything. I mean, even, you know, like I saw something today where they were, where someone made the statement, well, as AI, since AI makes software development so much cheaper and faster, and I'm like, whoa, do we have any proof of that? Like, I hear that stated a lot, but I don't know that anyone's actually seen that or had results of that. Yep. So I think there, you know, there are definitely going to be use cases that are going to be
Starting point is 00:09:35 big and powerful and impactful, but we're still kind of in this like fog of war almost. Yeah. Yeah. I think the other bizarre thing here too, conceptually, is AI is so broad. Even we're just talking LLM, so it is so broad.
Starting point is 00:09:51 And I can't think of another, like, innovation. Like cloud computing is the closest I can come with that is just so generic. Yeah. As far as the possible things. Whereas again,
Starting point is 00:10:04 like back to the iPhonehone like that had a use case to begin with it had a you can text and call your friends you can browse the web you can the gps use case like you can use it just really clear defined use cases and lms did not have not come out of the gate with that and that had a roadmap for years before people knew that like GPSs were going to get cheap enough that you could put them in a phone. Yeah. Right. Yep. So that.
Starting point is 00:10:29 Oh, and cameras. That's the other one. Yeah. So like those had like very clear roadmaps where the whole industry see where it was going. And then it finally kind of culminated there. We don't really have that. There's not really a very good like industry roadmap of where this is going. Like 20 or 30 years worth of like
Starting point is 00:10:45 the first one was the first gps one was the first like portable camera digital camera like it combines so many different so many of those things yeah that great moment where it feels like llms it is different because a it's broader and b like it's net new just like there's not you know any anything i mean there's obviously a lot of like work before it but it's sure it's different like to the common person like it's kind of feels net new right do you think though that yeah i think about alexa and i haven't looked at the recent data on alexa but i remember at point, Amazon had an unreal number of people working on Alexa. I think it was 10,000 people working on Alexa.
Starting point is 00:11:33 And one of the interesting revelations at that time was that people did two or three things with Alexa, right? Yeah. They checked the weather. Turned their lights on and off. Scores. Yeah. Yeah. Turned their lights on and off, maybe.
Starting point is 00:11:45 It's a smart timer to a lot of people. Right, exactly. Yeah, that's probably not right. Totally. And so, but to some extent, and you think about, I go back to the iPhone, and the other reason that it stuck out to me was that a lot of people, it was a cooler way to watch a YouTube video or like make calls or whatever, right? I mean, of course they're like, you know, the video calling very early on was pretty wild from the phone, but, but that changed a lot over time, right? Right. Where it's hard. You think of this thing as sort of very, you think of a technology or interact with a technology in a one dimensional
Starting point is 00:12:26 way because you're so used to doing things other ways or it's so new, right? I mean, this is Google search is actually similar in that, you know, when Google search first rolled out, it wasn't intuitive that you would just go Google everything, right? I mean, it's like, oh, well, this is, it's even if someone says you can search, you know, most of the internet through this portal, you're like, Oh, that's amazing. And then you're like, you know, if you've never seen it, you're like, you get an intuitive sense of what's on the internet. And I think that's a major problem with LLMs. Most people do not have an intuitive sense of like what it could answer. Right. Exactly. Yeah. Yeah. Okay. So we've been mildly, you know, we've been mildly skeptical in terms of some of the issues here,
Starting point is 00:13:12 but these things are incredibly powerful and I think will change a lot of things fundamentally. So what's the bullish outlook, right? If we remove all of those, right? If we sort of look at things like, you know, if we think about things like Google search, things like the iPhone or whatever, have like drastically expanded in the scope of what people use them for over time. What types of things do you see for LLMs? One other thought back to your iPhone analogy.
Starting point is 00:13:41 Do you remember the big thing that people were walking around saying that were like, I'm not going to get an iPhone? Do you remember what it was? Like, oh, I could never give up this thing. I could never give it up? I could never give up like this function on my phone that iPhone doesn't have. Oh, hold on. The number pad? The physical keyboards. So it's like the BlackBerry physical keyboards. I remember lots of conversations with people, I cannot give up the physical keyboard. So in a lot of ways, the iPhone knew that,
Starting point is 00:14:11 and they obviously knew that, but they bet against it, and that like, oh, the video and the scrolling, and there's studies on this of how people use their phones, the time scrolling videos and things like that is way more than typing. So it'll be interesting with llms which is primarily typing but we're back to that like yeah that's fascinating yeah from a like user input is what's going to happen there yep yep but so the bullet so the bullish would be like when we get into like multimodal or more
Starting point is 00:14:44 user which like the audio is pretty cool. Like it works pretty well. Awesome. Yeah. More video things where like you're using your camera and there's a few that are out there. Like I feel like that multimodal will be a big part of it. I think like for me personally, I already start using it where there's a lot of situations, you know, as a search engine. Right. Where it's like I just need I don't want to scroll through a bunch of websites.
Starting point is 00:15:06 I don't need a bunch of SEO content that I need to go through. I just need to know like, how do I do this? Or what's a recipe for this? Or give me a summary of whatever. I think, you know, there's some drawbacks in that. Are you really getting the full picture and things like that?
Starting point is 00:15:19 But I think for a lot of general cases, that's super just very quick, very powerful and gives you the ability to probably put that in more places and you know and just search in general probably just to have that i think about other things that could be used for i mean we already see it being used in a lot of customer service type areas yeah and i think that probably another one where it'll continue even like co-working some of the like the rest one to whatever. Yeah.
Starting point is 00:15:47 Right. The really terrible, which are like, please tell me what it is you want. Yep. And they never, never understand. And you're just screaming at it.
Starting point is 00:15:54 Yep. Talk to a representative. Right. Right. So I think just keep pressing zero until. Yeah, exactly. We're going to connect you with a representative.
Starting point is 00:16:00 Yeah. So I think those are some of them that, that will be really good. But I also think kind of what you were saying also i mean there's a lot of stuff where like the transcription ability on audio things like that that i don't think is fully like the capabilities i don't think are fully like being utilized or understood at this point yeah the unstructured data thing i think is it really is phenomenal. That's yeah, it is, you know, the ability to. Translate across languages is I mean, it really like it's going to be more amazing than I think a lot of things that we've seen previously that have, you know, translation type tasks or whatever.
Starting point is 00:16:40 But then also, I mean, I go back, I think I've mentioned this before, but just processing unst processing unstructured data right like looking through a 1400 page pdf manual for something right right instructions for any sort of product that you buy that you know has some sort of like summarizing all that i mean it's like the time savings i think is going to be incredible across a number of different vectors i i really like yeah translation is a good one i'd use the word contextualization because now like all of these things that historically you've had generically like all the instructions are generic for just about anything right so intentionally like intentional yeah yeah so you've got and actually some companies are getting better at this like having a little variable where it like puts your account key in there instead of like
Starting point is 00:17:26 brackets around your account key here. Things like that, like are actually helpful. But with an LLM, if you're generating that just on the fly, I mean, you can do some amazingly good instructions to make, let's just say software set up much easier because it's fully contextualized for your account for your region i mean i configure software all the time so you gotta go look up like your attendant id like what region is it in what's my account this what's my account that yeah so i mean that's a fairly simple example but i think things like that like with contextualization will be a lot better yeah and i know people who use it for things like being able to get you know like
Starting point is 00:18:05 government grants and stuff like that being able to read the grant proposal write it yeah totally types of things where you know maybe it's not the final draft but it's getting you 80 of the way there and now you're just reading through it and double checking or even doing the check on the back end of it of have we completed and are we fully within what the instructions are that we should be doing with this? Yep. Yep. Let's, okay, I want to move into the single player versus multiplayer and then get even more specific, Matt, by talking through your experience doing a bunch of prototyping because a lot of the stuff that we've talked about and even the examples that we've used and to your point it's hard to think of you know it's hard to think of things right which is often
Starting point is 00:18:51 true of new technology right hard to look back in history and find the exact analogy right but a lot of the things we've talked about are very individual right and the examples of the iphone is a device that you have right you can connect with other people through it right but alexa which with the iphone we completely skipped over social networking yeah that was a huge deal and like gps is very much like has you know outside components of your positioning like there's a lot of external networking network effect stuff that i don't see in lms of like right now Yeah, such a good point. Yeah, yeah, yeah. When we go to multiplayer, though,
Starting point is 00:19:28 and I guess the point I was making was around utility, right? A lot of the things we've talked about are personal utilities. So even though there are networking effects related to the GPS and maps, you're still consuming that as an end consumer, right? And I think one of the interesting things, and would love to hear more of what you're thinking, John, is how do you think about this sort of single player to multiplayer in the context of harnessing this power as a business
Starting point is 00:19:57 and distributing it? Because I think one of the things that makes this conversation so tricky is the revenue from chat GPT, like just subscription revenue is in the billions. Right. And so you look at that and it's kind of mind boggling. Right. But then when you really dig in and talk to companies who are trying to actually productize this, it's Alexa, right? There are a couple of things that are working really well and that are really powerful, but they're not hundreds of them. Right. And they're not necessarily like, I mean, you know, like with Alexa, it's like, oh, it does a timing really well with your voice. Well, that's not really going to pay for itself. Yeah. Right. That cost element is a big like elephant in the room for a lot of these places okay so john that was the longest lead-in what a horrible rambling lead-in for me to say what like tell us about single player versus multi yeah well there's actually two aspects of it one i didn't consider
Starting point is 00:20:58 at first one was what we were just talking about with actually like a i use this llm with you multiplayer which is not what i was referring to but interesting to think about oh interesting talking about with actually like a, I use this LLM with you multiplayer, which is not what I was referring to, but interesting to think about. Oh, interesting. It's like, you know, like a multiplayer game or like a video call. There's two of us or multiple of us.
Starting point is 00:21:14 I don't know of any multiplayer in that sense. LLM things. Yeah. But I was more thinking like multiplayer as I use it on my laptop and go into production. So we'll stick with that topic, but the other one's kind of interesting too one's really interesting yeah so maybe matt will have some takes on that but on the productionization i mean we had an awesome conversation with barry
Starting point is 00:21:34 from hex yeah on this a couple episodes ago but i think it's a fundamental problem of like if you're a developer or have worked with developers it's this like it worked for me thing like on my laptop i got it to work and then like the productionization process has gotten far easier in the last 5 10 15 years there's a lot of tooling around it there's a lot of like best practices around it most of that's missing right from llms and there is the like attended versus unattended problem when I'm like here using GPT to look up like a restaurant to go to tonight it's an attended thing and I'm constantly saying no I want it closer to my home no I don't like Chinese food or I don't like Italian food like you're and then you can kind of get in it it's cool it works but if you're doing more of an
Starting point is 00:22:20 unattended thing in production where like you want a one shot answer to X, like translate, you know, X to Y or whatever you're trying to do, it's different. Yeah. So and people have to think differently because they expect deterministic code. Yes. And production. Like that is a very high expectation from people and if it's not in that like iterative like interaction phase which just feels weird like we're just not used to it in software not used to the software being like a little unsure of what to do yeah you have to like kind of coach it the attended versus
Starting point is 00:22:57 unattended i think is really interesting and one of the things in the conversation with barry that i appreciated so much was that he said, he basically, he didn't use those terms, but he described their implementation as being attended. And so for the listeners who didn't catch that episode, number one, unbelievable episode. Barry is an incredible guy. So go back and listen. But for those of you unfamiliar with Barry McArdle or Hex, Hex is a collaborative data platform. So analytics, data science, workflows, reporting, and they have a feature called Magic that
Starting point is 00:23:34 you can essentially ask it to do things or ask it questions in plain English, and then it will generate code and run that code and even give you multiple cells of results. And it works really well but it's not perfect and barry was like this isn't perfect but you have to think about the context where are like where our user is right it is someone who's writing who writes code you know as we're living or as a major part of their job right so sql python you know what do they support multiple languages i think they support? Multiple languages. I think they support R, don't they?
Starting point is 00:24:06 I think they might. Yeah. Which I respect. It's so cool. I appreciate that. Mad respect for supporting R. Get your matrix math in there. But he said, you know,
Starting point is 00:24:15 our end users aren't perfect, right? And so it maps to their experience in that it is an attended problem, John. I think that's a great way to describe it, right? They are, if they write code and execute it, a great way to describe it, right? If they write code and execute it, they're going to analyze it and evaluate it for its accuracy and all that before production.
Starting point is 00:24:31 And so it's an attended problem. It's also one where it's generating code, but you're running the code over and over again in production. You're not asking it to write code every time you do it. And that is one thing that in a lot of the stuff that I've done here at RudderStack that you run into is, you know, there's a couple of things that I feel like are a blessing and a curse for their current implementation.
Starting point is 00:25:05 One of them is the fact that if you have, you know, when you're trying to work on something in a larger scale with it, anyone can go, well, I just went to JET GPT and it did this for me. Why can't you make it do that? That's a classic. I can make you do it once, but can you make it do that consistently a million times? Yeah. That's really hard to do. Yeah. When I think in that attended implementation, like one of the interesting things to think about, and this wouldn't be data legal, like a lot of
Starting point is 00:25:25 applications. I think it can be a really wonderful tool in a workflow with an expert. Like, I mean, you can imagine being a lawyer and reviewing all these things and like, you've got these like nice summaries and you have these things, same with data. You're already like, you know, Python and SQL really well, but like, you know, you're saving keystrokes, like, you know, you're moving faster. Like this makes a ton of sense. And you're an expert and you have so many reps yeah you can look at it and be like yeah that's a problem yeah and you can run it and get an error from a deterministic system and fix the error like all that works really like i don't think there's any problems with that yeah yeah well there's a company is it rv Hold on a second. I'm going to look this up.
Starting point is 00:26:08 That is a legal, it's LLMs for legal. And they just raised a bunch of money, which makes a ton of sense. Yeah, Series C announcement. But the trouble here is for legal, for like financial tax, like the desire, oh, we got to get these in the hands of the people. This is going to be so great, right? Yeah. Which at some point that's probably true, but especially in, and then think medical, right?
Starting point is 00:26:32 Like a human more scary, but I just don't think we're at a point where that works, right? Yeah. I think also it's a situation where we're saying, well, if you have an expert, this is really useful, but there's a temptation to do either give it not to an expert
Starting point is 00:26:48 or for the expert to get kind of complacent or possibly lazy, like we've seen with people filing law briefs that were completely made up. And that becomes a question then of, will it hold up or will it just cause people to get, will like the quality go down because people stop learning or trying to
Starting point is 00:27:08 keep up with that? If you're already an expert, it's great. It's kind of like the co-pilot stuff, right? If you already know how to write code really well, it'll help you a lot. If you don't and if you're just starting, it'll help you,
Starting point is 00:27:24 but if you want to get really good at it, you need to also code without having a co-pilot there to help you. Or you will become completely dependent on that co-pilot. And it will stunt your ability in the long run. Yeah, it is interesting. Yeah, I pulled up Harvey, which seems like a really neat company. Yeah. But the Harvey platform, just reading from the website, the Harvey platform provides a suite of products tailored to lawyers and law firms across all practice areas and workflows.
Starting point is 00:27:51 What's interesting is that is designed with human in the loop. And I think that's where a lot of the aspirations, the aspirational thinking comes in around what types of things will be able to happen at scale without a human in the loop. Right.
Starting point is 00:28:09 Because almost all of the use cases that we just talked about are attended in an expert context, right? Or someone who's knowledgeable enough to know how to use the output, even if it's not 100% accurate. And that seems to be like a big, if it's not 100% accurate. And that seems to be a big challenge. Now, I will say, I think on the customer support side of things, that's gotten really good. I mean, there are a lot of things you can do without a human in the loop. Yeah. So that's at least one area where it has proven to be to create
Starting point is 00:28:45 like really non-trivial gains in terms of when i think with the human a loop too like it doesn't have to be a certain spot in the loop right like you the human loop could be all the way at the end and determine like hey that result is not right and but they could have but there could be some kind of like python or sequel or something being generated in the middle and that can still work it just um you know there's just a certain like if you're in the loop and you're the expert in python or sequel it's easier to tweet whether you're down all the way downstream looking at end results like that's not right you can give feedback to the lm and still probably eventually get what you want it's just a a little harder because you're further downstream.
Starting point is 00:29:28 Matt, you've been doing a ton of prototyping at Ruddersack. And I've been, I want to say helping, but maybe I've been slowing. We're difficult. You're smiling. No cop. I wish the listeners could see this. Yeah, I could see his face. I appreciate that.
Starting point is 00:29:52 I appreciate you withholding information. Yes. So one of the things I'd like to talk about first is that, which I'm personally very excited about, just going through these workflows with you is on the infrastructure side of things. So I'll give the readers a little bit of context here, but, um, you know, rudder sack deals with customer data, right? So a lot of the use cases that we help our customers with happen on some sort of entity level, right? A lot of users,
Starting point is 00:30:27 like, you know, modeling users, modeling accounts, modeling households, these entities that sort of represent a business's relationship, you know, with their customer. And Rudderstack's product that helps with that modeling is called Profiles. And so it helps solve identity resolution and then makes it really easy to compute features over that identity graph at an entity profiles. And so it helps solve identity resolution and then makes it really easy to compute features over that identity graph at an entity level. And so the output of that generally is a set of tables or sometimes a single table that our customers would call their customer 360 table. It's sort of one row per user entity, whatever, Yep. And then everything they know about them. And so you've actually been using some early versions of LLM infrastructure
Starting point is 00:31:12 that we built into that pipeline. And what's been interesting to see for me is that you're actually spending most of your time like working on the LLMs, not doing the data engineering side. So can you speak to that a little bit? Because like the input and output side of that has been fascinating.
Starting point is 00:31:25 Yeah. Well, so we gotta give credit to the engineering team here because they've done a great job of just hitting that right into the kind of profiles flow so that you're, you're not having to deal with any of that stuff there. And then also, you know, we're building this off of snowflakes, cortex, all that type of stuff, which makes it a lot simpler to go and actually, you know, pull up, you know, have a model you're going to be interacting with and how's it going to be sent and all that, that a lot of that you
Starting point is 00:31:53 don't have to deal with. So instead, what your ability to do is basically to take that customer 360 and then be able to say, okay, I'm going to ask it to do whatever task it is. Here are the variables or the features as we call them from the 360 and you can, you know, inject them directly into the prompt. And you can also go get other static data that you would want. So you can inject at a user level. You inject at a user level and you can also get static data from anywhere in your warehouse with just a SQL query
Starting point is 00:32:27 that you can also inject. Oh, right, yeah, because you can just use a SQL model. Right, and that becomes the thing so that, you know, when you've got a constant, you know, like a list
Starting point is 00:32:34 that you wanted to pull from or that you wanted to be able to guess from, you can pull that in, right? And it's very simple to do within the configuration of how profiles works. So then what you're really focused on is
Starting point is 00:32:46 just the engineering prompting at that point yep how do i get it to do the thing i wanted to do yep how do i get it to you know put it in the right format not give me an explanation afterwards not tell me what a good idea this is or whatever it's gonna stick in the front right yeah yeah so you really just focused on that part there which is one of those that like you know when you first when i first was getting into and doing it you're getting so frustrated with the engineering prompting and then you realize and i think you asked me a question i was like oh yeah no yeah they did a great job that all works seamlessly that's just getting the stupid thing to tell me give me what i wanted to get well okay but then talk about the output as well because that was something i mean
Starting point is 00:33:24 you and i just sort of and we can talk about the output as well because that was something, I mean, you and I just sort of, and we can talk about the specific use case, but you and I just wired up a pipeline to like put a bunch of this output at a user level into one of the marketing team's tools at Rudderstack. Right. Yeah, so what it ends up doing is it creates its own table from this that gives you the like universal ID
Starting point is 00:33:43 that the profiles creates and the results that you're looking for. So you basically have an entity ID. Entity ID and then the LLM response. At a user level. At a user level, yeah. Yeah, super interesting. So it streamlined a significant amount of the data engineering piece
Starting point is 00:34:03 because that's the brutal thing with customer data right is that these you know like you said making it work in gpt you know just like sort of uploading the data one time seems magical but it's actually very difficult to do that at scale across like you know yeah tens hundreds well and like thousands of users for comparison's sake i have also downloaded you know my open source models from like you know hugging phase and it is a fight to get those to work just in like python code and stuff versus you know i'm dealing with one now that it's like i've spent a day or two and i cannot you're fighting this problem you don't have this package it's not working the right problem. You don't have this package. It's not working the right way.
Starting point is 00:34:50 Oh, you don't have the right key access token, whatever it is. Okay. Why is this giving me an error versus, oh, look, it's a YAML file. I wrote down this stuff. I ran it. It ran. Yeah. Yeah. Cause you're just interacting with the API and it's baked into the pipeline.
Starting point is 00:35:01 It's just all baked in there. Yeah. Yeah. Okay. Let's talk about, let's get really specific and talk about one of the use cases that we selfishly built for my former marketing self, which was, and you can kind of get a sense of why Matt did not answer this question here in terms of me helping, you know, these LLM projects. But one of the interesting things that we talked about a lot on the marketing side of things was that we had all, we have a very deep
Starting point is 00:35:33 library of really good content. And we know from just our basic analytics that if people are exposed to a certain amount of that in a certain time period, they're much more likely to convert, right? To try a product or request a demo or whatever, right? Which makes all the sense in the world because they're sort of getting, you know, they are being educated about all these things that are really important about our product or whatever. And so, but what's difficult, like content discovery is very hard, right? Especially on a B2B marketing website, how do you expose that, etc. And it also is highly contextual, right? We have a very deep integrations library, we have different warehouses that we support all of this stuff on. And so people generally want information that's related to the domain of their stack, right? And the things that they can do and the features that are highly relevant to sort of this use case, this set of particular technologies.
Starting point is 00:36:29 And so if you think about trying to expose content, that's actually a pretty difficult problem to solve, right? Not unsolvable, like this has been solved in the machine learning space for a long time, you know, but the team is me and Matt and an engineer, right? And you also just need a lot of training data usually for those types of things.
Starting point is 00:36:45 A ton of training data. It's not as simple, and we're just going to wire it up and it'll go type deal. Right. So Matt and I thought, okay, well, one of the difficult things actually
Starting point is 00:36:57 is getting recent browsing behavior on a user level and sort of packaging that up. But we realized with profiles, we already had that baked in as a feature, right? So I don't know how, I mean, last 10 page views or over some time period or something. What did we actually end up doing? It was the last five pages you visited. Okay. Last five pages you visited. See how helpful I am? Yeah, on AI projects. And so this is actually something that I, that was really interesting to me. So walk us through prototype one, and then prototype two,
Starting point is 00:37:34 because we use two different architectures. We ran all this on snowflakes cortex, which was pretty, pretty cool. But prototype one and prototype two. so prototype one was we were going to use an llm to pick the content based off your last five yes pages also known internally as the yolo prompt engineering methodology yeah we'll get to that so so we started just basically like i started in chat gpt just trying to say okay could we get it to output, okay, could we get it to output something? Okay. Yes, we can get it to output. I can get it to output in like a JSON format and the way I wanted it. Cool.
Starting point is 00:38:14 We then got all of our web content. We actually put into a table in Snowflake to make it easier to handle with. But of course you're dealing with context windows, which is one of the things you've got to pay attention to. And just,'m yeah fairly certain most of our users are familiar with this but for those that aren't could you explain context window for anyone who wants to prototype and this is just a really good sort of single player to multiplayer thing of like you hit a context window if you go the yolo prompt route yeah so basically any prompt you write for an llm you, those words are broken up into tokens. And so for every, you only have so many token that you can put into the prompt before it
Starting point is 00:38:52 gets too large. And the model itself cannot hold that much kind of in its memory, so to speak. Different models have different context windows. For example, a lot of the ones that are available in Cortex have about 8,000 tokens. We were using one that had 32 000 tokens because we needed every one of those 32 000 tokens there's another one that has like a hundred thousand we didn't go that far because all these things cost differently and most is based off of your compute time so you want to try to keep that in mind as you're doing it i did get a talking to at one point when they're like, why is our snowflake bill spiking last week? Oh, yeah.
Starting point is 00:39:30 I remember that. Oops. I actually was helpful in solving that. So anyways, so we kind of did a bit of a proof of it. But once we got the web content down, I mean, it was never going to fit into a 32 token window. Right.
Starting point is 00:39:43 Because I mean, if you go from words to tokens, I think I actually asked chat GPT, that's like 1.4 tokens per word is like a rough estimate you can use. Yep. So what I ended up doing actually with the, with one of their LLMs was asking it to make summaries of all of our content. So I made another table that had, I told it to create three keywords and had to give it some stuff like don't use
Starting point is 00:40:05 rudder stack as one of the key terms because it allowed to do that and then to summarize it down to like under 35 words so we could get it down there at that point we still had too much so we just basically lowered it to only blog posts and we had to kind of subset the blog posts yeah it was so it was like a curated list of the most high intent, best performing stuff. Sure. Let's see. It was. So we started with that part there and then it was the hard part became just you're writing
Starting point is 00:40:38 a prompt that basically says we have this respective customer. They, you know, and we were putting in things where we were saying they work in this industry that at a company that has this many employees there was one other thing i can't think of in their job title maybe it was something like job title and then we said here's the last five page up to five pages because some of them it wasn't completely five you know and produce a recommendation that type of a thing. That is kind of like, getting to that was a little bit of work, but okay, we can do it.
Starting point is 00:41:09 The hard part came once we started running it like that, because the nature of it, it's a little hard to kind of one-off test these. So you have to kind of run the whole pipeline to do it. Yep. So then we had to go to a subset of users because of, yeah. We had to subset it down. It's still kind of one of those,
Starting point is 00:41:24 you submit the job, you wait for it to finish, you go look at what it is. And we ran into that's where we kind of ran into a lot of our stuff was it loved to pick really generic content. Yeah, which was super annoying. So that was one of the other reasons why we went with just blog posts, because when you would give it just website context it would be like oh the integrations page yeah that's not super helpful yeah right and you can i was changing the instructions on it saying like make them specific you know i tried doing stuff where i'm like think of three topics this person would like and now match it to this list that I'm giving you that has keywords and a title and a summary. And it would still love to be like Rudderstack homepage.
Starting point is 00:42:11 Yeah. Okay. So that was like probably the biggest fight within it was trying to get it to do that and trying to get it to do it in a way that a marketer could look at it and say, yeah, those make sense. Yep. Because I remember I showed you the first couple of ones where I'm like, I don't think this is working well.
Starting point is 00:42:26 And you're like, yeah, I would not use the. Yeah. Yep. So that was the big headache was going through that. I tried different models. We I worked with the prompt a bit. And it's tough because we were maxing out that context window and all these LLMs get a little fuzzy as you in the middle there as you max out
Starting point is 00:42:45 the content window. We got it to a point as we were getting where I felt it was being better and then I was looking through it and realizing it was making up titles and URLs. And I was just frustrated. Yeah, you give it a list and it was still making it up. So what we ended up doing
Starting point is 00:43:01 was putting an ID number on there. And I'm like, alright, at the very least you're going to return me this ID number and I'm going to use the master list to make it match up. And so that ended up being the solution that we had for it. I mean, the other problem we had was I needed it to, I wanted it to be in a JSON format. Yes. Right. And it's one of those things we talk about it's really simple you can look at it once and do it but when you run the same thing 900 times there is a percentage of them where they go here is your json formatted yeah yeah no yeah or they put it at the end and they're like
Starting point is 00:43:38 here's why we recommended these yep i know i said don't do any of that. Why are you doing this? Right. So when we got to a point where we're like, okay, this is good enough. We feel this is good enough. We could use this. The project ran fine. What I ended up having to do was have like a post. Well, you could do it as a post hook. I ended up just doing it as a view within Noflake. Yep.
Starting point is 00:44:01 To clean it up. That just cleaned it up, matched it back to the original data, and then just reformatted it the way i wanted it formatted now okay so the that was a fascinating experience for a number of reasons but aside from the you know aside from the prompt craziness once we got it to a place where it was, and by we, I mean you. That's implied. It's the royal we. Yes, the royal we. But once we, like, it was pretty amazing
Starting point is 00:44:34 just to, like, hook a RutterSack reverse ETL pipe up to the cleanup view that you had. And the reason we wanted it in JSON, actually, which this is just bonus points, you know, for for any listeners who, you know, have to deliver data down to stakeholders. The reason our marketing team wanted it in JSON was because they were going to use liquid tags in their email tool. And so for anyone unfamiliar with liquid tags, and like an ESP, if you have JSON objects, you can use liquid in like an template, in the code editor of an email template, and pull in content dynamically from a user record.
Starting point is 00:45:13 So if you wanted to say, hey, first name, and you have the Liquid format, that's Liquid syntax. And that's pretty pretty common among like ESPs, right? And so we wanted this at a user level in JSON so that our marketing team could use liquid to just set up a template and read straight from the content recommendation, right? And it was, I mean, how long did it take us to actually wire that up and like get it in the email tool? 30 minutes or something?
Starting point is 00:45:44 Yeah, and it wouldn't have taken that long except i had to restart my browser halfway through and redo part yeah so that was truly like from a workflow standpoint i guess like one of my takeaways from that which i want to get to prototype number two because that's even more interesting especially with the stuff that cortex did it was was pretty nifty. But from a workflow standpoint, going from like, we need the last five page views for all of these different users to we're interacting with an LLM,
Starting point is 00:46:15 like we're generating input, we're interacting with an LLM, running a prompt or a series of prompts. We have an output table, we wire that up and like we actually deliver this output to like user records in an ESP pretty dang smooth like overall which is impressive well and and just from listening to the conversation it's always interesting to me hearing other people's
Starting point is 00:46:38 recounting of projects right because what i hear a lot of the time is that well we spent the most time on the project around the part that wasn't the hard part. But this conversation, I think, was not true. I think you actually spent the most amount of time on the part that was the hard part. And what I mean by that is, like, we spent so much time trying to get access to this thing. We spent so much time trying to get up this pipeline that I thought would be easy to set up, but this and this. So that is actually a positive thing. Why won't this API just work the way I want it to work?
Starting point is 00:47:08 Right. That's super interesting, yeah. Right. So I think it's a real positive thing that's like, okay, dealing with the LLMs was annoying in a lot of these ways, but at least your focused effort was on that thing that was the right piece of it
Starting point is 00:47:21 that was going to be the difficult part. Yeah, and even the beginning of it, because we were just using Rutter stack web data that came together very quickly oh right yeah right so that all came together the you know identity stitching all of that just seamlessly happened for us with that part yeah so it made it very easy to focus on just this kind of new and novel thing yeah we're doing man, we're not even paying you to make that comment about RubberStack because you didn't hear about any of this.
Starting point is 00:47:51 Yeah, this is net new. I'm just thinking on my own experience with projects. I was like, man, I spent most of the time waiting for permissions for this, then a bunch of time on that API that was broken and created a support ticket. I mean, which is funny, right? I mean, like, even when you think to yourself, cause I've been prototyping some
Starting point is 00:48:09 things when you think yourself like, Oh, like I can just grab that really quick. There, there is a lot of wisdom in being like, all right, has this already been abstracted by somebody else? Yeah. Like, let me just do that. And then, you know, later on, like we can go back and address and like, maybe we'll write that ourselves. Like it's, it's a good thing to use those abstractions when you have them. A hundred percent. Yeah. And honestly, like start there. And then when, if you have a need, cause say maybe it's a costly abstraction,
Starting point is 00:48:36 look at it backwards and be like, all right, we will work out cost by like addressing this abstraction, you know, if there's some kind of ETL tool or something that like, wow, like it works so much better that way. Totally. Yeah. I totally agree. by like addressing the subtraction, you know, if there's some kind of detail tool or something that like, wow, like it works so much better that way. Totally. Yeah, I totally agree. I'll buy you lunch for that plug. Thank you. No, I, in general, I really agree though. I think that there's sort of a, at least from, you know, and this is far from statistically significant, but just talking with friends and peers, one of the most difficult parts about this is getting your data to a point
Starting point is 00:49:08 where you can fight the LLM to get what you want. It is really time consuming. And I think a ton of companies see the potential to do interesting things, but it's just so much work to get to a point where you can craft input,
Starting point is 00:49:23 where you have baseline input that allows you to focus your time to a point where you can craft input where you have like, like baseline input that allows you to focus your time to your plan. I think it's a great point to like really work on the hard stuff. Yeah. It is a nice luxury to have. And it's one of those things where we just don't know, like if GPT 4.5 or five or whatever comes out and all of a sudden it's like, wow, look what they just did. And then you're set up for it versus...
Starting point is 00:49:48 But we don't know the speed, right? That's like... Well, and I mean, to Barry's point, this is kind of like a, let's just quote Barry a bunch episode, but... Perfect. He was like, the thing about new models is like some stuff gets better, some stuff gets worse, right? And so that's the...
Starting point is 00:50:04 That's the other challenge, right? If you don't have a benchmark and you have to actually go back and- Well, and there's a reason for that, right? Because they're optimizing for two things. Obviously they're pushing forward with like making it better, but they're also optimizing for cost
Starting point is 00:50:15 because they have a lot of pressure that way. So if you wonder like, why does it feel like it's doing both? I think the cost pressure makes it, maybe worse in some areas sometimes. And then obviously you have your like move forward future pressure that everybody has. Totally. Totally.
Starting point is 00:50:28 Yeah. And also the, yeah, whatever. That's a separate subject. It's also just hard to evaluate across everything. Very hard to evaluate. And then it does make me think about the billions of dollars of, you know, subscription revenue just for chat GPT, right? Yeah.
Starting point is 00:50:43 Right. The problem is they have billions of dollars in development costs every year too it's true and compute yeah it's pretty wild yeah yeah okay proto second version of the prototype so second version of the prototype which came about partially because we were talking with snowflake and one of their solution engineers recommended that we try a more RAG approach, which at first I was slightly hesitant to because that's still pretty new
Starting point is 00:51:11 and I didn't want to have to manually set it up. Sure, and we kind of talked about that, but also we're in prototyping phase, right? Which is why the YOLO prompt method is really helpful for getting an end-to-end use case running. Yeah, I would also say, if you can avoid YOLO prompting, think about it a little
Starting point is 00:51:26 bit before can i get an operational definition for yolo prompting please brooks can you put that in the show anyways so i went in and started looking at that and the nice thing was is that so for a rag which is just like retrieval augmented generation or something like that, you basically have to take whatever specific knowledge you want it to be drawing from and put it into a vector space or, you know, an embedding or whatever. So, you know, artificial multi-dimensional space that you're going to put it into. The nice thing is that Snowflake through Cortex has two functions that will do that and put it into a stable embedding space for you. So the thing, if you've ever done anything with kind of text embedding in the last 10 years are a lot of them were very much
Starting point is 00:52:16 that the embeddings changed as you added more to them. So, you know, it was one of those like, oh, we've got this stable state, but if we add in these hundred more rows, it's going to change everything. Yep. You don't have to worry about that with these. So that made it very simple because then all I had to do is I added a row to our table that had all of our content with all the summaries and keywords. Right. And I just embedded title, keyword, summary. Boom.
Starting point is 00:52:42 Yep. And I got that done. Then I went back and did a little bit of cleanup on how we had done our browsing history. Yep. Just because, you know, if it was a sub-page, the title would have the name of the page and then a pipe. Oh, right.
Starting point is 00:52:56 Cause they're just pulling from the page title, right? Yeah. So you'd get, you know, five results and they'd all say RudderStack, and when you're going to be doing it in an embedding space, that's not helpful. Yeah. Yeah.
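A minimal sketch of that kind of title cleanup, assuming titles come through in a "Page Name | RudderStack" format (the exact format is an assumption):

```python
# Sketch: strip the site name off pipe-delimited page titles so every page
# doesn't just embed as "RudderStack". The title format is an assumption.
def clean_title(raw_title: str) -> str:
    parts = [p.strip() for p in raw_title.split("|")]
    # Drop segments that are only the site name; keep the page-specific bits.
    kept = [p for p in parts if p.lower() != "rudderstack"]
    return " - ".join(kept) if kept else raw_title

print(clean_title("Android SDK | RudderStack"))  # -> "Android SDK"
```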
Starting point is 00:53:13 and ended up just using like a straight, like Euclidean distance measure, which is, which are functions that snowflake offers you out of the box. And just saying, give me the like three closest, right? Measure the distance in this embedding space. Give me the three closest. And that ended up being better in the end. Like the results, it got us to more specific results because it was looking for things that were similar to the actual web pages you were going to.
Starting point is 00:53:40 And we're talking like text similarity. Like, here's this text, here's that text, right? Yeah, common characters or whatever. Yeah, well, and more. Okay, we'll go with that. And so that was really interesting, particularly because you don't actually need the LLM at the end of that. Yes, for this particular use case. For this particular use case, because usually what you do in a RAG is you would take whatever those top chunks are that you pull out and you would put them into the prompt as context, kind of similar to what we were doing by putting in all of our blog content summaries, except you would limit it. In this one, you know, you would, in the
Starting point is 00:54:22 same kind of scenario, limit it to just the ones that were relevant. Right, so you're not blowing up the context window, and then you're feeding it a curated input. And the hope is that you're not going to have it hallucinate when you do that. You're basically giving it the information it should be pulling from, and it kind of summarizes it. Well, in our case, with this specific use case, that would have made no sense. Right, we were just trying to serve a list. Yeah, totally. The results are actually pretty awesome, though.
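For contrast, a sketch of the generative step we skipped: in a classic RAG flow you'd hand the top matches to a completion call as context. The model name and prompt wording here are purely illustrative:

```python
# Sketch: the classic RAG generation step (not needed for serving a list).
# `recommendations` is the top-3 result set from the retrieval sketch above.
context = "\n\n".join(f"{title}: {url}" for title, url, _ in recommendations)
prompt = (
    "Using only the content below, suggest what this visitor should read "
    "next and why.\n\n" + context
)
with conn.cursor() as cur:
    cur.execute("SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3-8b', %s)", (prompt,))
    suggestion = cur.fetchone()[0]
```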
Starting point is 00:54:50 And because when we were talking about the use case before, you know, we were saying, okay, this is pretty specific to someone's particular interest around, you know, what warehouse are they running or what migrations or all that sort of stuff, right?
Starting point is 00:55:02 And so if you take their browsing history, generally in the last five page views they've looked at something that's pretty relevant. And so when you use a distance function to pull in related stuff, it tends to be really relevant, right? And again, we're talking about a curated subset of blog posts, right? So it's like, okay, they probably wouldn't look at this content otherwise, but it's highly relevant. So it worked really well. And it's the type of thing where a lot of this marketing content has a long tail to it. It's a lot of very specific stuff that's hard to get in front of the right person. Yeah. But if you've gone to our Android page, our Android SDK page. Right.
Starting point is 00:55:50 It will pull three articles that talk specifically about using Android. Yep. And so you can take this thing that would be a very long-tail piece of content, where it would be very hard to figure out who the right user is to target it with, and pull it right in front of them. Yeah, so it's pretty cool. Of course, we didn't, for this
Starting point is 00:56:12 particular use case, need to do any generation or any sort of generative step, but it wouldn't be difficult to do that. It was pretty awesome to see. That was actually phenomenally fast. And I will say that I do think that
Starting point is 00:56:29 using the vector databases and the embeddings is an area where I think we'll see more stuff come out, without necessarily the generative side on top of it, or without, as John called it earlier, the LLM icing on top. I agree. I mean, that is kind of an interesting positive consequence of this, right? Even if you think about things like vector-based search, for example, as opposed to just standard index search, you don't necessarily have to have a generative step at the end of that for it to be
Starting point is 00:57:05 way, way better than standard index search, right? Yeah. And for a lot of people, when we talk about GenAI, those two components kind of get slammed together. Yep. There's all of this text, or chunks, or tokens, or whatever, sitting in this vector database, holding its position, and that's where you get the semantic knowledge from. Yep. And then there's this deep learning net on top of it that's basically the thing doing the generating, and that's where the magic, so to speak, happens that gets you that generation. But in a lot of use cases, you don't necessarily need that. Yep. And, you know, just another plug for Snowflake and Profiles there. You can put text embeddings.
Starting point is 00:57:49 Like, you could do that directly within a Profiles project. Yeah. Oh, yeah. That's interesting. You could just make it a feature, because it's just a function call. Right.
Starting point is 00:57:57 Once you've got it all aggregated, you can do that, and then you're permanently set with it. That's super interesting. Yeah. So essentially, because entities and profiles are agnostic, you could actually create, like, an entity for the embeddings. You don't even need to create an entity for the embeddings. Like, you know, you could take our existing Profiles project. We have a feature that is your browsing history. Oh, got it.
Starting point is 00:58:18 We have an, a feature that is your, you know, your browsing history. Oh, got it. Okay. So you're trying to make another feature and you feed that into the function. And now it's in there. Now it's in your kind of either customer 360 or in another table associated with it. Yep. I have the text embeddings of my history, what your browser history was, and now I can use it wherever I want to. Yep. So you talked about like your context window limitations. Yeah. Have you thought of other ideas in the customer 360 space where maybe the result is Boolean?
Starting point is 00:58:55 It's some kind of feature and it's a one or a zero. Like, you know, is new customer would be an example. Obviously, that's an easy answer. But have you thought of, like, is customer currently angry? One or zero. Do some semantic analysis. Do you think there's applications like that, where the context window maybe is less of a thing than it would be for a long blog post? So I think what you're talking about, just using LLMs within your 360, is a little bit different. Yeah. It's a different one, because what really blew up our context window was that static blog post. Yes, but like 3,000, 5,000. Yeah. But what I think is something that can be super useful within that is to be able to do those things,
Starting point is 00:59:35 because you can bring in all that information. You know, you can look up in a 360, like, support tickets, and then be able to have the LLM summarize: how did this customer enter? How did this customer exit? Those types of things. Were they happy, sad, angry, confused, whatever. Like, most recent interaction, insert emotion. Right.
Starting point is 00:59:58 Yeah, I mean, we didn't even talk about that as part of the input. Right. Or even have an output that's something like, you know, their last interaction with customer success: did it get resolved, or was it still hanging out there? Right. Right.
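A sketch of what features like that could look like, using Cortex's built-in SENTIMENT function (which returns a score from -1 to 1) plus a completion call for a resolved flag. The table, columns, threshold, and model are all hypothetical choices:

```python
# Sketch: boolean-ish 360 features over unstructured support tickets.
# Thresholding SENTIMENT at -0.5 for "angry" is an arbitrary choice.
feature_query = """
    SELECT
        USER_ID,
        IFF(SNOWFLAKE.CORTEX.SENTIMENT(LAST_TICKET_TEXT) < -0.5, 1, 0)
            AS IS_CUSTOMER_ANGRY,
        SNOWFLAKE.CORTEX.COMPLETE(
            'llama3-8b',
            'Answer only 1 or 0: was this support ticket resolved? '
            || LAST_TICKET_TEXT
        ) AS LAST_TICKET_RESOLVED
    FROM SUPPORT_TICKETS
"""
with conn.cursor() as cur:  # conn from the earlier sketch
    cur.execute(feature_query)
    features = cur.fetchall()
```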
Starting point is 01:00:12 So you've got a lot of those types of things you can do. I mean, we haven't done this yet, but I've told Eric, I've told you this one: I still think there's something in looking up things like contracts. Yep. As an input, right? So you have an ID and you have a contract, and then being able to extract information out of that at a customer level.
Starting point is 01:00:29 So that you can add that. Like, we were talking about inputs earlier from an LLM standpoint, but this is one of those areas, Matt, you and I have talked about this a ton, where this is arguably one of the best possible use cases for an LLM as part of customer data and the pipeline, but it is not sexy. Where it's like, in your customer 360 pipeline, running an
Starting point is 01:00:53 LLM to generate a semantic feature over unstructured data and automatically appending that to the user, each row for each user, each entity, is so, so annoying to do if you don't have a streamlined pipeline, right? I mean, it's just really hard. Think about a use case where a customer has thousands of enterprise contracts and each one of them is customized. Like, sure, you start somewhere, but there's all these allowances, and there's, like, user counts and all sorts of customization. And imagine analyzing each of those and extracting that, because you know roughly what things you made allowances on, but nobody remembers what you did, especially if you have, like, a million products, you know?
Starting point is 01:01:34 Yeah, or, you know, whatever. But you could run it through and say: I know roughly, like, sometimes we made allowances with user count, sometimes we made allowances with whatever. And you could have that fixed. There's only three, four, five things that are levers. But then have it run through each of the contracts and pull out: oh, allowance here, they have the max 100 user count instead of the default 75. And then tag that in Salesforce or something. I mean, that'd be super powerful.
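A sketch of that contract-extraction idea: prompt the model to return the handful of known levers as JSON, then parse it so the values can be tagged back onto the account in Salesforce or wherever. The field names, model, and prompt are all hypothetical:

```python
# Sketch: pulling known "levers" out of contract text as structured JSON.
import json

EXTRACTION_PROMPT = (
    "From the contract below, return JSON with keys "
    '"user_count_allowance", "product_allowances", "other_allowances". '
    "Use null for any lever the contract doesn't mention.\n\nContract:\n"
)

def extract_allowances(cur, contract_text: str) -> dict:
    cur.execute(
        "SELECT SNOWFLAKE.CORTEX.COMPLETE('llama3-8b', %s)",
        (EXTRACTION_PROMPT + contract_text,),
    )
    raw = cur.fetchone()[0]
    # Models don't always return clean JSON; validate before trusting this.
    return json.loads(raw)
```

Run over every contract, that gives you a fixed set of fields per account that a downstream sync can write wherever the team already works.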
Starting point is 01:02:02 Yeah, that's where I think a lot of it is, especially for customer data: that unstructured-to-structured piece is very powerful. It's a really good one. Yeah. Yeah, it'll be fascinating. All right.
Starting point is 01:02:14 Brooks is giving us the signal. He's back from leave, obviously, because we're getting the signal that it's time to wind it down. The quality of the show has already gone back up. And not just because we had Cynical Data Guy,
Starting point is 01:02:27 you know, two times, you know. Back to back? Not as a back to back. Oh, good. Yeah, it's not back to back. Two times in a month. And you were positive today.
Starting point is 01:02:35 Yeah. I know, it was weird. Yeah, I know. He's having an off day. We'll get you back in your groove. Yeah, I was going to say, we'll get your alternate ego back.
Starting point is 01:02:45 All right. Thanks for joining us. Subscribe if you haven't. We plan to have more fun shows like this where we get really practical, both with Matt and other guests. We'll catch you on the flip side. The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
