The Data Stack Show - Data Council Week: AI Isn’t Just Hype - How To Successfully Apply LLMs Today with Tristan Zajonc of Continual

Episode Date: April 17, 2024

Highlights from this week’s conversation include:
- Tristan's Background and Journey into Data (1:14)
- Evolution of Machine Learning and AI (3:13)
- Impact of Generative AI (6:33)
- MLOps and Challenges in Early Data Science (8:48)
- Success and Applications of AI Today (11:34)
- Continual AI Copilot Platform (18:04)
- Challenges in building remarkable AI assistants (19:58)
- Reliability and accuracy in AI responses (25:31)
- Regulation and adoption of AI assistants (31:30)
- Future of AI assistants and Continual AI (33:12)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Discussion (0)
Starting point is 00:00:00 Hey, listeners. We are so thankful that RudderStack continues to help us put on this show, and they're doing something even cooler for you. They're putting on a live workshop around data modeling in San Francisco. It's next week on April 23rd. Our product manager is going to be there, and it's going to be hands-to-keyboard work with a live data set, showing you how we can help solve problems with identity resolution
Starting point is 00:00:21 and other data modeling. It'll be a great discussion with some great people there, and you'll get a chance to meet some other listeners from the show. You can check it out and register by going to rudderstack.com slash events. We'd love to see you there and meet you in person. Welcome to the Data Stack Show. Each week, we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies.
Starting point is 00:00:54 The Data Stack Show is brought to you by Rudderstack, the CDP for developers. You can learn more at rudderstack.com. What's up, Data Stack Show listeners? Welcome to Episode 2 of Data Council Week 2024. We're in the field at Data Council Austin. This is the third year in a row. Eric and Kostas both had conflicts this year, so I'm filling in. I'm Brooks, the producer of the show, coming out from behind the scenes to bring you a
Starting point is 00:01:18 few special episodes this week. Matt KG, my colleague at Ruddersack, who brings over a decade of experience working in data science, is joining me to dig into the technical details. But today is all about Tristan Zients. Tristan is co-founder and CEO of Continual, and he's a returning guest. We had him on the show way back in September of 2021, which feels like many epics ago in the data world. So we're excited to catch up with Tristan today.
Starting point is 00:01:42 Tristan, welcome back. Hey, great to be back and in person here. Yeah, love getting to record live in person. It's a special treat. Well, Tristan, lots to cover today, especially in the midst of what I'm reading about, I guess, where you're now calling this the AI revolution. But before we get there, will you just give us a kind of quick background? How did you get into data and kind of get to where you are today with Continuum? Well, I'm one of those people who have been around since far before the Gen AI wave that's been happening. So I'm a statistician by training.
Starting point is 00:02:18 I was a grad student and actually coming out and basically working on statistics during my grad student days. And that was back in 2013. And I think data science was the trend there. Data science was a huge term, tons of hype around the ability to, big data and data science to use data to do more. I got quite excited by that as a general area. And, you know, kind of with the long-term idea that this all led to AI, and this was going to be a very exciting area to be in sort of a decade or multi-decade potential
Starting point is 00:02:49 career because, you know, you sort of saw AI in the future, even if it still felt a little, you know, a little far off. We got hints of it with things like what DeepMind was doing at the time with restricted enforcement learning. And then there was lots of commercial opportunities around data science itself. So then went into the startup world, founded one of the early enterprise data science platforms called Sense. That company was acquired by one of the large data platform providers, ClubEra, which was the leading provider of the Hadoop platform. So I got to see the whole big data world that was really hyped up there. Then we were really moving over towards machine learning
Starting point is 00:03:25 operations and how do we get all this into production since data science and just analytics plus wasn't enough. So saw that whole trend and participated in it. And then obviously the last two, three years, two years maybe has been this whole next wave of generative AI, which is undoubtedly probably the most excited I've been in terms of the industry. It's fun when you get excited early on and then you get more excited as things go on. So you really began with the end in mind of you were thinking about AI kind of at the beginning.
Starting point is 00:03:59 I was thinking about AI at the beginning. I mean, I know that I remember getting a talk in 2012 or 13, and I think I was excited at the time by some of the stuff that was coming out of DeepMind. They were doing deep reinforcement learning for Atari games, and it was kind of hinted, hey, you could learn from scratch. So I was excited by that. I think that was what made me think the whole industry had a long future ahead of it. But I was also, just to handle huge amounts of data was also interesting from
Starting point is 00:04:31 a technical perspective. So that was the whole data trend. And then I was a big Bayesian statistician. So there was probabilistic programming back then, which was a whole idea. So different ideas were all intellectually interesting um and but the end state i think was okay wait there's something here you know that's that maybe we could call artificial intelligence how the path to it was a little bit unclear at that time then tell us a little more about the path to it now that you've kind of lived through it and we've been through kind of you know since what you're talking about 2012 2013 you know different generations of machine learning and yesterday we were chatting on the
Starting point is 00:05:14 floor at the conference and you said you know you really kind of had this kind of personal kind of story and connection to it all can you just tell us a little about kind of your experience going through these different generations and kind of how you've seen the technology of all? Well, I do think of it as basically these three phases. I mean, the one which I was kind of alluding to just previously, which was one sort of the data science phase. You know, maybe you put big data in there as well,
Starting point is 00:05:42 but it was largely around, okay, we're, you know, think there's value to looking at data, processing data, understanding data. There is value. Data is the quote-unquote new oil or something. Maybe it wasn't exactly clear how you took that to fruition. And we knew we needed new tools to be able to handle the data at the scale that it could handle. We knew that we needed to go beyond SQL and we needed to basically start to do data science, which, you know, it was a whole set of new tools. And so that was kind of, I would say, like gen one of the sort of data science, you know,
Starting point is 00:06:15 just sort of gen one, at least for me personally, but I also think it was just in the industry. And then I think generation two was what I would say is like production machine learning or MLOps, where we all said, okay, this ad hoc stuff isn't delivering the value that we wanted. Okay, what's the problem? The problem is that we need to get this stuff into production. We need to have it impact the business if you're in the business context or actually impact end users.
Starting point is 00:06:41 And so that led to the whole kind of the MLOps trend where we said, okay, how do we efficiently and reliably scale up production machine learning where we can deliver this into real applications to make an impact? And that had a good run, which I participated in. And it still obviously is important. And then I think this last wave, Gen AI, honestly is a huge transformation. I mean, it's a big break. It's actually not a continuation, I think, from the traditional MLOps world. It completely changed the capabilities of these models. So all of a sudden, we were all talking about things like forecasting and classification
Starting point is 00:07:25 and maybe little bits of optimization. And now we're thinking about generating some of the most creative texts that you could imagine or images, or now we're thinking about agents and kind of autonomous agents that are taking these tasks. So we've opened up a huge new set of potential application possibilities. And then I think secondarily, and maybe even equally importantly, was that it just was immensely simpler to implement and to productionize. And so the sort of the rise of sort of in-context learning, zero-shot learning, the fact that
Starting point is 00:07:59 you could have these large foundation models, get amazing breakthrough results with very limited effort. Results that were never previously even possible to do. You know, if you think like use cases, even as basic as like summarization, right? Hey, you know, you have a whole bunch of user reviews that summarize them or identify, you know, that was just such a hard problem to actually productionize previously. And then it became like something that anybody could implement. Any engineer could implement in an hour.
Starting point is 00:08:25 I mean, honestly, not even an hour in a playground environment of one of these tools, you could get that. You get amazing results that previously you never were able to get. So that was a real profound transformation, both in terms of how I perceived what these models could do and the timeline that that was arriving at. And the ability to actually put it into real products, to productionize it, which was historically a huge challenge. And something that I've been,
Starting point is 00:08:50 you tried to be simplifying with MLOps and I had spent a lot of time both at Cloudera trying to think about what does production machine learning look like and how do you make it simpler? And then in the early stages of my current company, Continual was really motivated by how do you radically simplify production machine learning and what are the kind of different ways you can think about that? or in the early stages of my current company, Continual was really motivated by how do you radically simplify production
Starting point is 00:09:05 machine learning and what are the kind of different ways you can think about that. And then, you know, kind of this Gen AI thing was a huge unlock in terms of realizing that potential. Matt, you have, you have lived through this and have war stories as well. Anything to add, maybe, especially on the mlop side coming from you yeah i think because i think like because i got started about 10 years ago and i think
Starting point is 00:09:30 we first were in there it was kind of this idea we've got all this data clearly obviously we're going to do amazing things with it but there was also kind of this idea of like it'll be easy and cheap like when people first started and then you kind of got into it and you're like well look i can build a model in python or whatever and look it's done and it's like but it can't go anywhere it's stuck now now what so what i can predict churn okay how do you get it to someone who's actually going to do anything with it yeah and that's where within it was kind of that whole thing there that you kind of go up to. But, but yeah, I will, I think also that that that change with the gen AI, I mean, we got a project at Rudderstock that I just ran where, you know, classifying tickets, right, having it just say, what was this customer success ticket about? And like, that's a project that like, 10 years ago ago we would have started with hey customer success you need to take the next three months and just label tickets yeah and we were able to i got the first results in an hour you know like that's better results and better results dramatically
Starting point is 00:10:35 better results than you would have been able to get even with all that custom yes all that custom label data we would have still gotten even much harder and we were able to ask it to do things like not just say hey pick from these labels and do it but also say like if it you know is there a source it names tell me what the source is yeah and format it in this way and do these types of things things that just like as you said you know even five years ago like i mean i've worked with people who've been doing kind of the nlp stuff and like they couldn't do that. They were doing some great things, but they weren't doing that five years ago in a lot of ways. Yeah,
Starting point is 00:11:09 no, absolutely. So it opens up a whole new, I mean, and then, and then as soon as you, it's that easy, it's that good and that easy. I mean,
Starting point is 00:11:17 your creative juices start flowing in terms of where you can apply it. And so even if, you know, individually, you know, each individual application may be relatively small, you know, in aggregate when you add them all up, you could totally change the way you do support management. That is a use case that I see now.
Starting point is 00:11:31 It's one of the most – it's getting completely disrupted. It's a huge impact on the way we do customer success. It's one of the most obvious use cases, and people are already seeing a lot of success with that. What are some of the other use cases and people are already, you know, seeing a lot of success with that. What are some of the other use cases? I think, you know, we're in this kind of put in the hype cycle where it's just, there's almost this just like general mandate. It's like figure out, you know, some way to harness AI at our company. Put it in, just put it in.
Starting point is 00:12:01 Just slap AI on, you know, the customer success use case is one. What are some of the other ones where you're seeing people actually have success with this stuff today? Well, I mean, it is. So, yeah, there is a sort of gap, a hype between reality. I think the hype is justified because the future is so exciting. And it does feel like we're in a world where we are going to have future breakthroughs. And these models are going to become increasingly capable and they are going to be able to do more and more. But let's be honest, you know, that's, you know, we're not there yet in terms of the full potential. And so if you're just saying sort of, OK, in reality today with current models, you know, where do you apply them and see success?
Starting point is 00:12:41 I mean, one is definitely any unstructured to structured information task has just been completely opened up. So if you, and the example you just gave is an example of that, but there's many more broader examples. You have huge amounts of information coming in. You want to pull information out of that, put it into some sort of workflow process, even if you don't automate that workflow process. But historically, we're working with a company right now that essentially does loan decisions. There's loan officers. And the first step is the end loan applicant uploads a lot of data, including all their bank statements, transaction history, which they largely just download from their bank portal and upload as a PDF.
Starting point is 00:13:24 And then the loan officer wants to extract, not just get structured information out of it, like what transactions were there and what is the balance, but even subjective questions like, do they have a regular payment schedule? Are they getting regular payments? Do they have one or more main sources of income? Do they have other outgoing payments that are regularly reoccurring outgoing payments, right? Because they have other loans or like car loans or things like that. And that's all from kind of a really messy, very heterogeneous set of data. Yeah. And they now, you know, this company Indecina can now make that incredibly easy for these
Starting point is 00:13:58 loan officers to, first of all, they can, you know, pre-can stuff out of the back, you know, out of the box that, you know, as part of this particular product, they get a whole bunch of, you know, stuff out of the box as part of this particular product. They get a whole bunch of answers out of the box. And then they actually also can even enable the loan offers themselves to decide on custom questions that they want to ask of the data that can now be pushed even into the hands of the end user so that they can do what previously had built in some ways a quote-unquote new model, although it's just a new prompt, essentially, in these days. And then they can build that into a product experience. So this one domain that's super successful is
Starting point is 00:14:36 absolutely this sort of unstructured, destructured information. I see that delivering it. Another one, obviously, are these conversational experiences. We've all experienced it with ChatGPT. And I would say where today I'm seeing the most success for that is one, these kind of product success, product support type use cases where the customer, you have a product and you're asking a question about, it's a complicated product, and you're sort of asking a question about the product itself. Like, how do I do X? And it's a complicated product and you're sort of asking a question about the product itself like how do i do x and it's just you know or where it is x in this product or something like that like
Starting point is 00:15:08 what is my w-2 you're an ihr and it's just like you're diverting hey you absolutely divert you know 50 or you know some significant percent of your support cases while actually not being annoying you know traditionally these chatbots were really annoying and now you're like hey yeah you know i actually would like to ask the chatbot because i don't want to wait an hour even if you have a one hour sla on your you know customer support you know human that's actually kind of that's the annoying task versus hey now you're starting to say you can ask this this chatbot or this you know assistant inside of a product and then it works and i would say that does work these we've gotten to the place where that works it works well well. And then the next level, which is something that, you know, at Continual now we're thinking a lot about, is actually being able to ask questions into the application data that's inside of these applications.
Starting point is 00:15:54 So this is, you know, this doesn't replace, it is, especially these ad hoc questions, one-off questions that the product itself didn't have. It's not repeated efforts were unique enough that the product didn't just create a button that just wore a pre-canned dashboard or something that answers it. There's a class of type of questions. And that's how we use ChatGPT, right? We kind of do these loosey-goosey kind of questions that don't really know what to Google search and you kind of ask. And so there's a version that happens, I think, within products. I see a lot of success for that. Everybody's excited for the next.
Starting point is 00:16:30 I mean, we can talk about what's coming or something. Because these are what are working today. Everybody's trying to get to the next level, right? Where you think about automating work and agents and things like the multimodality and true generation of different asset types and anything in certain domains, like obviously in images and stuff, you're seeing use cases.
Starting point is 00:16:54 But those are two ones. There is another one, which is just generating. There are certain workflows generating RFPs, generating like draft job postings, generating product descriptions, where generating summaries, right? Those are, that draft job postings, generating product descriptions, generating summaries. That's a third version, which is really focusing on that generative part that does work today. And no question, if your application has that, it completely works and delivers value. That was one of the first ones I saw was talking with people and they were like,
Starting point is 00:17:24 look, we take these proposals that we need to write and we feed it in and it shoots out framework fills in most of it and we just go in and edit it and do that and it was like took the workload down incredibly it's just an incredibly time consuming thing today
Starting point is 00:17:39 and it's one where these models aren't producing even on the marketing, some people try to do this marketing, but I mean, let's be honest, these models don't produce great marketing materials. I mean, you know, the readers, you're not really doing the readers a service quite yet. You know, sometimes when you rely too heavily on these tools. But there are a lot of other types of things, right, that are pretty formulaic. Yeah. Like a job posting where you have kind of the existing template and you're modifying it.
Starting point is 00:18:05 You kind of actually want a lot of consistency in the language. Product descriptions are similar where you really want a brand voice that's doing that or RFPs where it's like, hey, we're just making sure we're getting the work done and it's documented. And it works quite well there. We've talked around Continual a lot, I think. And I do want to get to kind of what's
Starting point is 00:18:27 next and kind of what pieces need to fall in place for us to take things to the next level. But can you just, can you tell us about Continual specifically and kind of where Continual fits into all of this today? Sure. So Continual, we're building what we call an AI co-pilot platform for applications. So our goal is to help developers build custom embedded AI assistants inside of their products. So core thesis is that every application out there is going to embed a co-pilot, or you could call it, that's a name, but you could call it an AI assistant or sidekick or something inside of the product.
Starting point is 00:19:06 And as these models get better and better and as these assistants get better and better, they're going to become more indispensable. And you're seeing that today. You see that with Microsoft Office 365 Copilot, the Gemini Copilot that's part of the Office suite from our Office 365 suite from Microsoft. They're probably one of the more ahead folks. You see that with what Google is doing with what they previously called Google Duet, but now they're calling Google Gemini for the workspace suite. So these are systems that are embedded into software applications. You see that with Shopify is doing it with Shopify's Sidekick, which is kind of the co-pilot for e-commerce that sits inside of Shopify. Intuit is another kind of leading
Starting point is 00:19:43 example. They're doing it for the whole set of Intuit products, like from TurboTax to QuickBooks, Credit Karma. And these really are, you know, so the basic idea is, you know, all of our applications are going to change. They're all going to have an assistant inside of them. That obviously is going to include a conversational element to it,
Starting point is 00:20:01 but it doesn't have to just be conversation, right? It's a multi-modality, multi-user interface set of enhancements that you can add to your products. But it is intimately connected to your product, your domain. And so we're helping people do that, helping it be deeply connected to the data of your product, be connected to all the APIs of your product
Starting point is 00:20:22 and both the backend APIs and the front-end experiences be integrated both conversationally and kind of getting out of the box conversational capabilities or conversational UI for your application, but also then build other features that are more general features. Think of like, you know, summaries or these information extraction tasks all on what we call like a standardized co-pilot stack or engine. So all the data is flowing into one place. You're being able to monitor it in a centralized place. You can refine it. You can evaluate it.
Starting point is 00:20:48 You can see what users are doing, where they're failing. And then you can kind of loop back and continually improve it, right? To use the name Continual. And so I'm super excited by it. I think we're going through the evolution that we just talked about generally, where you're going to start with sort of these bread and butter,
Starting point is 00:21:03 more simpler use cases. But I think what's exciting generally about this and our goal is how do you build not just these like, you know, kind of a chatbot 1.5, but like how do you build a remarkable, indispensable assistant that just enables you to do so much more, more things faster and do things that you were never able to previously do. And I think that, you know, it exists today, but, you know, but it's going to exist even more in the future. What do you see as kind of to get to, again, it exists today, but a more kind of widespread experience where we're able to build these really remarkable things using this technology.
Starting point is 00:21:46 What are the kind of next kind of core pieces or let's say problems that we have to solve to get there? I think the biggest one is reliability at low latency and low cost. So, you know, in the assistant use cases, you need to respond quickly. You need to be relatively cheap, right, to be able to deliver it to the customer. And you really need reliability for the task so it becomes trusted. And that's a hard sort of three set. They're conflicting three sets of things. Yeah. And, you know, some of them aren't appreciated.
Starting point is 00:22:19 Like the latency one, you might say, hey, let's do, you know, let's do, let's use GPT-4 and we'll do reflection over our answer and then reevaluate whether we successfully answered it and answer again, you know, multi-shot kind of responses, which can, you know, on benchmarks can improve performance. But if you're in a conversational chat experience, it's actually very quickly becomes quite painful. And so, and as you drive latency down drive latency down, you typically have to run smaller models. Typically smaller models are not as reliable. Even GPT-4 is not reliable enough for certain types of applications
Starting point is 00:22:52 like a lot of function calling, which is calling APIs, which is very important to these models. Certainly the ability to call multiple functions, more complicated queries. So we have customers that want to do, you're in like a CRM, right? And you want to,
Starting point is 00:23:10 here's two examples of hard questions that are not possible today. So like one would be something like say, what happened yesterday? Or like what were the top complaints over the last month? Yeah. And that's not something that can easily, traditionally, one of the major technology
Starting point is 00:23:24 or kind of ways we connect LLMs to particular applications is through retrieval augmentation, real RAG, where we do retrieval over some knowledge base and then kind of enrich the context and then the LLM responds with that enriched information. But there's context window lengths and limits there. And so something that's a broad question like that, it's very hard to do. You can't do retrieval over that easily because you really need all the data. And you're
Starting point is 00:23:50 trying to say, there's this massive amount of things that happened yesterday. Now go summarize them or pull out the major complaints out of all this. Now you could do it in a batch mode, right? You could build a customized workflow for that, but it's not something today that's easy. It's actually easy to do that without kind of crafting a kind of a customized experience for that. The other one is something like, you know, you say like, you know, go into every, using the CRM example, like, you know, go into every deal that we currently have open and flag any customers that have, that I should respond to and create a to-do task, you know task for that customer that I should follow up on them. And that's a task that you could imagine giving somebody on a sales team, right?
Starting point is 00:24:30 Hey, go and do deal review, create a summary and flag, create to-do items or something like that. But if you think about how to implement, how an agent or kind of a co-pilot might do that, today requires it to do potentially hundreds of calls to an API, the backend. Okay, look up all the customers, look at all the records, and analyze that data. It's feasible to do. It feels like we're on the borderline of doing it, but it's not really. Today, it's not really. These models kind of can't handle that level of the number of function callings without a whole bunch of work.
Starting point is 00:25:00 So we're doing some of that work to make that possible, but without a whole bunch of that work, that's not possible. Yeah. Yeah. I think there's, I think both of work. So we're doing some of that work to make that possible, but not a whole bunch of that work that's not possible. Yeah. Yeah. I think there's, I think both of those, I mean, what I'm excited to, you know, just using those as motivating to kind of go get to the point of the future. You know, there are potential things on the horizon, right? So we have now models that have massive context windows. The Gemini model has, you know, 1 million, you know, well, 1 million context window in the publicly available API and 1 million tokens in the publicly available API. And it has up to 10 million that they've shown that they can get it to, which is where you could solve that sort of retrieval use case in a fundamentally different way.
Starting point is 00:25:37 Now, they struggle with latency. In their case, if you do that, it takes 60 seconds currently to give a response. So that still doesn't work. But we're on that. We're pushing that envelope there. And then obviously, you know, there's a lot of excitement around agents and planning and reasoning. And we all sort of recognize that, you know, that is still a limitation. And there's obviously a lot of teams working on that.
Starting point is 00:26:00 And, you know, I think progress will be made this year on that. But a little bit unclear how fast the progress will happen. Sure. So with that, do you kind of feel, because I know, like I know especially when it first came out and you could point out some of the errors in it and the response that I know I got from a lot of people was like, well, yeah, but just imagine what it'll be like a year from now or whatever.
Starting point is 00:26:24 But you do read some other stuff that talks about that like kind of we're hitting some of the limits of like parameters and size or like latency and stuff like that do you feel like there we can get to where you're talking about with just what we have or is there going to have to be some type of like architecture or modeling change needed in order to get there? That's a great question. And I don't think I have a definitive answer there, except that I do think we're going to need some breakthroughs to get the level of planning performance reliability
Starting point is 00:27:01 that we need at a low latency and low cost. I mean, if you look at, for instance, the new Google models, presumably they have an unlimited compute budget there, right? And they are just meeting GPT-4 level, right? If you look at what Anthropic just released, which presumably was trained for hundreds of millions of dollars, Claude 3 with their Opus model.
Starting point is 00:27:27 It's just beating or matching now at GPT-4 level, right? And it's not actually really solving, you know, it's kind of meeting GPT-4 level for what we're currently benchmarking against. Yeah. Not really the next-gen applications.
Starting point is 00:27:44 Like the next-gen applications are more like these autonomous assistants that maybe could fully actuate your computer and your applications. And honestly, none of them can even do that, or they do it so slowly and really badly. I think next-token prediction and these autoregressive models can go very far. Right. There's no question about that.
Starting point is 00:28:04 And I think you can solve some of the latency and cost things with these mixture-of-experts type models, which is what is being done with GPT-4 Turbo and these types of, you know, Databricks just released something today that was along those lines. So you can get really far. You can get really far with this.
Starting point is 00:28:22 But I think intuitively, you just think about it and you think, hey, there are definitely opportunities to change the way we do planning, and there are lots of different research directions that people have talked about, and something's going to work, right? And you think, like, because with some of the stuff I've done with NLP early on, one of the big things we would talk about that was an issue people didn't realize was that they were thinking about just accuracy, right? It's like, I've got to be accurate. And it's like, well, accuracy is important, but kind of how wrong you are is also important. Because if you give an answer and it's really wrong, especially if it's going to be in a B2C context, people go, that's nothing.
Starting point is 00:29:04 And they don't go back to it. Yeah. And it's like, you've lost them right there. Yeah. Well, yeah, no, I have a funny story. It's sort of funny. I mean, you can test it now, see if they fixed it. So the Claude 3 models from Anthropic, which just came out, you know, a couple
Starting point is 00:29:17 of weeks ago. And my test is, of course, the egocentric test where you ask, who is Tristan Zajonc? The reason I ask that is mainly because it's a pretty long-tail question. You know, not many people know, so it may fail at it. Right. And I know the correct answer. I really know what my bio is and whether it gets it correct. And this model, which is like the biggest, GPT-4 quality model, says that Tristan Zajonc is the CEO of Anthropic. And it's like, that's a very weird hallucination, given all of the other things, right, that this model can do. And the fact that it probably knows a lot about Anthropic generally because, you know,
Starting point is 00:29:56 it probably does have a bunch of information. Why would it make that hallucination with, okay, there's a random name? I actually showed it to an Anthropic engineer last night, and he was like, we're going to fix it. So it might be fixed by the time this is out. But on the other hand, I mean, that model is amazing. And to your point around, okay, you know, it's not reliable, I actually dismissed Claude 3 as a result of that, because I was like, okay, it shouldn't refuse, it should have just said he's not a notable figure, right, or something like that. But instead it kind of did the hallucination.
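The "egocentric test" described here is essentially a one-question eval. A hedged sketch of how teams systematize this kind of check: keep a small set of long-tail questions with known facts, and flag answers that miss them or repeat a known hallucination (`fake_model` and the eval entries are illustrative stand-ins, not a real API):

```python
# Toy hallucination check: for each long-tail question, record facts a correct
# answer should contain and phrases that signal a known-bad answer.
EVAL_SET = [
    {
        "question": "Who is Tristan Zajonc?",
        "must_contain": ["continual"],             # the real bio mentions Continual
        "must_not_contain": ["ceo of anthropic"],  # the hallucination from the episode
    },
]

def fake_model(question):
    # Stand-in for a real model call; reproduces the hallucination
    # described in the conversation above.
    return "Tristan Zajonc is the CEO of Anthropic."

def grade(answer, case):
    # Pass only if every required fact is present and no known-bad phrase is.
    text = answer.lower()
    has_facts = all(fact in text for fact in case["must_contain"])
    has_known_errors = any(bad in text for bad in case["must_not_contain"])
    return has_facts and not has_known_errors

results = [grade(fake_model(case["question"]), case) for case in EVAL_SET]
print(results)  # [False] -- the hallucinated answer fails the check
```

A real harness would call the production model instead of `fake_model` and track the pass rate across releases, so a regression like this one surfaces before users find it.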
Starting point is 00:30:26 But more recently, I experimented and went, oh, no, it is actually a GPT-4 caliber model that actually has some advantages over GPT-4. So I think there is, yeah, this reliability is huge. I mean, I talked to, well, without naming company names, we work with a large financial, like accounting, company. And it's huge. I mean, hallucination is a huge problem because they view it as like, they'll get sued. You know, they basically have to give tax advice, you know, it's regulated. Same thing with the financial services sector, where you're not a certified financial planner or advisor.
Starting point is 00:31:05 And so there's a lot of regulation around giving financial advice, tax advice. And it makes so much sense. I mean, there are so many tax questions that, honestly, GPT-4 can do a pretty good job at answering. Right. But there's a lot of legitimate concern because it really has to meet a threshold that's very high. And we don't really quite know. I mean, obviously human tax advisors make mistakes too. And so in that context, I mean,
Starting point is 00:31:30 what you see is that the first place these co-pilots get adopted is actually on the back line. So the human tax advisor is using the co-pilot, sort of checking it and accelerating their work, and still being on the hook for the final answer. I was actually going to ask if you thought that was the first stepping stone: you advise the advisor, and then once we feel like it gets to a certain point,
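The "advise the advisor" pattern described here can be sketched as a simple review gate: the model drafts an answer, the human advisor approves or edits it, and only approved answers can reach the end user. All names below are illustrative, not a real API:

```python
from dataclasses import dataclass

@dataclass
class Draft:
    question: str
    answer: str
    approved: bool = False

def draft_answer(question):
    # Stand-in for the copilot call; in the regulated settings discussed,
    # this draft is never shown directly to the customer.
    return Draft(question, f"Draft answer to: {question}")

def human_review(draft, edited_answer=None):
    # The human advisor stays on the hook for the final answer: they can
    # accept the draft as-is or replace it with their own wording.
    if edited_answer is not None:
        draft.answer = edited_answer
    draft.approved = True
    return draft

def send_to_customer(draft):
    # Hard gate: unreviewed model output never reaches the end user.
    if not draft.approved:
        raise ValueError("unreviewed answers must not reach the end user")
    return draft.answer

reviewed = human_review(draft_answer("Is a home office deductible?"))
print(send_to_customer(reviewed))
```

The point of the hard gate is that liability stays with the reviewer, which is what makes this workflow viable in regulated settings before any fully customer-facing rollout.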
Starting point is 00:31:53 then you can start to do it. I also think that's probably going to be somewhat of a regulation question too, of like, just how do you handle that? Who's responsible for it? Can you certify a chatbot to be a financial planner or something like that? Definitely. I mean, in the customers that are interested in what we offer, we see basically two crowds. One is the startups. They're all saying, hey, how do I deliver next-generation, breakthrough experiences to the end user? You know, I'm willing to take some risk, and I just want to deliver to the end user something that wasn't previously possible. And that's obviously the startup opportunity.
Starting point is 00:32:26 And then in the large enterprises, yes, there's a lot of hesitancy around customer-facing assistants and features, but there's a lot of appetite internally. So internally, if you have employees, insurance companies are another one there. Basically, their insurance plans are extremely complicated in terms of what the coverages are in any given plan. Even for the agent on the back end to understand, you know, if my dog eats my couch, is that covered, or is that exactly an excluded household damage? Right. Right.
Starting point is 00:33:01 For this particular plan of this particular vintage or whatever. And so just this kind of, hey, assisting the agent on the back end to answer those questions or automate some task is definitely the first place to start. I think it's a huge use case, obviously, but I do think to really be disruptive, you kind of have to push it all the way to the end user experience. Yeah, that's exciting. Well, we're at the buzzer here,
Starting point is 00:33:27 but yeah, so wonderful to hear from you just kind of on where we are today, you know, about how we have unlocked so much and can do so many things so much faster, with a long way to go, I think,
Starting point is 00:33:43 to get to the end vision. Humanity is safe. That's right. But Tristan, tell us, for folks who are interested in Continual's AI copilot for their applications, where can they find out more about Continual and maybe connect with you? Oh, absolutely. So, I mean, continual.ai, easy. You know, if you're thinking about building an AI co-pilot for an internal application, a support application, an end product, you know, we are a very easy way to do that, make it remarkable, improve it over time.
Starting point is 00:34:21 And we're in early access right now. So you can sign up on our website, probably going to announce some things in the coming weeks. So stay tuned. Maybe by the time this is out, we'll be out or shortly thereafter, more probably. Awesome. Exciting. Well, Tristan, thanks so much for sitting down for a few minutes in person here with us. And hopefully, yeah, this is second time on the show. So we'll have to get you back on in another couple of years and get another update. Absolutely. Look forward to it. Thank you.
Starting point is 00:34:49 Working towards your gold jacket. Thank you. We hope you enjoyed this episode of the Data Stack Show. Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
