The Data Stack Show - Data Council Week: AI Isn’t Just Hype - How To Successfully Apply LLMs Today with Tristan Zajonc of Continual
Episode Date: April 17, 2024. Highlights from this week's conversation include: Tristan's Background and Journey into Data (1:14), Evolution of Machine Learning and AI (3:13), Impact of Generative AI (6:33), MLOps and Challenges in Early Data Science (8:48), Success and Applications of AI Today (11:34), Continual AI Copilot Platform (18:04), Challenges in Building Remarkable AI Assistants (19:58), Reliability and Accuracy in AI Responses (25:31), Regulation and Adoption of AI Assistants (31:30), Future of AI Assistants and Continual AI (33:12). The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data. RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Discussion (0)
Hey, listeners.
We are so thankful that RudderStack continues to help us put on this show,
and they're doing something even cooler for you.
They're putting on a live workshop around data modeling in San Francisco.
It's next week on April 23rd.
Our product manager is going to be there,
and it's going to be hands-to-keyboard working with a live data set
showing you how we can help solve problems with identity resolution
and other data modeling.
It'll be a great discussion with some great people there,
and you'll get a chance to meet some other listeners from the show.
You can check it out and register by going to rudderstack.com slash events.
We'd love to see you there and meet you in person.
Welcome to the Data Stack Show.
Each week, we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, the CDP for developers.
You can learn more at rudderstack.com.
What's up, Data Stack Show listeners?
Welcome to Episode 2 of Data Council Week 2024.
We're in the field at Data Council Austin.
This is the third year in a row.
Eric and Kostas both had conflicts this year, so I'm filling in.
I'm Brooks, the producer of the show, coming out from behind the scenes to bring you a
few special episodes this week.
Matt KG, my colleague at RudderStack, who brings over a decade of experience working in data
science, is joining me to dig into the technical details.
But today is all about Tristan Zajonc.
Tristan is co-founder and CEO of Continual, and he's a returning guest.
We had him on the show way back in September of 2021, which feels like many epochs ago in
the data world.
So we're excited to catch up with Tristan today.
Tristan, welcome back.
Hey, great to be back and in person here.
Yeah, love getting to record live in person. It's a special treat.
Well, Tristan, lots to cover today, especially in the midst of what I guess you're now calling the AI revolution. But before we get there, will you just give us
a kind of quick background? How did you get into data and get to where you are today with Continual?
Well, I'm one of those people who has been around since long before the gen AI wave that's been happening.
So I'm a statistician by training.
I was a grad student, basically working on statistics.
And that was back in 2013.
And I think data science was the trend there.
Data science was a huge term, with tons of hype around the ability of big data and data science to use data to do more.
I got quite excited by that as a general area.
And, you know, kind of with the long-term idea that this all led to AI,
and this was going to be a very exciting area to be in sort of a decade or multi-decade potential
career because, you know, you sort of saw AI in the future, even if it still felt a little,
you know, a little far off. We got hints of it with things like what DeepMind was doing at the time with deep reinforcement learning. And then there were lots of commercial opportunities around data science itself. So I went into the startup world and founded one of the early enterprise data science platforms, called Sense.
That company was acquired by one of the large data platform providers, Cloudera,
which was the leading provider of the Hadoop platform.
So I got to see the whole big data world that was really hyped up there.
Then we really moved over toward machine learning operations and how to get all of this into production, since ad hoc data science and analytics alone weren't enough. So I saw that whole trend and participated in it. And then obviously the last
two, three years, two years maybe has been this whole next wave of generative AI, which is
probably the most excited I've
been in terms of the industry.
It's fun when you get excited early on and then you get more excited as things go on.
So you really began with the end in mind of you were thinking about AI kind of at the
beginning.
I was thinking about AI at the beginning.
I mean, I remember a talk in 2012 or '13,
and I think I was excited at the time by some of the stuff that was coming out of DeepMind.
They were doing deep reinforcement learning for Atari games,
and it was kind of hinted, hey, you could learn from scratch.
So I was excited by that.
I think that was what made me think the whole industry had a long future ahead of it.
But just being able to handle huge amounts of data was also interesting from a technical perspective. So that was the whole data trend. And then
I was a big Bayesian statistician. So there was
probabilistic programming back then, which was a whole idea. So different ideas
were all intellectually interesting. But the end state, I think, was: okay, wait, there's something here that maybe we could call artificial intelligence. How the path to it would unfold was a little bit unclear at that time.
Then tell us a little more about the path, now that you've lived through it. We've been through, since what you're talking about, 2012, 2013, different generations of machine learning. And yesterday we were chatting on the floor at the conference, and you said you really have this kind of personal story and connection to it all. Can you tell us a little about your experience going through these different generations and how you've seen the technology evolve?
Well, I do think of it as basically these three phases.
I mean, the one which I was kind of alluding to just previously,
which was one sort of the data science phase.
You know, maybe you put big data in there as well,
but it was largely around, okay, we think there's value to looking at data, processing data, understanding data.
There is value.
Data is the quote-unquote new oil or something.
Maybe it wasn't exactly clear how you took that to fruition.
And we knew we needed new tools to be able to handle data at that scale.
We knew that we needed to go beyond SQL and we needed to basically start to do data science,
which, you know, it was a whole set of new tools.
And so that was, I would say, gen one: the data science phase, at least for me personally, but I think for the industry as a whole too.
And then I think generation two was what I would say is like production machine learning
or MLOps, where we all said, okay, this ad hoc stuff isn't delivering the value that
we wanted.
Okay, what's the problem?
The problem is that we need to get this stuff into production.
We need to have it impact the business if you're in the business context or actually
impact end users.
And so that led to the whole kind of the MLOps trend where we said, okay, how do we efficiently and reliably scale up production machine learning where we can deliver this into real applications to make an impact?
And that had a good run, which I participated in.
And it still obviously is important.
And then I think this last wave, Gen AI, honestly is a huge transformation.
I mean, it's a big break.
It's actually not a continuation, I think, from the traditional MLOps world.
It completely changed the capabilities of these models.
So before, we were all talking about things like forecasting and classification and maybe little bits of optimization.
And now we're thinking about generating some of the most creative texts
that you could imagine or images,
or now we're thinking about agents and kind of autonomous agents
that are taking these tasks.
So we've opened up a huge new set of potential application possibilities.
And then I think secondarily, and maybe even equally importantly, was that it just was immensely simpler to implement and to productionize.
And so the rise of in-context learning and zero-shot learning, the fact that you could take these large foundation models and get amazing breakthrough results with very limited effort: results that were never previously even possible.
You know, think of use cases even as basic as summarization, right? Hey, you have a whole bunch of user reviews: summarize them. That was just such a hard problem to actually productionize previously.
And then it became like something that anybody could implement.
Any engineer could implement in an hour.
I mean, honestly, not even an hour: in a playground environment of one of these tools, you could get amazing results that you were never able to get previously.
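For a sense of just how little work that is, here is a minimal sketch of zero-shot review summarization, assuming the OpenAI Python SDK with an API key in the environment; the model name and reviews are placeholders:

```python
# Minimal zero-shot summarization sketch (assumes the OpenAI Python SDK;
# model name and reviews are placeholders).
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

reviews = [
    "Setup was painless, but the mobile app crashes when I export reports.",
    "Support resolved my billing issue in minutes. Great experience.",
    "Pricing page is confusing; I couldn't tell which plan includes SSO.",
]

prompt = (
    "Summarize the main themes in these user reviews as three bullet points:\n\n"
    + "\n".join(f"- {r}" for r in reviews)
)

response = client.chat.completions.create(
    model="gpt-4o-mini",  # any capable chat model would do here
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)
```

No labeled data, no training loop: the prompt is the whole "model."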
So that was a real profound transformation, both in terms of how I perceived what these
models could do and the timeline that that was arriving at.
And the ability to actually put it into real products, to productionize it,
which was historically a huge challenge.
And something that I had been trying to simplify with MLOps. I had spent a lot of time at Cloudera trying to think about what production machine learning looks like and how you make it simpler. And then in the early stages of my current company, Continual was really motivated by how you radically simplify production machine learning, and what the different ways are that you can think about that. And then, you know, this gen AI thing was a huge unlock in terms of realizing that potential.
Matt, you have lived through this and have war stories as well. Anything to add, maybe especially on the MLOps side?
Yeah, I think so, because I got started about 10 years ago, and when we first got into it, there was this idea: we've got all this data, so clearly we're going to do amazing things with it. But there was also this idea that it would be easy and cheap. Then you got into it and you're like, well, look, I can build a model in Python or whatever, and look, it's done. But it can't go anywhere; it's stuck. Now what? So I can predict churn, okay; how do you get it to someone who's actually going to do anything with it? That's the wall you kept running up against.
But yeah, I do think that changed with gen AI. We had a project at RudderStack that I just ran, classifying tickets, right? Having it just say, what was this customer success ticket about? That's a project that 10 years ago we would have started with, hey, customer success, you need to take the next three months and just label tickets. Instead, I got the first results in an hour. And those are better results, dramatically better results, than you would have been able to get even with all that custom label data; it still would have been much harder. And we were able to ask it to do things like not just say, hey, pick from these labels, but also, if there's a source it names, tell me what the source is, and format it in this way, and do these types of things. Things that, as you said, even five years ago, I mean, I've worked with people who've been doing NLP stuff, and they couldn't do that. They were doing some great things, but they weren't doing that five years ago in a lot of ways.
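A minimal sketch of the ticket-classification pattern Matt describes, choosing from a fixed label set and returning structured output, might look like this; the labels, model name, and example ticket are hypothetical:

```python
# Sketch of zero-shot ticket classification with structured JSON output.
# Labels, model name, and the example ticket are hypothetical.
import json
from openai import OpenAI

client = OpenAI()

LABELS = ["billing", "bug_report", "feature_request", "how_to", "other"]

def classify_ticket(ticket_text: str) -> dict:
    prompt = (
        f"Classify this customer success ticket into one of {LABELS}. "
        "If the ticket names a data source or integration, include it. "
        'Respond with JSON: {"label": "...", "source": "..." or null}.\n\n'
        f"Ticket: {ticket_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        response_format={"type": "json_object"},  # ask for parseable JSON back
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)

print(classify_ticket("Our Salesforce sync stopped writing events yesterday."))
# e.g. {"label": "bug_report", "source": "Salesforce"}
```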
Yeah, no, absolutely. So it opens up a whole new world. As soon as it's that good and that easy, your creative juices start flowing in terms of where you can apply it. And even if each individual application is relatively small, in aggregate, when you add them all up, you could totally change the way you do support management. That is a use case I see now; it's getting completely disrupted. It's having a huge impact on the way we do customer success. It's one of the most obvious use cases, and people are already seeing a lot of success with that.
What are some of the other use cases? I think we're in this hype cycle where there's almost this general mandate: figure out some way to harness AI at our company. Put it in, just put it in. Just slap AI on. The customer success use case is one. What are some of the other ones where you're seeing people actually have success with this stuff today?
Well, I mean, it is. So, yeah, there is a gap between hype and reality.
I think the hype is justified because the future is so exciting.
And it does feel like we're in a world where we are going to have future breakthroughs.
And these models are going to become increasingly capable and they are going to be able to do more and more.
But let's be honest, you know, that's, you know, we're not there yet in terms of the full potential.
And so if you're just saying sort of, OK, in reality today with current models, you know, where do you apply them and see success?
I mean, one is definitely any unstructured-to-structured information task; that has just been completely opened up. The example you just gave is one of those, but there are many broader examples. You have huge amounts of information
coming in. You want to pull information out of that, put it into some sort of workflow process,
even if you don't automate that workflow process itself. For example, we're working with a company right now that essentially does loan decisions. There are loan officers. And the first step is
the end loan applicant uploads a lot of data, including
all their bank statements, transaction history,
which they largely just download from their bank portal and upload as a PDF.
And then the loan officer wants to extract, not just get structured information out of it,
like what transactions were there and what is the balance, but even subjective questions like,
do they have a regular payment schedule? Are they getting regular payments? Do they have one or more
main sources of income? Do they have other regularly recurring outgoing payments, right? Because they have other loans, like car loans or things like that.
And that's all from kind of a really messy, very heterogeneous set of data.
Yeah.
And now, you know, this company, Indecina, can make that incredibly easy for these loan officers. First of all, as part of this particular product, they get a whole bunch of answers out of the box. And then they can even enable the loan officers themselves to decide on custom questions that they want to ask of the data, which can now be pushed into the hands of the end user. Previously, each of those would in some ways have required building a quote-unquote new model, although these days it's essentially just a new prompt. And then they can build that into a product experience. So this one domain that's super successful is absolutely this sort of unstructured-to-structured information. I see that delivering value today.
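To make the custom-questions idea concrete, here is a hedged sketch of that kind of document extraction, where each user-defined question is "just a new prompt"; the questions, document, and model name are placeholders, not Indecina's actual implementation:

```python
# Sketch of unstructured-to-structured extraction with user-defined
# questions over a document. All names and data here are placeholders.
import json
from openai import OpenAI

client = OpenAI()

statement_text = (
    "2024-01-01 ACME PAYROLL +$4,200.00\n"
    "2024-01-15 AUTO LOAN PMT -$310.00\n"
    "2024-02-01 ACME PAYROLL +$4,200.00\n"
)

def extract(document_text: str, questions: list[str]) -> dict:
    prompt = (
        "You are reviewing a bank statement for a loan application. "
        "Answer each question using only the document. Respond with JSON "
        'mapping each question to {"answer": "...", "evidence": "..."}.\n\n'
        "Questions:\n" + "\n".join(f"- {q}" for q in questions)
        + f"\n\nDocument:\n{document_text}"
    )
    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[{"role": "user", "content": prompt}],
    )
    return json.loads(response.choices[0].message.content)

answers = extract(statement_text, [
    "Does the applicant receive regular, recurring income payments?",
    "Are there recurring outgoing payments that look like loan payments?",
])
print(answers)
```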
Another one, obviously, is these conversational experiences.
We've all experienced it with ChatGPT. And I would say where today I'm seeing the most success for that
is, one, these product support type use cases, where you have a complicated product and the customer is asking a question about the product itself. Like, how do I do X? Or, where is X in this product? Or, what is my W-2, if you're in an HR product. You're diverting some significant percent, maybe 50 percent, of your support cases while actually not being annoying. Traditionally these chatbots were really annoying, and now you're like, hey, I actually would like to ask the chatbot, because I don't want to wait an hour, even if you have a one-hour SLA on your human customer support. That wait is actually the annoying part. Versus, hey, now you can ask this chatbot, or this assistant inside of a product, and it works. And I would say that does work; we've gotten to the place where that works well. And then the next level, which is something that at Continual we're now thinking a lot about, is actually being able to ask questions into the application data that's inside of these applications.
So this is especially for ad hoc, one-off questions that the product itself didn't anticipate: questions that aren't repeated enough for the product to just create a button or a pre-canned dashboard that answers them. There's a whole class of questions like that. And that's how we use ChatGPT, right? We ask these loosey-goosey questions where we don't really know what to Google, and we just ask. And so there's a version of that that happens, I think, within products.
I see a lot of success for that.
Everybody's excited for what's next; we can talk about what's coming. But these are what's working today.
Everybody's trying to get to the next level, right?
Where you think about automating work and agents, and things like multimodality and true generation of different asset types. In certain domains, like obviously images, you're already seeing use cases. But those are two. There is another one, which is just generation itself. There are certain workflows around generating RFPs, generating draft job postings, generating product descriptions, generating summaries, right? That's a third version, which is really focusing on the generative part, and it does work today. And no question, if your application has that, it completely works and delivers value.
That was one of the first ones I saw. I was talking with people, and they were like, look, we take these proposals that we need to write, we feed them in, and it shoots out a framework and fills in most of it, and we just go in and edit it. It took the workload down incredibly; it's just an incredibly time-consuming thing today.
And it's one where, even on the marketing side, some people try to do this with marketing, but I mean, let's be honest, these models don't produce great marketing materials. You're not really doing the readers a service quite yet when you rely too heavily on these tools.
But there are a lot of other types of things, right, that are pretty formulaic.
Yeah.
Like a job posting where you have kind of the existing template and you're modifying it.
You kind of actually want a lot of consistency in the language.
Product descriptions are similar
where you really want a brand voice that's doing that
or RFPs where it's like,
hey, we're just making sure we're getting the work done
and it's documented.
And it works quite well there.
We've talked around Continual a lot, I think. And I do want to get to kind of what's
next and kind of what pieces need to fall in place for us to take things to the next level.
But can you just, can you tell us about Continual specifically
and kind of where Continual fits into all of this today?
Sure. So Continual, we're building what we call an AI
co-pilot platform for applications. So our goal is to help developers build custom embedded AI
assistants inside of their products. So the core thesis is that every application out there is going to embed a copilot (that's one name; you could call it an AI assistant or a sidekick or something) inside of the product.
And as these models get better and better and as these assistants get better and better, they're going to become more indispensable.
And you're seeing that today.
You see that with Microsoft 365 Copilot, the copilot that's part of the Office 365 suite from Microsoft. They're probably one of the folks furthest ahead. You see that with what
Google is doing with what they previously called Google Duet, but now they're calling Google Gemini
for the workspace suite. So these are systems that are embedded into software applications.
You see that with Shopify, which is doing it with Sidekick, kind of the copilot for e-commerce that sits inside of Shopify. Intuit is another kind of leading
example. They're doing it for the whole set of Intuit products,
like from TurboTax to QuickBooks, Credit Karma.
And these really are, you know,
so the basic idea is, you know,
all of our applications are going to change.
They're all going to have an assistant inside of them.
That obviously is going to include
a conversational element to it,
but it doesn't have to just be conversation, right?
It's a multi-modality, multi-user interface
set of enhancements that you can add to your products.
But it is intimately connected to your product,
your domain.
And so we're helping people do that: helping the assistant be deeply connected to the data of your product, connected to all the APIs of your product, both the backend APIs and the front-end experiences. You get out-of-the-box conversational capabilities, a conversational UI for your application, but you can also build other, more general features, think of summaries or these information extraction tasks, all on what we call a standardized copilot stack or engine. So all the data is flowing into one place, and you're able to monitor it in a centralized place.
You can refine it.
You can evaluate it.
You can see what users are doing, where they're failing.
And then you can kind of loop back
and continually improve it, right?
To use the name Continual.
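Continual's actual platform is of course more than a snippet, but a generic illustration of the "standardized copilot engine" idea, every assistant feature routed through one place that logs interactions for later evaluation and improvement, might look like this; all names here are hypothetical and this is not Continual's API:

```python
# Generic illustration of a centralized copilot engine: one place that
# holds the model call, the product's tools, and an interaction log that
# feeds the evaluate-and-improve loop. Hypothetical names throughout.
import datetime

class CopilotEngine:
    def __init__(self, llm_call, tools: dict):
        self.llm_call = llm_call    # function: prompt -> response text
        self.tools = tools          # product APIs the copilot may call
        self.interaction_log = []   # central log for monitoring/evaluation

    def ask(self, user_id: str, question: str, app_context: str) -> str:
        prompt = f"App context:\n{app_context}\n\nUser question: {question}"
        answer = self.llm_call(prompt)
        # Log every interaction so failures can be found and looped back
        # into prompt and model improvements: the "continual" loop.
        self.interaction_log.append({
            "time": datetime.datetime.now(datetime.timezone.utc).isoformat(),
            "user": user_id,
            "question": question,
            "answer": answer,
        })
        return answer

# Usage with a stubbed model call:
engine = CopilotEngine(llm_call=lambda p: "stub answer", tools={})
print(engine.ask("user-1", "How many open invoices do I have?", "invoices app"))
```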
And so I'm super excited by it.
I think we're going through the evolution
that we just talked about generally,
where you're going to start with these bread-and-butter, simpler use cases.
But I think what's exciting generally about this, and our goal, is how you build not just a kind of chatbot 1.5, but a remarkable, indispensable assistant that enables you to do so much more: more things, faster, and things you were never able to do previously. And I think it exists today,
but it's going to exist even more in the future.
What do you see as what it takes to get to, again, it exists today, but a more widespread experience, where we're able to build these really remarkable things using this technology? What are the next core pieces, or let's say problems, that we have to solve to get there?
I think the biggest one is reliability at low latency and low cost. So, you know,
in the assistant use cases, you need to respond quickly. You need to be relatively cheap, right, to be able to deliver it to the customer.
And you really need reliability for the task so it becomes trusted.
And that's a hard set of three; those three things conflict with each other.
Yeah.
And, you know, some of them aren't appreciated.
Like the latency one: you might say, hey, let's use GPT-4 and we'll do reflection over our answer, then reevaluate whether we successfully answered it and answer again, multi-shot kinds of responses, which on benchmarks can improve performance. But if you're in a conversational chat experience, it very quickly becomes quite painful. And as you drive latency down, you typically have to run smaller models.
Typically smaller models are not as reliable.
Even GPT-4 is not reliable enough
for certain types of applications
like a lot of function calling,
which is calling APIs,
which is very important to these models.
Certainly with the ability to call multiple functions and handle more complicated queries.
So we have customers that want to do things like, you're in a CRM, right? Here are two examples of hard questions that are not possible today. One would be something like, what happened yesterday? Or, what were the top complaints over the last month?
Yeah.
And that's not something that can easily be done. Traditionally, one of the major ways we connect LLMs to particular applications is through retrieval augmentation, or RAG, where we do retrieval over some knowledge base, enrich the context, and then the LLM responds with that enriched information. But there are context window lengths and limits there.
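As an illustration of the RAG pattern he's describing, here is a minimal sketch: embed the question, retrieve the nearest chunks, and answer with that enriched context. The embedding model, chat model, and documents are placeholders, and a numpy cosine similarity stands in for a real vector store:

```python
# Minimal RAG sketch: retrieve top-k chunks by cosine similarity, then
# answer with the enriched context. Model names and docs are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

def embed(texts: list[str]) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=texts)
    return np.array([d.embedding for d in resp.data])

docs = ["...knowledge base chunk 1...", "...chunk 2...", "...chunk 3..."]
doc_vecs = embed(docs)

def answer(question: str, k: int = 2) -> str:
    q = embed([question])[0]
    sims = doc_vecs @ q / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(q))
    context = "\n".join(docs[i] for i in np.argsort(-sims)[:k])  # top-k chunks
    resp = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user",
                   "content": f"Context:\n{context}\n\nQuestion: {question}"}],
    )
    return resp.choices[0].message.content
```

The limit he points out is visible right in the sketch: only the top-k chunks fit in the context, so a broad question that needs all the data doesn't fit the pattern.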
And so a broad question like that is very hard to do. You can't do retrieval over that easily, because you really need all the data. You're trying to say, there's this massive amount of things that happened yesterday; now go summarize them, or pull the major complaints out of all of this. Now, you could do it in a batch mode, right? You could build a customized workflow for that. But it's not easy to do today without crafting a customized experience for it. The other one is something
like, using the CRM example, go into every deal that we currently have open, flag any customers that I should respond to, and create a to-do task for each customer I should follow up on. That's a task you could imagine giving somebody on a sales team, right? Hey, go do a deal review, create a summary, flag things, create to-do items, something like that. But if you think about how an agent or a copilot might implement that, today it requires potentially hundreds of calls to an API on the backend: okay, look up all the customers, look at all the records, and analyze that data. It's feasible to do. It feels like we're on the borderline of doing it, but not really; today, not really. These models can't handle that number of function calls without a whole bunch of work. So we're doing some of that work to make it possible, but without that work, it's not possible.
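To see why that strains current models, here is a sketch of a naive tool-calling agent loop for the deal-review task; the tool schema, dispatch router, and CRM data are all hypothetical stubs:

```python
# Naive agent loop for the "review every open deal" task: the model calls
# backend functions in a loop, potentially hundreds of times.
import json
from openai import OpenAI

client = OpenAI()

TOOLS = [{
    "type": "function",
    "function": {
        "name": "list_open_deals",
        "description": "Return all open deals in the CRM.",
        "parameters": {"type": "object", "properties": {}},
    },
}]  # ...plus get_deal_activity(deal_id), create_todo(customer, note), etc.

def dispatch(name: str, args: dict) -> dict:
    # Hypothetical router into the CRM backend, stubbed here.
    if name == "list_open_deals":
        return {"deals": [{"id": 1, "customer": "Acme", "last_touch": "2024-03-01"}]}
    return {"error": f"unknown tool: {name}"}

messages = [{"role": "user", "content":
             "Review every open deal and create to-dos for follow-ups."}]

for _ in range(200):  # a real account book could need hundreds of round-trips
    resp = client.chat.completions.create(
        model="gpt-4o", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:
        break  # the model gave a final answer instead of another tool call
    messages.append(msg)
    for call in msg.tool_calls:
        result = dispatch(call.function.name, json.loads(call.function.arguments))
        messages.append({"role": "tool", "tool_call_id": call.id,
                         "content": json.dumps(result)})

print(messages[-1])
```

Every iteration is another network round-trip and another chance to mis-call a function, which is exactly the reliability-at-low-latency problem he describes.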
Yeah. Yeah.
I think with both of those, I mean, using those as motivation to get to the future: there are potential things on the horizon, right? We now have models with massive context windows. The Gemini model has a 1 million token context window in the publicly available API, and they've shown they can get it up to 10 million, which is where you could solve that sort of retrieval use case in a fundamentally different way.
Now, they struggle with latency.
In their case, if you do that, it takes 60 seconds currently to give a response.
So that still doesn't work.
But we're on that.
We're pushing that envelope there.
And then obviously, you know, there's a lot of excitement around agents and planning and reasoning.
And we all sort of recognize that, you know, that is still a limitation.
And there's obviously a lot of teams working on that.
And, you know, I think progress will be made this year on that.
But a little bit unclear how fast the progress will happen.
Sure.
So with that, do you kind of feel, because I know, especially when it first came out, you could point out some of the errors in it, and the response I got from a lot of people was, well, yeah, but just imagine what it'll be like a year from now. But you do read other stuff that talks about how we're hitting some of the limits of parameters and size, or latency, things like that. Do you feel like we can get to where you're talking about with just what we have, or is there going to have to be some type of architecture or modeling change to get there?
That's a great question.
And I don't think I have a definitive answer there, except that I do think we're going to need some breakthroughs to get the level of planning performance and reliability that we need at low latency and low cost.
I mean, if you look at, for instance, the new Google models, presumably they have an almost unlimited compute budget there, right? And they are just meeting GPT-4 level. If you look at what Anthropic just released with Claude 3 and their Opus model, which presumably cost hundreds of millions of dollars to train, it's just beating or matching GPT-4 level now. And it's kind of meeting GPT-4 level for what we're currently benchmarking against.
Yeah.
Not really the next-gen applications.
The next-gen applications are more like these autonomous assistants that maybe could fully actuate your computer and your applications. And honestly, none of them can really do that yet; when they do try, they do it slowly and pretty badly.
I think these models, I think next token prediction and these autoregressive models can go very
far.
Right.
There's no question about that.
And I think you can solve some of the latency and cost things with these mixture-of-experts type models, which is what's reportedly being done with GPT-4 Turbo and these types of models; Databricks just released something today along those lines. So you can get really far with this.
But intuitively, you just think about it and you think, hey, there are definitely opportunities to change the way we do planning, and there are lots of different research directions that people have talked about. Something's going to work, right?
And you think, because I know with some of the stuff I've done with NLP early on, one of the big things we would talk about, an issue people didn't realize, was that they were thinking about just accuracy, right? It's like, I've got to be accurate. And it's like, well, accuracy is important, but how wrong you are is also important. Because if you give an answer and it's really wrong, especially in a B2C context, people go, that's nothing. And they don't go back to it.
Yeah.
Well, yeah, I have a funny story. It's sort of funny; I mean, you can test it now and see if they fixed it. The Claude 3 models from Anthropic just came out, you know, a couple of weeks ago. And my test is, of course, the egocentric test where you ask, who is Tristan Zajonc? I ask that mainly because it's a pretty long-tail question, so the model may fail at it, and I know the correct answer; I really know what my bio is and whether it gets it right. And this model, the biggest GPT-4 quality model, says that Tristan Zajonc is the CEO of Anthropic. And it's like, that's a very weird hallucination, given all the other things this model can do, and the fact that it probably knows a lot about Anthropic generally, because it probably does have a bunch of information about it. Why would it make that hallucination with a random name? I actually showed it to an Anthropic engineer last night, and he was like, we're going to fix it, so it might be fixed by the time this is out. But on the other hand, that model is amazing. And to your point about reliability, I actually dismissed Claude 3 as a result of that, because I was like, okay, it should have refused and just said he's not a notable figure or something like that, but instead it kind of hallucinated.
But more recently, I experimented and went, oh no, it is actually a GPT-4 caliber model that has some advantages over GPT-4.
So I think, yeah, this reliability is huge. Without naming company names, we work with a large financial, accounting-type company. And hallucination is a huge problem for them, because they view it as, they'll get sued. They basically have to give tax advice, and it's regulated; same thing in the financial services sector, where you're not a certified financial planner or advisor. So there's a lot of regulation around giving financial advice and tax advice.
And it makes so much sense.
I mean, there's so many tax questions that honestly, GPT-4 can do a pretty good job at answering.
Right.
But there's a lot of legitimate concern, because it really has to meet a threshold that's very high.
And we don't really quite know.
I mean, obviously human tax advisors make mistakes too.
And so in that context, I mean,
what you see is you see the first place
that these co-pilots get adopted
is actually on the back line.
So the human tax advisor is using the co-pilot
and sort of is checking in and accelerating their work
and still being on the hook for the final answer.
I was actually going to ask if you thought that was the stepping stone: advise the advisor, and then once we feel like it gets to a certain point, you can start to do more. I also think that's probably going to be somewhat of a regulation question too: how do you handle that? Who's responsible for it? Can you certify a chatbot to be a financial planner or something like that?
Definitely. I mean, in the customers that are interested in what we offer, we see basically two crowds. One is the startups. They're all saying,
hey, how do I deliver like next generation, like breakthrough experiences to the end user?
You know, I'm willing to take some risk, and I just want to deliver to the end user something that wasn't previously possible. And that's obviously the startup opportunity.
And then in the large enterprises, yes, there's a lot of hesitancy around customer-facing assistants and features, but there's a lot of appetite internally.
So internally, if you have employees; insurance companies are another one there. Basically, their insurance plans are extremely complicated in terms of what the coverages are in any given plan. Even for the agent on the back end to understand: if my dog eats my couch, is that covered, or is that excluded household damage, right? For this particular plan of this particular vintage or whatever. And so assisting the agent on the back end to answer those questions, or automate some task, is definitely the first place to start.
I think it's a huge use case, obviously, but I do think to really be disruptive,
you kind of got to push it all the way to the end user experience.
Yeah, that's exciting. Well, we're at the buzzer here, but it's been so wonderful to hear from you on where we are today: we've unlocked so much and can do so many things so much faster, with a long way to go, I think, to get to the end vision.
Humanity is safe.
That's right.
But Tristan, for folks who are interested in an AI copilot for their applications, where can they find out more about Continual and maybe connect with you?
Oh, absolutely. So, I mean, continual.ai, easy. You know, if you're thinking about building an
AI co-pilot for an internal application, a support application, an end product,
you know, we are a very easy way to do that, make it remarkable, improve it over time.
And we're in early access right now. So you can sign up on our website,
probably going to announce some things in the coming weeks. So stay tuned. Maybe by the time
this is out, we'll be out or shortly thereafter, more probably. Awesome. Exciting. Well, Tristan,
thanks so much for sitting down for a few minutes in person here with us. And hopefully, yeah,
this is your second time on the show, so we'll have to get you back on in another couple of years and get another update.
Absolutely.
Look forward to it.
Thank you.
Working towards your gold jacket.
Thank you.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack,
the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.