The Data Stack Show - Re-Air: AI is All About Working with Data with Kostas Pardalis of typedef
Episode Date: November 12, 2025. This episode is a re-air of one of our most popular conversations from this year, featuring insights worth revisiting. Thank you for being part of the Data Stack community. Stay up to date with the latest episodes at datastackshow.com.
Transcript
Hey everyone, before we dive in, we wanted to take a moment to thank you for listening
and being part of our community. Today, we're revisiting one of our most popular episodes in the
archives, a conversation full of insights worth hearing again. We hope you enjoy it and remember
you can stay up to date with the latest content and subscribe to the show at datastackshow.com.
Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to The Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business,
and human challenges involved in data work.
Join our casual conversations with innovators and data professionals
to learn about new data technologies
and how data teams are run at top companies.
Before we dig into today's episode,
we want to give a huge thanks to our presenting sponsor, RudderStack.
They give us the equipment and time to do this show week in, week out,
and provide you the valuable content.
RudderStack provides customer data infrastructure
and is used by the world's most innovative companies
to collect, transform, and deliver their event data
wherever it's needed all in real time.
You can learn more at rudderstack.com.
Welcome back to The Data Stack Show.
We are here with an extremely special guest,
Kostas Pardalis from typedef,
formerly a co-host of The Data Stack Show.
Kostas, welcome back as a guest.
Happy to be back here, Eric.
It's always fun to be back on the show. I have some great memories of being the co-host here with you and working with Brooks.
So I'm super happy to be here.
Well, I know that longtime listeners are very familiar with you, but for the several million people that are new listeners since you left the show, give them just a brief background on yourself.
Oh, are you saying I'm not that popular yet, that everyone doesn't already know about me? I thought they did.
Oh, my goodness.
Okay.
All right.
I'll do it, even if it's just a formality, but I'll say a few things about myself.
So, yeah, I'm Kostas.
I'll keep it brief because I'm sure, like, we'll talk more about that, like, during the show.
So I've been building data infrastructure for more than a decade now, especially at startups. What I really enjoy is the intersection of business building and technology building. I do have an engineering background, I can't escape that, but at the same time I really enjoy building products and companies. So naturally, I found something very technical to productize and commercialize and sell to technical personas. I'm working on something new right now, and I'm looking forward to talking more about that and what is happening in the world and in our industry today.
Awesome. So Kostas, really excited to hear about what you're doing
now. We'll definitely dig into some AI topics. And then I will save this for the show,
but I want a response from you to some of the news the last couple weeks about return on investment
for AI. There's some stuff floating around where all these companies are saying, you know,
they've invested X amount of money and not seen the return on AI. So we've got to
talk about that topic. And what do you want to dig into?
Oh, 100%. I think talking about what has happened so far with AI, where we are today, and what's next is very interesting. So definitely looking forward to talking about that. Because if anything, what we've seen so far is that everything has been extremely accelerated. Things that in other industries would probably take, I don't know, maybe years or even decades to happen, in the tech industry with AI right now take literally weeks or months. So what we are seeing is pretty normal and natural, and I think it kind of gives us a hint of what is coming next, which I think is super exciting.
Awesome. Well, let's dig in. Yeah, let's do it. Okay, Kostas, we're going to save the full story for a little bit
later in the show, but give us just the brief timeline. So was it three years ago you left
RudderStack? Probably, no, something like that. Yeah. Yeah. A little bit more than three years, probably closer to four, maybe. Yeah. Okay, so give us the timeline. What happened between the time you left and what you're doing today? Yeah. So I left RudderStack
primarily because I was looking for some new stuff to work on. Just for the people that probably don't know what I was doing until that time: since 2014 I had pretty much been working on building infrastructure for data ingestion, so anything related to, let's say, what was called back then the modern data stack, with the ELT pattern, extracting data and loading data into the warehouse, all that stuff.
The way that I kind of model the industry is like an onion, right? You have ingestion that happens in the periphery. That's what brings the data in from all the places. But in the core, there's compute and storage, right? And usually, I would say, we can't live without that. We can live without ingestion, probably, but we can't live without compute and storage. If we do not have the data warehouse, our pipelines wouldn't be valuable. And there are also some very interesting technical problems around that, and I wanted to go deeper into these problems and this space. And by the way, there are challenges and opportunities there, both from, let's say, the technology side — what it means to build something like Snowflake, for example, or something like Spark — but also business related. We do see the size of, let's say, the data warehousing market compared to the adjacent markets. It wasn't even a comparison,
right. So that's what I wanted to do. I wanted to go deeper into data infrastructure, and I was looking to join a company that was building some kind of query engine around that stuff. I ended up joining Starburst Data. Starburst Data was, and still is, commercializing Trino. Trino was initially created under the name Presto, and Presto was created around the same time as Spark. And, let's say, Spark was initially primarily used for preparing the data when you have big data, and then you would use something like Presto to query the data after it had been prepared. Amazing project, open source. For anyone interested in getting into that stuff, they should definitely check it out. It's probably one of the best projects for anyone who wants to see how databases and OLAP systems work.
And I joined for that, plus it is a company with a very strong enterprise go-to-market motion, and I wanted to do that because, up to and including RudderStack, my experience with go-to-market was primarily from SMBs up to mid-market, and more the PLG kind of way of growing. I really wanted to see how enterprise also works, and Trino and Starburst are being used more by Fortune 100 companies. So it was a great way for me to also see how that works.
So I joined there and spent about a year being a little bit selfish, to be honest. I knew that I was joining primarily to learn and find opportunities for what to start next, because I wanted to start another company. And what I started realizing there was that the data infrastructure and tooling we have today was designed and built pretty much 12, 13 years ago, under completely different assumptions about what the needs of the market are, right? The main reason these systems were built was to serve BI, business intelligence. Reporting at scale. Of course with big data, but they were built with that use case in mind. Through this decade, more use cases started emerging. We have ML, we have new kinds of analytics. Data started going from, let's say, something we used to understand what we did, to actually being the product, right?
And that's kind of the intuition and experience that led to: okay, I think what is happening here is that we spent the past 15 years building SaaS and pretty much digitizing every activity that we have. This is done. Like, okay, how many new Salesforces are we going to build? We already have the CRMs out there. This thing is already digital. Same thing with marketing. Same thing with product design. Same thing with pretty much everything, right? Same with the consumer side too. So how is this industry going to keep accelerating and growing? And the answer for me was data. Now that everything is digital, we create all this data. So we need to figure out ways to build products on top of the data. The data will become the product. And the next decade we are entering is probably going to be all about how we deliver value on top of that.
And then AI happened and it just accelerated everything, because if we think about what AI is, it's all about working with data at the end of the day. Sure, you can use it to create a new type of CRM or a new type of ERP or whatever. But at the end of the day, the reason this is going to be better than what Salesforce was is because it's primarily going to be built on top of data that is being generated and used with models to do things that were not possible to do before, at least not in an efficient way. Based on that — actually, before AI happened — I decided that I wanted to go and start a company where we were going to build the next generation of query engines, which would make it much more accessible for more people to go and build on top of data. We started with that, and as we've been building and interacting with design partners and seeing what's going on with AI, we ended up building — it's not like a complete pivot — a new type of OLAP query engine which, outside of pure compute, also considers inference, which is a new type of compute, as a first-class citizen. And you can use this technology to work with your data the way you traditionally would, using something like Snowflake or DuckDB, but also mix in LLMs and inference in a very controlled way, and in a way that's very easy for developers to build with.
Man.
That was really comprehensive. That was like an LLM summary of the last couple of years.
Yeah, I'm spending too much time with LLMs and they're starting to affect the way I talk.
Yeah, I was just about to say.
Yeah. Yeah, yeah.
No, that was great.
Okay, I have a zillion product questions,
and I know John does too.
But let's talk about,
let's just talk about your perspective
on what's happening with AI.
Like you said, it's compressing years into months and weeks. And, you know, it's interesting. If you read a lot of the posts and comments on Hacker News, opinions are very polarized. There's a lot of people where you can sense fear. There's a lot of developers who have a deep sense of FOMO, who are trying to navigate things, and opinions are all over the place. But you're actually building with AI as a core part of the compute engine that you've built. So what's your disposition? I guess the other component I would add, which I think is really mind-boggling, is the amount of money that's being poured into this, which I think is hard for a lot of people to interpret — is that feeling hype, especially based on some of the product experiences? So there you go, an easy one. That's a softball for your first question.
Yeah.
Okay, where do you want to start from?
Like, the, do you want to talk about the reactions that people have?
Or?
Yeah, I think it'd be an interesting place to start.
Like, yeah.
Are you surprised by the varied reactions?
No, I'm not. I mean, that's always the case, right? When something new comes out, and it's not just an incremental change to something that we already know and are familiar with, I think humans tend to get polarized. You have the people who are like, oh yeah, that's the best thing that ever happened to humanity. And then you have people who are like, this is useless. You can see that with electric cars, right? I'm sure the first people who started buying them, that crowd would find the thing perfect even if it was breaking down every two miles. And then you have the people who are like, okay, if I don't have a V8 that wakes up everyone around me, why would I have a car? Right. And they are both valid. I mean, I like both, right? I do see the joy of, you know, a noisy V8. I also see the convenience of a car that's pretty much an iPhone on wheels.
I think that's always the case until, you know, something gets normalized and then everyone just accepts it. Think about people who were around when the iPhone came out. I don't think everyone who used it was like, oh my God, this is it. There were definitely people who were like, okay, this thing doesn't work well enough. I remember my first iPhone, for example. I was promised this thing would connect to Wi-Fi, and actually, for me, it took a couple of months until an update came out and I actually managed to get on Wi-Fi, right? It wasn't anything like what we have today. And, okay, for the even older people who experienced the Internet when it just came out — well, I don't think downloading anything back then was reliable at all, right? You would download something, it would take forever, you'd go run it and, oh shit, this thing is corrupted, I have to redownload the whole thing. Of course, today we take it for granted.
Back then it was, like, 2,400, or before that 9,600 — and by the way, these are just numbers in bits per second. We're not talking about gigabits or whatever we have today, right?
And the reason I'm saying that — I don't know, when I started interacting with the internet as a kid, okay, my parents were probably thinking, oh, it's a new toy, you know. I don't think they could comprehend what this thing would become 20 years later, right? But for this to happen, it took a lot of investment, it took the dot-com thing to happen, it took a lot of engineering, really hard engineering, and it took time. What we see today, I think, is that these times are getting compressed. And in a way, the reason money works the way it does, especially in investments, and why people raise money to build something, is because money is kind of compressed time. Something that without money would take you, let's say, one year to do, if you raise money you can probably do in three months. So the reason people see all these huge amounts of money being poured into this is because there is a race to make things happen as fast as possible. What took the internet 20 years, we're trying to make happen in five years, right? So I think that's the mental model I have, at least, when I'm trying to judge why these amounts of money are going into this.
Of course, there's also the thing that with this technology — I mean, with crypto you also had that, because people were investing in infrastructure to mine this stuff — you have huge infrastructure investments too, right? You need the data centers, and even before that, you need energy. So there's a lot of money going toward that stuff.
Now, there is one more thing, though, because the interesting thing is that it's not polarizing for people in general.
It's polarizing for engineers too.
And I think the most interesting thing for me with AI is that it's the first time after, I don't know, decades, where tech workers are not just disrupting other industries. They are disrupting themselves. And that's scary, right?
Yeah.
So tech workers usually were like, oh, I'm coming into this industry, I'm digitizing this thing. And of course, the people who used to do that work before were like, oh, you are replacing me, or you're doing this, or you're doing that. But at no point was anyone like, oh, this is going to replace the engineers themselves. Now there is a feeling that this might be happening. I don't agree with that, but I think the polarization, or the drama, is more interesting right now because it's actually internal drama and internal disruption happening in the tech industry itself. It doesn't just disrupt other industries, it disrupts itself.
So I have a question then, Kostas, with all that said, tying back to the funding thing: at what point, and I think maybe we're already there, is AI essentially too big to fail? There's too much money, too many people invested — we're going to make it succeed. Because there's a human thing where one of the reasons these things succeed is that everybody decided we wanted them to succeed.
Yeah. I mean, it's going to fail to meet some people's expectations for sure. It can't meet everybody's expectations, but...
Yeah. I think what we lack in these conversations, in my opinion, is a definition of success and failure, right?
What does it mean for AI to fail, for example? If we set up the conversation as, okay, the goal of what we are doing right here is to create the Terminator that's going to, I don't know, rule the world and we will all just retire as humanity — yeah, of course it's going to fail. I don't see that happening in the next two or three years. It will probably never happen, because it's much more complex than that, right? Even if you had created that, deploying that thing is a human endeavor, and the way the world works is extremely complicated. So you can't just reduce this whole process to a statement of, oh, when we have AGI it's game over — that's the goal, then we succeed, and without that, we don't, right?
So I think, in my opinion, we can't talk about success and failure yet, primarily because what we are doing right now is: okay, we have a new thing out there. This new thing has capabilities that we didn't have before, okay? We are still trying to figure out the limits of this thing, but most importantly, we are trying to figure out the problems that make sense to solve, and what it takes to make it viable, right? Put it this way: there are problems today that you can solve with AI, but it's not viable, because AI is still too expensive for these use cases. Right?
Yep.
You have cases where you have new things that you couldn't do before, that you can do with AI, but it's not reliable enough to put into production, right? And there's still a lot of stuff that we don't even know yet that we can solve with this new technology. So it's still an exploratory phase of trying to figure out what makes sense to do with this thing — what is, let's say, the killer app for it — which I think is already being deployed in some cases and delivers value there, but there are other cases, obviously, where it fails.
And, like every other R&D project out there, there's going to be a lot of failure. That's what R&D is, right? You have to embrace that. A lot of that stuff is going to fail. The difference, in my opinion, is that experimenting is quite cheap compared to doing it in the past. If someone wanted, let's say, a couple of years ago, to go and experiment with incorporating ML models to build a recommender system, it couldn't just be an experiment. They would have to make sure it was going to work, because it would be a big investment for them. You have to find the people. You have to find the data. You have to iterate on it. It takes months, maybe years. And many times, what was happening was that we were faking that these things were succeeding, because we had invested too much in them individually. It probably doesn't hurt the company, but it also doesn't mean it adds the value we were expecting it to add.
Right. Right. I think it's really fun to look back. So you're probably familiar with the TV show The Jetsons, the old animated show. It's fun to look back at what people thought the future was going to be like. And the two things that jump out at me from that — they originally made the show decades ago — are that flying cars are part of that show, and robots. It's hard to imagine, 30 or 50 years out, which things people thought would exist by now, right? But it's helpful to bring that up, because there are going to be some AI applications we're working on today that will be the equivalent of a flying car: we just haven't gotten there, the physics don't work, we don't know how to solve that problem. Or like robotics, which so far has been slower than a lot of people thought — we don't all have robots in our houses, other than maybe vacuums, right? So what does that look like? I think that's really interesting too, because we're in this experimentation phase, to think about which categories we're throwing AI at right now that are going to hit walls — that are going to be the future's flying cars, for example.
Yeah. Yeah. First of all, you mentioned robotics. I think robotics is a big thing, for many different reasons, not only because of AI. Traditionally, robotics has been a space where building goes extremely slowly. But it is a space that has been accelerating a lot now, with new models of building — there's open source robotics now, things like that. And I think there's definitely going to be a very interesting intersection between robotics itself and AI and what they can do together. But first of all, one of the things is that I think people need to take a step back and think a little bit about what has happened in the past three years with AI. We are still trying to figure out the right way for us as humans to interact with this thing, right?
Copilot came out, and for a while the copilot was the thing, right? Let's build a copilot for everything. Let's build a copilot for writing code. Let's build a copilot for Word and Excel. Let's build a copilot for, I don't know, whatever. And I think what people started realizing is that the copilot thing — which pretty much means we have this model and then we have a human in the loop there to make sure that the model always stays on track — is not very efficient. Because what happens is you have tasks, and sure, some of them might be accelerated because you are using the copilot, but then you have a human who, instead of doing other things, is babysitting a model to do something, right? So of course you are not going to see crazy ROI there. What's the point? Instead of typing into the computer in Excel, now you have someone typing free text to a model, trying to convince the model to do the right thing, right? So that type of automation, I think, it became obvious that it didn't work that well. There are some cases where it works, but it's not as universal as we thought it would be. Then we started
seeing new paradigms for how these things can be done. At the end of the day, if someone tries to abstract what is happening, the question is: can we treat these models the way we treated software before — I give this thing a task, it just goes and does it and comes back, and when it comes back, it's the right thing. But the problem with models is that, by their nature, they are not deterministic. So things might go the wrong way. So we need to figure out new ways to both interact with and build systems out of this. It's an engineering problem at the end of the day. The science has been done, the thing is out there. Okay, how do we make this thing reliable at the end of the day?
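For a concrete picture of that engineering problem, here is a minimal sketch of one common pattern Kostas is gesturing at: wrap the non-deterministic model call in deterministic validation and bounded retries. The `call_model` helper, the JSON contract, and the field names are illustrative assumptions, not any specific product's API.

```python
import json

def call_model(prompt: str) -> str:
    """Placeholder for an inference call to whatever provider you use."""
    raise NotImplementedError  # kept abstract on purpose; swap in a real client

def extract_order_total(email_text: str, max_attempts: int = 3) -> dict:
    """Ask the model for structured output, but only accept it after
    deterministic checks pass; retry with the error fed back otherwise."""
    prompt = (
        'Return only JSON shaped like {"order_id": "...", "total_usd": 0.0} '
        "for this email:\n" + email_text
    )
    last_error = None
    for _ in range(max_attempts):
        raw = call_model(prompt)
        try:
            parsed = json.loads(raw)
            # Deterministic guardrails around a non-deterministic step.
            assert isinstance(parsed.get("order_id"), str)
            assert isinstance(parsed.get("total_usd"), (int, float))
            return parsed
        except (json.JSONDecodeError, AssertionError) as err:
            last_error = err
            prompt += f"\nYour previous answer was invalid ({err}). Try again."
    raise RuntimeError(f"No valid output after {max_attempts} attempts: {last_error}")
```

The point is the shape of the loop, not the specific check: the model stays free to be wrong, and the surrounding system decides what "right" means.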
Yeah.
Yeah.
I think, you know, one thing that John and I have talked about — and actually, one interesting thing, when we start to get into typedef here in a little bit, Kostas, I think is really applicable to the API that you've built — but like you said, unfortunately, in my opinion, having a copilot chatbot as the thing that everyone just deployed in every possible way for every use case was really a bad start. Because I think, maybe a better way to say it would be, some of the best manifestations of this, as far as user experience, are the ones where you won't really notice the AI. It's not at the forefront, right? It just sort of disappears behind a user experience that feels magically fluid or high-context. I mean, it's going to hide an immense amount of complexity and make hard things seem really simple.
Yeah, as an example of that, Netflix is one of my favorite examples — the brilliance of the recommendation engine stuff they did. It's completely invisible to the user. All of a sudden I'm like, oh, I might like to watch that. Those are the experiences I think will be fascinating to see come into lots of different products with AI. And I haven't seen as much of that yet.
Well, I think you can see them in some cases. Sorry for interrupting, Eric, but there are some cases, like in development, for example, right? You have something like Claude Code, which, okay, is an experience on its own with its own limitations. It doesn't mean you just throw this thing out there and it's going to build a whole Linux kernel and so on. But stuff like using models to do a preliminary review of a new PR, or actually using a model to assist you as you do a PR review — these things are accelerating processes a lot. Okay, they are not replacing the engineer, and I don't think that's a bad thing, but it does make the engineer much more productive at the end of the day. Same thing with sales, for example. Okay, you want to go and personalize the messages that you are going to send to 200 people. In the past, if you wanted to do that, it would take, I don't know, probably two hours. Now it will probably take half an hour. It doesn't replace the SDR — some people might claim it does, but I don't think so — but it does make people more productive. And I think that's why there was a conversation for a while where the observation was that, when it comes to impact on jobs, the first layer of professionals affected is middle management. And the reason for that is because, in the past, for every five SDRs you probably needed one sales manager. Now you need one sales manager for 100 of them, or 50 of them, right? Because a lot of the stuff you had to do to make sure these people were doing the right thing can now happen much more efficiently. The same thing with customer support, which is one of the most common use cases where AI is heavily used. One of the things the managers had to do was go through the recordings the agents had and make sure they were doing the right thing. That's super time consuming, right? You literally have someone who works for eight hours as an agent and talks, in total, let's say three hours. Someone had to go through three hours of transcripts and figure out if they were doing the right thing. Now they can do that for many more people in less time, because they have the tools to do it. So I think there is impact happening out there. It's just that, compared to the way the dream of AI is being sold, what is actually happening is not as sexy as the dream.
We're going to take a quick break from the episode to talk about our sponsor, RudderStack.
Now, I could say a bunch of nice things as if I found a fancy new tool. But John has been implementing RudderStack for over half a decade. John, you work
with customer event data every day, and you know how hard it can be to make sure that data is
clean and then to stream it everywhere it needs to go. Yeah, Eric, as you know, customer data can get
messy. And if you've ever seen a tag manager, you know how messy it can get. So RudderStack has really been one of my team's secret weapons. We can collect and standardize data from
anywhere, web, mobile, even server side, and then send it to our downstream tools.
Now, rumor has it that you have implemented the longest-running production instance of RudderStack, at six years and going.
Yes, I can confirm that. And one of the reasons we picked RudderStack was that it does not store the data, and we can live-stream data to our downstream tools.
One of the things about the implementation that has been so common over all the years, and with so many RudderStack customers, is that it wasn't a wholesale replacement of your stack. It fit right into your existing tool set.
Yeah, and it even works with technical tools, Eric, things like Kafka or Pub/Sub, but you don't have to have all that complicated customer data infrastructure.
Well, if you need to stream clean customer data to your entire stack, including your data infrastructure tools, head over to rudderstack.com to learn more.
Let's start to talk about data infrastructure, because I really want to talk about typedef, mainly because I got a demo of it right before we started recording, so I'm all excited about it. But let's talk about data infrastructure, because I totally agree, Kostas, that a lot of the significant impact that's happening isn't super sexy. Where are you seeing it? I mean, obviously, you're building some of this with typedef. And John, I would ask you this question too, because you have a really good handle on the landscape and use new tools all the time. It's interesting, because having a non-deterministic tool for data infrastructure is really different than, like, summarize a transcript and give me the gist of it, right? The threshold for making non-deterministic changes to a production system, or to data that is business critical — clearly there's a different threshold there. But what does the landscape look like for using LLMs in data infrastructure?
Well, I have a really small anecdote here that I'll share, Eric, that I think is interesting. So I occasionally do mentoring stuff, and I had a mentoring call earlier today, and somebody's using an LLM to generate
SQL to look at web analytics. I'm sure that happens all over the place, especially with startups, and it was a startup. So I get on this call, and it was so funny — even a few months ago, I probably would have walked them through it differently, because they didn't really know SQL, right? So I would have walked them through it and taught them a little bit about SQL. But I actually taught them a little bit about prompting instead. That's what I did. It was the simplest solve. They were getting a little lost in this query, and essentially it was a really short solve of, hey, break this down into CTEs. Let me show you how to prompt it to use CTEs instead of subqueries. So we did that, and then said, all right, run each CTE, and if there's an error in a CTE, take that one part out, drop it in a new window, tell it to fix that piece, move it back over, and then work through it. And we did it together, and in like 15 minutes she's like, oh, this is amazing, this is great. And it was just something that even six months ago, that's not how I would have walked somebody through a problem. So yeah, I've got to think about the implications of it, but yeah.
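As a concrete illustration of the workflow John describes — prompt the model to structure the query as CTEs, then validate each CTE in isolation and hand any error back — here is a minimal sketch. The `pageviews` table, the prompt text, and the commented-out LLM call are made up for illustration, and sqlite3 stands in for whatever warehouse you actually query.

```python
import sqlite3

conn = sqlite3.connect("analytics.db")  # stand-in for your warehouse connection

# Step 1: the prompt tweak from the call — ask for named CTEs, not nested subqueries.
prompt = (
    "Write SQL for a table pageviews(ts, session_id, page) that reports weekly "
    "sessions and pageviews per landing page. Structure it as named CTEs "
    "(WITH ... AS), one logical step per CTE, and avoid nested subqueries."
)
# generated_sql = your_llm_call(prompt)  # whatever chat interface you already use

# Step 2: probe each CTE on its own; on failure, hand just that piece back.
def check_ctes(conn: sqlite3.Connection, ctes: dict[str, str]) -> None:
    """ctes maps CTE name -> body, in dependency order; the WITH clause is
    rebuilt incrementally so each step can be tested in isolation."""
    built = []
    for name, body in ctes.items():
        built.append(f"{name} AS ({body})")
        probe = "WITH " + ",\n".join(built) + f"\nSELECT * FROM {name} LIMIT 5"
        try:
            conn.execute(probe)
            print(f"{name}: ok")
        except sqlite3.Error as err:
            # Drop this into a fresh window and ask the model to fix only this CTE.
            print(f"{name} fails: {err}\n{body}")
            break
```

The debugging loop is the same one John walked through by hand; the code just makes the "run each CTE" step repeatable.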
Yeah. Yeah, it's interesting. I think working with data is also an interesting topic when it comes to LLMs, for a couple of different reasons. First of all, SQL was created because it was supposed to be used by business people, not technical people, right? So it kind of resembles natural language when you write it. It was a way for the industry to try to create a DSL that could be used by non-technical people at the end of the day. That was the goal of it.
Now, obviously, things got more and more complicated as we tried to do more and more things, right? And of course, when you go to these people who are supposed to be business people, or business analysts, or even managers, and you suddenly explain to them terms like CTEs or projections or joins — what are you talking about? But it turns out that it's a good language for LLMs to generate and for people to debug, because you usually end up writing your logic in a way that is data-flow driven instead of branch-driven. You will get something back for your question that, okay, you can spend some time understanding what it is doing. You don't have to go through thousands of lines of code to figure out what's going on. Now, having said that, at the same time, as with everything else with AI, people jumped directly to the dream. Okay, let's do text-to-SQL.
Right? Let's have Eric go there and say, hey, how did my product team perform in the last quarter, and expect something to come back that makes sense. We're not there yet. I don't know if we are going to get there. I think what will happen, to your point — and John, what you described is great — is that you have these generalists, the models that can do everything good enough, but if you want a really good output, you really have to constrain how they are going to operate, right? And you have to constrain them based on the problems you're trying to solve, and each problem is different. So you need a different context. It's not something generic that you can just put there and it will solve every problem. That's where engineering comes in, right? So I think we are at the point where, okay, we need to engineer solutions. We need to sit down and, for the problems we are trying to solve, find the ways these models can operate within good enough margins of error, put them into production, and keep improving, as we did in the past. That's what engineering has always been doing. No difference.
I think one thing that I'd be interested in both of your opinions on — I agree that we need to engineer solutions. I think part of that is in the modeling layer, right? So one of the challenges, if you think about an LLM writing SQL, is that the underlying data sets are wildly different, even for the same basic use case. And so what if there were a way to normalize on a basic data model? You mentioned web analytics, right? Well, that's actually fully known; there are standards you can use for that. You have page views, you have sessions, you have whatever, right? Those are almost ubiquitously defined terms. And so, if you were able to have a consistent underlying data model, then you would be setting the LLM up for success, because it's not having to try to interpret wildly different underlying data models to produce the same result. And I think about the same thing with frameworks. I mean, if you think about V0 from Vercel, it's generating Next.js, right? That framework is super well defined. There's a zillion examples. And so, within a certain set of parameters, it can do some pretty awesome stuff with those guardrails there. So do you think we will also see a big part of this become a normalization, or sort of standardization, of underlying data in order to create an environment in which the LLM is set up better for success?
No. The reason I'm saying that is because I think standardization, when it comes to data and schemas and all that stuff, has been tried a lot in the past. And it has always failed, because the problem with these things is that it's extremely hard, first of all, to agree about the semantics, what things mean. Actually, there's a very rich literature out there, scientific research, on how to model specific domains — especially in archiving, for example. If you go there, you will see that depending on the type of medium you want to digitize, there are very well-defined schemas and, most importantly, semantics around how you digitize a book, right? What are the parts you break it down into, and the metadata you need for these things. There is a lot of work that has been done. But the problem with that stuff is that it's extremely hard to get humans to agree upon these things. And for a good reason — it's not because we're a problematic species. It's just that all these things are very context sensitive. The way that I do these things in my company might be very different compared to how Eric does things in his company. And if we want to agree on something, it has to be good enough for both of us, without causing problems for either of us because of whatever exists in there to satisfy another stakeholder, right?
So it's really hard. And there's another thing there, which is continuity, right? We are not just resetting. Take an enterprise — go to Bank of America, I don't know, how long has Bank of America been operating? They probably started when IBM started building the first mainframes or whatever. You can't go in there and just remove everything and put something new in. You need continuity, right? So I think what can happen is a couple of different things. One is you decide that the models should come to a consensus on how to do things, you let the models figure it out, and you don't care about the data model at the end. Or you have another layer of abstraction, which is what semantic layers are, right? The whole concept of the semantic layer is: okay, I have my data in my data lake or data warehouse, I model these things in any way I want, but I also centralize the semantics around the meaning of this data. So when I'm going to talk about revenue, it doesn't matter if I'm Kostas from sales and Eric is from marketing, we are going to use the same definition of what revenue is, right? Or we will have multiple different ones, but we will know which one each of us is using. So the solution to these things is usually to add abstractions. That's how we've been doing it so far, and I think that's what is going to happen now. The main difference is that, so far, we've been building the abstractions considering one type of entity interacting with them, which is the human. We also have to take into account that we have another entity, which is the model. And the model needs a different experience than a human to interact with these systems. So we don't only have user experience. Now we also need, I don't know, model experience, whatever — but you get the idea.
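To make the "one agreed-upon definition of revenue" idea concrete, here is a tiny sketch of the shape a semantic layer entry can take. The names, fields, and SQL are illustrative assumptions rather than any particular semantic layer product; the point is that a teammate and a model both resolve the metric through the same place instead of re-deriving their own SQL.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Metric:
    name: str
    description: str  # the semantics a human, or a model, reads
    sql: str          # the single agreed-upon definition

METRICS = {
    "revenue": Metric(
        name="revenue",
        description="Recognized revenue in USD, net of refunds.",
        sql="SELECT SUM(amount - refund_amount) FROM orders WHERE status = 'recognized'",
    ),
}

def resolve(metric_name: str) -> str:
    """Kostas from sales, Eric from marketing, or an LLM agent all get the
    same SQL back for the same word."""
    return METRICS[metric_name].sql
```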
All right.
Well, we have to use our remaining time to talk about typedef. So I know you gave us a brief overview at the very beginning, but give us typedef in, like, two minutes. Well, we have more than two minutes to talk about it.
Yeah.
So when we started typedef, our goal was to build the tooling that we need today to work with data. When I say it like that, it sounds very generic, but we started from a very OLAP perspective, right? What do we do with the data that we have on our data lake or data warehouse? So we're not talking about transactional use cases here, like how you build your application with your Postgres database. It's more about: okay, we have collected everything, what do we do with it now? How do we build new applications on top of this data? Traditionally, you are using systems like Spark for that, right?
Yep.
But Spark has started showing its age because, again, as I said at the beginning, these things were built primarily with the BI, the business intelligence use case in mind. So when you try to build, I don't know, a recommender or other types of applications on top of your data, more customer-facing things, it becomes hard to do. The way we've been solving it so far is by using talent, right? Very specialized people who can make sure this thing is going to work properly regardless of what we throw at it. That's really hard to scale outside of the Bay Area, in a way, right? It's extremely hard to go and ask every engineer out there to become an expert in building and operating distributed systems, especially with data. So we were like, okay, how can we solve that? How can we turn building applications with data into a similar experience to what frontend engineers and backend engineers have with application development, right? What happened with MongoDB and Node.js becoming a thing, and suddenly we have this explosion of millions of engineers building things, right? But do it for data. That's how we started.
To do that, we had to build pretty much from scratch a new query engine. We wanted to use familiar interfaces, so people who have some experience working with data can already use it. So we built on top of the PySpark API. We used the dataframe API as a paradigm because it's a good way to mix imperative programming with declarative programming, so you kind of get the best of both worlds — what you have with SQL, but also a language like Python. And then we also wanted to make it serverless. But then, as we said, AI happened. So now we have a new type of compute, and the workloads completely changed. The CPU is not the bottleneck anymore. The bottleneck is all about reaching out to LLMs and hoping we get something back. And when we get something back, do we know if it's correct? It's not a deterministic answer, right? So how do we engineer and put things into production when we have these new workloads? So our next step was, okay, we are going to make
So our next step was, okay, we are going to make.
inference, LLM inference, like a first-class citizen, and we got kind of objects of like,
okay, how we can do that without having to introduce, like, completely new concept, like the engineers.
So we kind of introduced, like, these new operators in the Data Frame API, where as you have
like a joint before, now you have a semantic join, as you have like a filter before, now you have
like a semantic filter and extends the operations that you already know how to do on data.
but using both like natural language and also using unstructured data where something has to be inferred.
It's not like explicit already in your data set.
And then reducing all the, removing all the hard work of like having to interact with inference engines,
figuring out like buck pressure, what to do with failure, all the things that are like extremely painful
because these new technologies are still like young and make.
things haven't been figured out yet in terms of infrastructure, but all these things end up
like making, working with them like unreliable enough to make it hard to put into production.
So our goal is like, okay, at the end, use TypeDiv to build like AI applications, both let's say
like static applications that they have like a static execution graph, but also a genetic ones where
you can let a model, like, decide what to do based on, like, the tools that it has access to.
Do it on data.
So, not like a generic environment that you can go and build, let's say, like, any type of, like, a genetic workload there.
Like, if you want to go scrape the web and come back with insights, type def, and fennick is not the way to do it.
But if you won't like to implement that on top of your, like, data warehouse data, then it's a great.
like to use. And make it also really fast like to experiment because it's like very familiar
like to work wings. And when you're ready like to get into production, remove all like the boiler
plates that someone is like to build in order like to monots the underlying infrastructure
and making things like much more efficient at the end and more reliable like to put into
production, which is like quite a big problem right now and why like many AI projects are like
failing. So I have to digest.
Yeah, it's a lot.
I know.
It's hard to talk about these things without using a lot of words.
Yeah.
You left us speechless.
No, we were both on me.
Yeah, can you go back to the semantic?
Yes.
Can you go back to the, like, I want to talk a little bit on the semantic layer because
this has been a really fascinating one for me.
Because I like your point a lot around like, as we talked about earlier,
historically, you've got BI tools, now we've got.
like we've got maybe agents or first class citizens or people equipped with like
AI tools is kind of another class of people but back to the semantic layer like there's a
startup that I've followed their journey and talked to their founders a lot and it's been
interesting just to follow them where they were like really hard like semantic layer like it's
not going to work at all without a semantic layer and then they were kind of and then the and then like
back to that like comment on like talent it's like well how many companies are in a point where
they have a mature enough warehouse and they have all this organized into, you know,
a modeling tool like DBT and they have like a mature semantic layer.
Like even that number is like not super high.
And so it's just interesting because even they, I think, have like gone back and thought like,
well, but if we did kind of go back to text to SQL and think about like basically dynamically
generated, you know, semantic layer.
So there's not as much like engineering involved in that.
So I wonder how many of those.
like reinventions will happen on like just pragmatically right where it's like okay this is how
it should work this how it works best we're going to have to go back and reinvent pragmatically
because like to our tan like our tool addressable mark is not big enough so we need to like go
you know yeah yeah i mean i think that a lot of that stuff goes back like to kind of like
what we're showing about like the continuity right like if if you have like a company that has been
operating like a BI stack for a couple of years now, right?
They probably have a code base of SQL that already exists there.
And migrating that to like a semantic layer, which by the way, the semantic layer also
needs to depopulated, that monarchs, right?
Yes, you do add there like an abstraction that can probably make things better, assuming,
that it has been curated, right?
And most importantly, curate, like, someone has created it, right?
That's, like, one of the reasons that traditional, like,
the semantic layer is not something new.
Like, has been around, like, for a very long time.
But it was primarily, like, an enterprise thing.
And it was an enterprise thing because the enterprise had the resources,
like, to go and build and maintain these things, right?
Now, can NLMCH help with that?
Maybe, I don't know,
that's like something for the semantic layer people like to figure out.
But at the end of the day, if you come to a team that already spends probably 40, 50% of their time,
asking requests like, hey, I'm trying to calculate this thing, do we already do it?
And if yes, where, and can we update it to also add this new thing there?
Because we also have a new source that we want to track SEO coming from related to it.
And tell them, well, you can solve this.
If you go through like a six-month project to build a semantic layer and educate also the whole company that they have like to use whatever we put in there.
Yeah, it's like super hard.
Like even if on paper like it works, you have to both change the organization behavior and to invest like in the technology resources that you don't already have.
So it's a hard sell.
right? I think — and in my opinion this is more of a product opinion — you have to fix the problems that already exist, what people carry from the past, and make the transition easy. If the transition to this new world that you are promising is not easy, people won't do it. It's too much.
Yep.
Yep.
And that's part of why we built typedef the way we did. If you have to educate people a lot, you put a lot of risk into what you are building — people don't have the time, and you don't have the money, to do it. So it has to be something that's very familiar for people to use and makes things easy. So all the decisions we made were for familiar APIs, for both humans and machines, right? PySpark has been out there for a long time, these models have been trained on it, so the API is kind of known. You can go and ask a model to build something on our framework and it will probably succeed after one or two iterations, just because of this familiarity with the syntax. So we need to reduce the amount of effort people have to put in to migrate into these new worlds. Because at the end of the day, we kind of solve the same problems, in a better way. But if we want to make this reality happen fast, we have to help people migrate fast too, right?
We can't just promise a dream that will take them six months of implementation before they can even taste the dream. And that's what we are trying to do with typedef: remove, as much as possible, everything that makes it incompatible with what people already know. The same way you would build a pipeline in the past to process your data, you should do the same thing using LLMs, without having to learn new concepts. If we manage to do that with typedef, from a product perspective I'll call it a success. Whether it's going to be a commercial success is a different conversation. But that's kind of the goal, right? Do the things you were doing in the past, but in a much, much better way, because now, transparently, you can use LLMs to do some of the stuff that would have been extremely hard to do before — but without compromising on how you put things into production, how you operate things, and how fast you can iterate on the problem you are trying to solve.
I love it. Yeah, when you were giving me a demo earlier today, I think what was actually pretty surprising was that when we talked about what it would take to productionize this for the use case we were discussing, it just didn't really feel that unfamiliar. It kind of feels very natural, right? Like, here's all the tables, you have a pipeline set up.
So yeah, that's super interesting. I didn't even really think about that. My main thought was, oh, that sounds way easier than I thought it was going to sound. So hopefully that means commercial success.
Sure. Yeah. Yeah. It's on the way.
100%, and I think a positive side effect of using familiar paradigms is that when things go wrong — and of course things will go wrong — it will be easier for people to reason about them, figure out the issues, and fix things. Again, I'll keep being boring, I don't know, but it is engineering at the end of the day. We've been spending so much time building these best practices, these ways of operating systems — operating unreliable systems in a reliable way. We just need to use the same principles. And as you said, put AI in there, but the AI should feel almost magical. It shouldn't feel like, oh, now everything I was doing is breaking because I'm trying to use this damn new thing and I don't know why it breaks.
Yep. And I think that goes back to what you were talking about with the use case.
Awesome.
Well, we are at the buzzer, as we like to say.
Brooks is telling us we're out of time.
Because I would love to have you come back on for a round two.
And I want to do two things.
Let's talk about some use cases that you're implementing for your customers. And then the other thing that we didn't talk about that I would love to get into — just from me talking with some of our larger customers — is their restrictions on even using LLMs, especially as it relates to certain types of data. That's a huge challenge, right? And I mean, in a startup, like you were saying, John, okay, this person is just throwing SQL, probably straight into GPT, and sharing data, whatever, right? And it's like, okay, well, you cannot do that at a large company. There are a lot of legitimate security concerns and other things like that. So I'd love to cover that too, Kostas, because for the types of workloads that you're running, that's clearly a concern.
Yeah, yeah, 100%.
I think a lot of that stuff is being addressed, and I think it's getting easier. People are finding solutions, either by using, let's say, proprietary or open source models that you run on your own, or by using the big providers in very secure ways. With the big ones like OpenAI and all these people, this is kind of a solved problem at this point, I would say. And I would say that most people probably end up using open source models not so much because of security, but more because of performance.
Interesting. But we can talk about that. Yeah, okay, that's round two.
Love it. Thank you so much, guys. I love it. I'm looking forward to coming back again.
Yeah, we'll do it soon. The Data Stack Show is brought to you by RudderStack. Learn more at rudderstack.com.
