The Data Stack Show - 259: Too Big to Fail? The Hype, Hope, and Reality of AI with Kostas Pardalis of typedef
Episode Date: August 27, 2025
This week on The Data Stack Show, Brooks and John welcome back Kostas Pardalis, long-time co-host of the Data Stack Show and now Co-Founder of typedef. The group discusses the rapid evolution of AI and data infrastructure. The conversation also explores how AI is accelerating industry change, the challenges of integrating large language models (LLMs) into data workflows, and the limitations of current semantic layers. Kostas shares insights on building next-generation query engines, the importance of using familiar engineering paradigms, and the need to make AI seamless and almost invisible in user experiences. Key takeaways include the necessity of practical, incremental innovation, the reality behind AI hype, strategies for making advanced data tools accessible and reliable for engineers and businesses alike, and so much more.
Highlights from this week’s conversation include:
Kostas’s Background and Career Timeline (1:10)
Transition from RudderStack to Starburst Data (4:25)
AI Acceleration and Industry Impact (9:37)
AI Hype, Investment, and Polarized Reactions (12:05)
Historical Parallels and Tech Adoption (13:54)
AI Disrupting Tech Workers and Internal Drama (18:56)
Experimentation Phase and Future AI Applications (24:01)
Invisible AI and User Experience (28:21)
AI in Data Infrastructure and LLMs (34:24)
SQL, LLMs, and Engineering Solutions (36:35)
Standardization, Semantic Layers, and Data Modeling (41:01)
Introduction to typedef (45:49)
Productionizing AI Workloads with typedef (51:36)
Familiarity, Reliability, and Engineering Best Practices (57:24)
Security, Enterprise Concerns, and Open Source Models (1:00:48)
Final Thoughts and Takeaways (1:01:47)
The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it’s needed to power smarter decisions and better customer experiences. Each week, we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to The Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies.
Before we dig into today's episode, we want to give a huge thanks to our presenting sponsor, RudderStack. They give us
the equipment and time to do this show week in, week out, and provide you the valuable
content. RudderStack provides customer data infrastructure and is used by the world's most
innovative companies to collect, transform, and deliver their event data wherever it's needed
all in real time. You can learn more at rudderstack.com. Welcome back to The Data Stack Show.
We are here with an extremely special guest.
Kostas Pardalis from typedef, formerly a co-host of The Data Stack Show. Kostas, welcome back as a guest.
Happy to be back here, Eric.
It's always fun to be back at The Data Stack Show, which I have some great memories of, being the co-host here with you and working with Brooks.
So I'm super happy to be here.
Well, I know that longtime listeners are very familiar with you, but for the several million people that are new listeners since you left the show, give them just a brief background on yourself.
Oh, are you saying I'm not that popular yet? That not everyone already knows about me? I thought they did.
Oh, my goodness. Okay.
All right, I'll do it, even if it's just a formality. I'll say a few things about myself. So, yeah, I'm Kostas. I'll keep it brief because I'm sure we'll talk more about that during the episode.
So I've been building data infrastructure for more than a decade now, especially at startups. What I really enjoy is the intersection of business building and technology building. I do have an engineering background; I can't escape that. But at the same time, I really enjoy building products and companies. So naturally, I found something very technical to productize, commercialize, and sell to technical personas.
I'm working on something new right now,
and I'm looking forward to talking more about that and what is happening in the world and in our industry today.
Awesome.
So, Kostas, really excited to hear about what you're doing now.
We'll definitely dig into some AI topics.
And then I will save this for the show,
but I want a response from you to some of the news
the last couple weeks about return on investment for AI.
There's some stuff floating around where all these companies are saying, you know, they've invested X amount of money and they're not seeing the return on AI.
So we've got to talk about that topic.
And what do you want to dig into?
Oh, 100%.
I think talking about what has happened so far with AI, where we are today, and what's next is very interesting. So definitely looking forward to talking about that. Because if anything, what we've seen so far is that everything has been extremely accelerated. Things that in other industries would probably take, I don't know, maybe years or even decades to happen, in the tech industry right now take literally weeks or months. So what we are seeing is pretty normal, natural, and I think it gives us a hint of what is coming next. I think it's super exciting.
Awesome.
Well, let's dig in.
Let's do it.
Okay, Kostas, we're going to save the full story for a little bit later in the show, but give us just the brief timeline. So was it three years ago you left RudderStack?
Probably, yeah, something like that.
Yeah, I left RudderStack. I left primarily because I was looking for some new stuff to work on. Just for the people that probably don't know what I was doing until that time: since 2014 I have pretty much been working on building infrastructure for data ingestion. So anything related to, let's say, what was called back then the modern data stack, with the ELT pattern, extracting data and loading data into the data warehouse, all that stuff.
But the way that I kind of model the industry is like an onion, right? You have ingestion that needs to happen at the periphery; that's what brings the data in from all the places. But in the core there's compute and storage, right? And those are the parts we can't live without. We can live without ingestion, probably, but we can't live without compute and storage, right? If we did not have the data warehouse, our pipelines wouldn't be valuable.
There are also some very interesting technical problems around that, and I wanted to go deeper into these problems and this space. And by the way, there are challenges and opportunities there, both from, let's say, the technology side, what it means to build something like Snowflake, for example, or something like Spark, and also on the business side. We do see the size of, let's say, the data warehousing market compared to the adjacent markets; there's hardly a comparison there.
Right.
So that's what I wanted to do. I wanted to go deeper into data infrastructure, and I was looking to join a company that was building some kind of query engine around that stuff. I ended up joining Starburst Data. Starburst Data is commercializing Trino. Trino was initially created under the name Presto, and Presto was created around the same time as Spark.
Let's say Spark was initially used primarily for preparing the data when you have big data, and then you would use something like Presto to query the data after it had been ETLed. Amazing project, open source; anyone interested in getting into that stuff should definitely check it out. It's probably one of the best projects for anyone who wants to see how databases and distributed systems work.
And I joined for that. Plus, it is a company with a very strong enterprise go-to-market motion, and I wanted to do that because, up to and including RudderStack, my experience with go-to-market was primarily from SMBs up to mid-market, and more the PLG kind of way of growing. And I really wanted to see how enterprise also works. Trino and Starburst have been used by more Fortune 100 kinds of companies, so it was a great way for me to also see how that works. So I joined there. I spent about a year being a little bit selfish, to be honest.
I knew that I was joining primarily to learn and find opportunities for what I'd like to start next, because I wanted to start another company. And what I started realizing there was that the data infrastructure and tooling that we have today was designed and built pretty much 12, 13 years ago, under completely different assumptions about what the needs of the market are, right? The main reason these systems were built was to serve BI, business intelligence, right? Reporting at scale. Of course, with big data, but they were built with this use case in mind. Through this decade, more use cases kept emerging: we have ML, we have predictive analytics, and data actually started moving from, let's say, something we used to understand what we did, to actually being the product, right? And that's kind of the intuition; the experience of how that happened was like,
okay, I think what is happening here is that we spent the past 15 years building SaaS and pretty much digitizing every activity that we have. This is done. Like, okay, how many new Salesforces are we going to build? We still have the CRMs out there; this thing is already digital. Same thing with marketing. Same thing with product design. Same thing with pretty much everything, right? Same with the consumer side, too.
So how is this industry going to keep accelerating and growing, right? And the answer for me was data, because now that everything is digital, we create all this data, so we need to figure out ways to build products on top of the data. The data will become the product. And the next decade we are probably entering is going to be all about how we deliver value over that.
And then AI happened, and it just accelerated everything, because if you think about it, AI is all about working with data at the end of the day. Sure, you can use it to create a new type of CRM or a new type of ERP or whatever. But at the end of the day, the reason this is going to be better than what Salesforce was is because it's primarily going to be built on top of data that is being generated and used with models to do things that were not possible to do before, at least not in an efficient way.
Based on that, and actually before AI happened, I decided that I wanted to go and start a company where we would build the next generation of query engines, one that would make it much more accessible for more people to go and build on top of data. We started with that, and as we've been building and interacting with design partners and seeing what's going on with AI, we ended up building a new type of OLAP query engine which, outside of pure compute, also considers inference, which is a new type of compute, as a first-class citizen. And you can use this technology to work with your data the way you traditionally would using something like Snowflake or DuckDB, but also mix in LLMs and inference in a very controlled way, and in a way that is very familiar for developers to build with.
Man, that was really comprehensive. That was like an LLM summary of the last couple of years.
Yeah, I'm spending too much time with LLMs, and they're starting to affect the way I talk.
I was just about to say that.
Yeah.
No, that was great.
Okay, I have a zillion product questions,
and I know John does too.
But let's talk about,
let's just talk about your perspective
on what's happening with AI.
Like you said, it's, you know,
compressing years into months and weeks.
And, you know, it's interesting.
If you read a lot of the posts and comments on Hacker News, opinions are very polarized. There are a lot of people where you can sense fear. There are a lot of developers who have a deep sense of FOMO, who are trying to navigate things, and opinions are all over the place. But you're actually building with AI as a core part of the compute engine that you've built.
So what's your disposition? I guess the other component I would add, which I think is really mind-boggling, is the amount of money that's being poured into this, which I think is hard for a lot of people to interpret. Is that hype, especially based on some of the product experiences? So there you go, an easy one. That's a softball for your first question.
Yeah. Okay. Where do you want to start from? Do you want to talk about the reactions that people have?
Yeah, I think that'd be an interesting place to start. Are you surprised by the varied reactions?
No, I'm not. I mean, that's always the case, right? When something new comes out and it's not just an incremental change to something that we already know and are familiar with, I think humans tend to get polarized. You have the people who are like, oh yeah, that's the best thing that ever happened to humanity, and then you have people on the other side. You can usually see that with electric cars, right? I'm sure the first people who started buying Teslas would find the thing perfect even if it was breaking down every two miles. And then you have the people who are like, okay, if I don't have a V8 that wakes up everyone around me, why would I have a car? And they are both valid. I mean, I like both. I do see the joy of, you know, a noisy V8. I also see the convenience of a car that's pretty much an iPhone on wheels. I think that's always the case until something gets normalized and then everyone just accepts it.
I don't think everyone who used the iPhone when it came out was like, oh my God, this is it. There were definitely people who were like, okay, this thing doesn't work well enough. I remember my first iPhone, for example. I was promised this thing would connect to Wi-Fi, and for me it actually took a couple of months until an update came out before I managed to get on Wi-Fi, right? It wasn't anything like what we have today.
And okay, for the even older people who experienced the Internet when it just came out, I don't think that downloading anything back then was reliable at all, right? You would download something, it would take forever, you would go run it, and oh shit, this thing is corrupted, I have to re-download the whole thing, right?
Of course, 56K.
Well, there was 2400 before that, 9600. These are just numbers of bits per second. We're not talking about gigabits or whatever we have now.
Yeah, yeah. Right.
And the reason I'm saying that is, when I started interacting with the internet as a kid, okay, my parents probably were thinking, oh, it's like a new toy, you know. I don't think they could comprehend what this thing would become 20 years later.
Right. But for this to happen, it took a lot of investment. It took the dot-com thing happening. It took a lot of engineering, really hard engineering. And it took time. What we see today, I think, is that these timelines are getting compressed. Money, the way that money works, especially in investments and why people raise money, is kind of like compressed time. Something that without money would take you, let's say, one year to do, if you raise money you can probably do in three months. So the reason people see all these huge amounts of money being poured into this is because there is a race to make things happen as fast as possible. What took the internet 20 years, we're trying to make happen in five years, right? So I think that's the mental model, at least the one I have, when I'm trying to judge why these amounts of money are going into that.
Of course, there's also the fact that with this technology you have huge infrastructure investments. I mean, with crypto you also had that, because people were investing in infrastructure to mine this stuff. But here you need data centers, and even before that, you need energy. So there's a lot of money required for that stuff.
Now, there is one more thing, though, because the interesting thing is that it's not just polarizing for people in general. It's polarizing for engineers too. And I think the most interesting thing for me with AI is that it's the first time in, I don't know, decades where tech workers are not disrupting other industries. They are disrupting themselves. And that's scary, right?
Yeah.
Tech workers usually were like, oh, I'm coming into this industry, I'm digitizing this thing. And of course, the people who used to do that work before were like, oh my, you are replacing me, or you're doing this, or you're doing that. But at no point was anyone saying, oh, this is going to replace the engineers themselves. Now there is a feeling that this might be happening. I don't agree with that, but I think the polarization, or the drama, is more interesting right now because it's actually internal drama and internal disruption happening in the tech industry itself. So it doesn't just disrupt other industries; it disrupts itself.
So I have a question then, with all that said, Kostas, going back to the funding thing. At what point, and I think maybe we're already there, is AI essentially too big to fail? There's too much money, so many people have invested, that we're going to make it succeed. Because there's a human thing where one of the reasons these things succeed is because everybody decided that we wanted them to succeed.
Yeah. I mean, it's going to fail to meet some people's expectations for sure. It can't meet everybody's expectations, but.
Yeah. I think what we lack in these conversations, in my opinion, is a definition of success and failure, right? What does it mean for AI to fail, for example? If we frame the conversation as, okay, the goal of what we are doing here is to create the Terminator, which is going to, I don't know, rule the world, and we will all just retire as humanity, then yeah, of course it's going to fail. I don't see that happening in the next two or three years. It will probably never happen, because it's much more complex than that, right? Even if you had created that, deploying that thing is a human endeavor, and the way that humanity works is extremely complicated. So you can't just reduce this whole process to a statement of, oh, when we have AGI, it's game over, that's the goal, and then we succeed; without that, we don't.
Right. So I think, in my opinion, we can't talk about success and failure yet, primarily because what we are doing right now is this: we have a new thing out there, this new thing has new capabilities that we didn't have before, and we are still trying to figure out the limits of this thing. But most importantly, we are trying to figure out the problems that make sense to solve and what it takes to make it viable, right? I can put it this way: there are problems today that you can solve with AI, but it's not viable, because AI is still too expensive for those use cases. And you have cases where there are new things that you couldn't do before that you can now do with AI, but it's not reliable enough, right, to put into production.
And there's still a lot of stuff that we don't even know yet whether we can solve with this new technology. So it's still an exploratory phase of trying to figure out what makes sense to do with this thing, what is, let's say, the killer app for it. In some cases it's already being deployed and delivers value, but there are obviously other cases where it fails. And like every other R&D project out there, there is going to be a lot of failure. That's what R&D is, right? You have to embrace that a lot of this stuff is going to fail. The difference, in my opinion, is that experimenting is quite cheap compared to doing it in the past.
Right. If someone wanted, let's say, a couple of years ago, to go and experiment with incorporating ML models to build a recommender for their system, it couldn't be just an experiment. They would have to make sure it was going to work, because it would be a big investment for them. You have to find the people, you have to find the data, you have to iterate on it. It takes months, maybe years. And many times what was happening was that we were pretending these things were succeeding, because we had individually invested too much into them. It probably didn't hurt the company, but it also doesn't mean it added the value we were expecting it to add, right?
Right.
But I think it's really fun to look back. You're probably familiar with the TV show The Jetsons, the old animated show. It's fun to look back at what people thought the future was going to be like. Two things jump out at me from that show, which they originally made decades ago: flying cars are part of that show, and robots. It's hard to think of the things people 30 or 50 years ago thought we would have by now, right? But it's helpful to bring that up, because there are going to be some AI applications we're working on today that will be the equivalent of the flying car. We just haven't gotten there, the physics don't work, we don't know how to solve that problem. Or like robotics, which so far has been slower than a lot of people thought; we don't all have robots in our houses, other than maybe vacuums, right? So what does that look like? I think that's really interesting, because we're in this experimentation phase, to think about which categories that we're throwing AI at right now are going to hit walls and end up being the future's flying cars, for example.
Yeah.
Yeah.
First of all, you mentioned robotics. I think robotics is a big thing, for many different reasons, not only because of AI. Traditionally, robotics has been a space where building goes extremely slowly, but it is a space that has been accelerating a lot now, with new models of building, like open source robotics, things like that. And I think there's definitely going to be a very interesting intersection between robotics itself and AI and what together they can do. But first of all, I think people need to take a step back and think a little bit about what has happened in the past three years with AI.
We are still trying to figure out what the right way is for us as humans to interact with this thing, right? Copilot came out, and for a while the copilot was the thing, right? Let's build a copilot for everything. Let's build a copilot for writing code. Let's build a copilot for Word and Excel. Let's build a copilot for, I don't know, whatever.
And I think what people started realizing is that the copilot thing, which pretty much means we have this model and then we have the human in the loop to make sure the model always stays on track, is not very efficient, right? Because what happens is you have tasks, and sure, some of them might be accelerated because you are using the copilot, but then you have a human who, instead of doing other things, is babysitting a model to do something, right? So, of course, you are not going to see crazy ROI there. What's the point? Instead of typing into Excel, now you have someone typing free text to a model, trying to convince the model to do the right thing, right? So that part of the automation, I think, it became obvious that it didn't work that well.
There are some cases where it works, but it's not as global or universal as we thought it would be. Then we started seeing new paradigms of how these things can be done. At the end of the day, if you try to abstract what is happening, it's about whether we can treat these models the way we treated software before, which is: okay, when this thing has a task, I want it to just go and do it and come back, and to make sure that when it comes back, it did the right thing. But the problem with models is that, by their nature, they are not deterministic. Things might go the wrong way. So we need to figure out new ways to both interact with them and build systems out of them. It's an engineering problem at the end of the day. The science has been done, the thing is out there; okay, now how do we make it reliable?
Yeah, yeah. I think one thing that John and I have talked about, and actually one interesting thing when we start to get into typedef here in a little bit, Kostas, is that I think it's really applicable to the API that you've built. But like you said, unfortunately, in my opinion, having a copilot chatbot as the thing that everyone deployed in every possible way for every use case was really a bad start. Because, maybe a better way to say it, I think some of the best manifestations of this as far as user experience go are the ones where you won't really notice the AI. It's not at the forefront, right? It's just sort of disappearing behind a user experience that feels magically fluid or high context. I mean, it's going to hide an immense amount of complexity and make hard things seem really simple.
Yeah, and as an example of that, Netflix is one of my favorite examples. The brilliance of the recommendation engine stuff they did is completely invisible to the user. It's just, oh, I might like to watch that, you know? Those are the experiences I think will be fascinating to see come into lots of different products with AI. And I haven't seen as much of that yet.
Well, I think you can see them in some cases. I'm sorry for interrupting, Eric, but there are some cases, like in development, for example, right? And again, you have something like Claude Code, which, okay, is an experience on its own with its own limitations, right? It doesn't mean that you just throw this thing out there and it's going to build a whole Linux kernel on its own. But stuff like using models to do a preliminary review of a new PR, or actually using a model as you do a PR review, these things are accelerating processes a lot. Okay, now they are not replacing the engineer, right? And I don't see why that would be a bad thing, but it does make the engineer much more productive at the end of the day. Same thing with sales, for example. Okay, you want to go and personalize messages that you are going to send to 200 people. In the past, if you wanted to do that, it would take, I don't know, probably two hours. Now it will probably take half an hour. It doesn't replace the SDR, or some people might claim it does, but either way it does make the people more productive. And I think that is the reason there was a conversation for a while observing that, when it comes to the impact on jobs, the first layer of professionals that's been affected is middle management.
And the reason for that is because, in the past, for every five SDRs you probably needed one sales manager. Now you need one sales manager for 100 of them, or 50 of them, right? Because a lot of the stuff that you had to do to make sure these people were doing the right thing can now happen much more efficiently. The same thing with customer support, which is one of the most common use cases where AI is heavily used. One of the things managers had to do was go through the recordings of the agents and make sure they were doing the right thing. That's super time consuming, right? You literally have someone who works for eight hours as an agent and talks, in total, let's say three hours. Someone had to go through three hours of transcript and figure out if they were doing the right thing. Now they can do that for many more people in less time, because they have the tooling to do it, right? So I think there is impact happening out there. It's just that, compared to the way the dream of AI is being sold, what is happening is not as sexy as the dream.
We're going to take a quick break from the episode to talk about our sponsor, RudderStack.
Now, I could say a bunch of nice things
as if I found a fancy new tool,
But John has been implementing RudderStack for over half a decade.
John, you work with customer event data every day and you know how hard it can be to make sure that data is clean and then to stream it everywhere it needs to go.
Yeah, Eric, as you know, customer data can get messy.
And if you've ever seen a tag manager, you know how messy it can get.
So RudderStack has really been one of my team's secret weapons.
We can collect and standardize data from anywhere, web, mobile, even server side, and then send it to our downstream tool.
Now, rumor has it that you have implemented the longest-running production instance of RudderStack, at six years and going.
Yes, I can confirm that.
And one of the reasons we picked Rudder Stack was that it does not store the data and we can live-stream data to our downstream tools.
One of the things about the implementation that has been so common over all the years, and with so many RudderStack customers, is that it wasn't a wholesale replacement of your stack; it fit right into your existing tool set.
Yeah, and even with technical tools, Eric, things like Kafka or PubSub, you don't have to have all that complicated customer data infrastructure.
Well, if you need to stream clean customer data
to your entire stack,
including your data infrastructure tools,
head over to rudderstack.com to learn more.
Let's start to talk about data infrastructure
because I really want to talk about typedef,
mainly because I got a demo of it.
I got a demo of it right before we started
recording, so I'm all excited about it.
But let's talk about data infrastructure, because I totally agree, Kostas, that a lot of the significant impact that's happening isn't super sexy. Where are you seeing it? I mean, obviously you're building some of this with typedef. And John, I would ask you this question too, because you have a really good handle on the landscape and use new tools all the time. It's interesting, because having a nondeterministic tool for data infrastructure is really different from "summarize a transcript and give me the gist of it," right? The threshold for making nondeterministic changes to a production system, or to data that is business critical, is clearly different. But what does the landscape look like for using LLMs in data infrastructure?
Well, I have a really small anecdote here
that I'll share Eric, and I think it was interesting.
So I occasionally do mentoring stuff, and I had a mentoring call earlier today,
and somebody's using an LLM to generate SQL to look at web analytics.
I'm sure that happens all over the place, especially with startups, and it was a startup.
So I get on this call, and it was so funny. Even a few months ago, I probably would have walked them through it, because they really didn't know SQL, right? I would have walked them through and taught them a little bit about SQL. But what I actually did was teach them a little bit about prompting. It was the simplest solve. They were getting a little lost in this query, and essentially it was a really short fix: hey, break this down into CTEs. Let me show you how to prompt it to make it use CTEs instead of subqueries. So we did that and then said, all right, run each CTE. And if there's an error in a CTE, take that one part out, drop it in a new window, tell it to fix that piece, move it back over, and then work through it. We did it together in maybe 15 minutes. She's like, oh, this is amazing. This is great. And it was just something that even six months ago, that's not how I would have walked through a problem with somebody.
Yeah. Think of the implications of it. But yeah.
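To make that workflow concrete, here is a minimal sketch of the "run each CTE on its own to isolate the error" loop John describes. The query, the table names, and the run_query helper are hypothetical stand-ins for whatever warehouse client you use; this is an illustration, not the actual session from the call.

```python
# Hypothetical sketch of the debugging loop described above: take an
# LLM-generated query structured as CTEs, run each CTE in isolation, and
# surface the first one that errors so it can be fixed (or re-prompted)
# on its own before running the full query.

ctes = {
    "sessions": """
        SELECT session_id, user_id, MIN(event_time) AS session_start
        FROM events
        GROUP BY session_id, user_id
    """,
    "page_views": """
        SELECT session_id, COUNT(*) AS views
        FROM events
        WHERE event_name = 'page_view'
        GROUP BY session_id
    """,
}

final_select = """
    SELECT s.user_id, SUM(p.views) AS total_views
    FROM sessions s
    JOIN page_views p ON p.session_id = s.session_id
    GROUP BY s.user_id
"""

def run_query(sql: str):
    """Stand-in for your warehouse client (Snowflake, BigQuery, DuckDB, ...)."""
    raise NotImplementedError("wire up your own client here")

# Check each CTE on its own before assembling the full query.
for name, body in ctes.items():
    try:
        run_query(f"SELECT * FROM ({body}) AS {name} LIMIT 10")
    except Exception as err:
        print(f"CTE '{name}' failed: {err}")  # fix or re-prompt just this piece
        break
else:
    with_clause = ",\n".join(f"{name} AS ({body})" for name, body in ctes.items())
    run_query(f"WITH {with_clause}\n{final_select}")
```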
Yeah, it's interesting. I think working with data is also an interesting topic when it comes to LLMs, for a couple of different reasons. First of all, SQL was created because it was supposed to be used by business people, not technical people, right?
So it kind of resembles natural language when you write it, right? It was a way for the industry to try to create a DSL that could be used by non-technical people at the end of the day. That was the goal. Now, obviously, things get more and more complicated as we try to do more and more things, right?
And, of course, when you start going to these people who are supposed to be business people, or business analysts, or even managers, and you suddenly explain things to them in terms of CTEs or projections or joins, they're like, what are you talking about? But it turns out that it's a good language for LLMs to generate and for people to debug, because you usually end up writing your logic in a model that is data-flow driven instead of decision-driven, branch-driven. You will get something back for your question that, okay, you can spend some time understanding what it is doing. You don't have to go through thousands of lines of code to figure out what's going on.
Now, having said that, at the same time, as with everything else with AI, people jumped directly to the dream. Okay, let's do text-to-SQL, right? Let's have Eric go there and say, hey, how did my product team perform in the last quarter, and expect something to come back that makes sense, right? We're not there yet. I don't know if we are going to get there. I think what will happen, to your point, and John, what you described is great, is that you have this generalist, which is the model that can do everything
good enough. But if you want the output to be really good, you really have to constrain how it's going to operate, right? And you have to constrain it based on the problems that you try to solve, and each problem is different, so you need a different context. It's not something generic that you can just put there and it'll solve every problem. That's where engineering comes in, right? I think we are at the point where, okay, we need to engineer solutions. We need to sit down and, for the problems that we are trying to solve, find the ways that these models can operate within good enough margins of error, put them into production, and keep improving, as we did in the past, right? That's what engineering has always been doing. No difference.
I think one thing that I'd be
interested in both of your opinions on is, I agree that we need to engineer solutions.
I think part of that is in the modeling layer, right? So one of the challenges, if you think about
an LLM writing SQL, is that the underlying data sets are wildly different even for the same
basic use case, right? And so, what if there was a way to normalize on a basic data model? You mentioned web analytics, right? Well, that's actually a fully known domain. There are standards you can use for that: you have page views, you have sessions, you have whatever, right? Those are all almost ubiquitously defined terms. And so, in fact, if you were able to have a consistent underlying data model, then you would be setting the LLM up for success, because it's not having to try to interpret wildly different underlying data models to produce the same result. And I think about the same thing with frameworks, right? I mean, if you think about v0 from Vercel, it's generating Next.js apps, right? That framework is super well-defined. There are a zillion examples, right? And so, within a certain set of parameters, it can do some pretty awesome stuff with those guardrails there. So do you think we will also see a big part of this become a normalization, or sort of standardization, of underlying data in order to create an environment in which the LLM is set up better for success?
No. The reason I'm saying that is because, when it comes to data and schemas and all that stuff, this has been tried a lot in the past. And it always failed, because the problem with these things is that it's extremely hard, first of all, to agree about the semantics, about what things mean. There's actually a very rich literature out there, and scientific research, on how to model specific domains. Especially in archiving, for example. If you go there, you will see that, depending on the type of medium you want to digitize, there are very well-defined schemas and, most importantly, semantics around how you digitize a book, right? What are the parts you break it down into? What is the metadata you need for these things? There is a lot of work that has been done. But the problem with that stuff is that it's extremely hard to get humans to agree upon these things. And for a good reason; it's not because we are a problematic species. It's just that all these things are very context sensitive, and the way that I will do this thing in my company might be very different compared to how Eric does things in his company. And if we want to agree on something, it has to be good enough for both of us without causing problems to either of us because of whatever exists in there to satisfy another stakeholder, right? So it's really hard.
And there's another thing there, which is continuity, right? We are not just resetting. Take an enterprise, go to Bank of America. How long has Bank of America been operating? They've been around since IBM started building the first mainframes or whatever, right? It's not like you can go in there and just remove everything and put something new in there. You need continuity, right?
So I think what can happen is a couple of different things. Either you decide how models should come to a consensus about how to do things, and you let the models figure this out, and you don't care in the end about the data model; or you have another layer of abstraction, which is what semantic layers are.
Right, the whole concept of the semantic layer is that, okay, I have my data in my data lake or data warehouse, I model it in any way I want, but I also centralize the semantics around the meaning of this data. So when I'm going to talk about revenue, it doesn't matter if I'm Kostas from sales or Eric from marketing; we are going to use the same definition of what revenue is, right? Or we will have multiple different ones, but we will know which one each of us is using. So the solution to these things is usually to add abstractions. That's how we've been doing it so far, and I think that's what is going to happen now.
The main difference is that, so far, we've been building the abstractions considering one type of entity interacting with them, which is the human. We also have to take into account that we have another entity, the model, and the model needs a different experience than a human to interact with these systems. So we don't have only user experience. Now we also need, I don't know, model experience, whatever. But this is the thing.
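As a concrete illustration of that idea, here is a tiny sketch of a semantic layer: the metric is defined once, and both "Kostas from sales" and "Eric from marketing" resolve revenue through that shared definition instead of each writing their own SQL. The metric names, tables, and the compile_metric helper are made up for the example; this is not any particular semantic layer's API.

```python
# Illustrative sketch of a semantic layer: metric semantics are defined once,
# and every consumer (human or model) resolves the same definition instead of
# re-deriving it. Names and SQL are invented for the example.

METRICS = {
    "revenue": {
        "sql": "SUM(order_total)",
        "table": "fct_orders",
        "filters": ["status = 'completed'"],
        "description": "Completed order value, excluding refunds.",
    },
    "active_users": {
        "sql": "COUNT(DISTINCT user_id)",
        "table": "fct_events",
        "filters": [],
        "description": "Distinct users with at least one event.",
    },
}

def compile_metric(name: str, group_by: str = "") -> str:
    """Turn a shared metric definition into SQL; every team calls this same helper."""
    m = METRICS[name]
    where = f"WHERE {' AND '.join(m['filters'])}" if m["filters"] else ""
    select = f"{group_by}, {m['sql']} AS {name}" if group_by else f"{m['sql']} AS {name}"
    group = f"GROUP BY {group_by}" if group_by else ""
    return " ".join(part for part in [f"SELECT {select}", f"FROM {m['table']}", where, group] if part)

# Sales and marketing ask different questions but share one definition of revenue.
print(compile_metric("revenue", group_by="region"))
print(compile_metric("revenue", group_by="campaign_id"))
```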
All right, well, we have to use our remaining time to talk about typedef. So I know you gave us a brief overview at the very beginning, but give us typedef in, like, two minutes. We have more than two minutes to talk about it.
Yeah.
So when we started typedef, our goal was to build the tooling that we need today to work with data. When I say it like that it sounds very generic, but we started from a very OLAP perspective, right? What do we do with the data that we have in our data lake or our data warehouse? So we're not talking about transactional use cases here, how you build your application with your transactional database. It's more about, okay, we have collected everything. What do we do with that now? How do we build new applications on top of this data?
Yep.
But Spark has started, like, showing its age because, again, as I said, like, at some,
at the beginning, like these things were like built primarily with like the BI, like the
business intelligence like use case in mind.
So when you try like to build them, builds, I don't like a recommender or like other types
of like applications on top of your data, more customer facing things, it becomes hard to do it.
The way that we've been solving it so far is by using talent, right?
Like very specialized people who can make sure that this thing was going like to be.
working properly, regardless of, like, what we throw on it.
That's really hard, like, to scale outside of, like, the big area in a way, right?
It's extremely hard to go and ask, like, every engineer out there to become an expert
on, like, building and operating, like, distributed systems, especially, like, with data.
So we're like, okay, how can we solve that? How can we turn building applications with data into a similar experience to what front-end engineers and back-end engineers have with application development, right? Remember what happened when MongoDB and Node.js became a thing, and suddenly we had this explosion of millions of engineers building things, right? But do it for data. That's how we started.
That's how we started.
To do that, we had like to build
pretty much like from scratch in your query engine.
We want to like to use familiar interfaces.
So people can, but they have some experience with working with data, they can already, like, use it.
So we build on top of, like, the PISPARC API.
We used, like, the data frame API as a paradigm because it's a good way to mix together imperative programming with declarative programming.
So kind of have the best of both work, like from what you have with SQL, but also with, like, a language like Python.
And then we had that.
We also wanted to make it serverless. But then, as we said, AI happened. So now we have a new type of compute, and the workloads completely changed. CPU is not the bottleneck anymore; the bottleneck is all about reaching out to LLMs and hoping that we get something back. And when we do get something back, do we know if it is correct? It's not a deterministic answer, right? So how do we engineer and put things into production when we have these
new workloads? So our next step was, okay, we are going to make inference, LLM inference, a first-class citizen. And our objective was, okay, how can we do that without having to introduce completely new concepts to engineers? So we introduced new operators in the dataframe API: as you had a join before, now you have a semantic join; as you had a filter before, now you have a semantic filter. It extends the operations that you already know how to do on data, but using natural language and also using unstructured data, where something has to be inferred rather than being explicit in your data set. And then we remove all the hard work of having to interact with inference engines, figuring out back pressure, what to do with failures, all the things that are extremely painful, because
these new technologies are still young and many things haven't been figured out yet in terms of infrastructure. All of that ends up making working with them unreliable enough that it's hard to put into production. So our goal is, okay, you use typedef to build AI applications, both, let's say, static applications that have a static execution graph, and also agentic ones, where you can let a model decide what to do based on the tools it has access to.
And do it on data. So it's not a generic environment where you can go and build, let's say, any type of agentic workload. If you want to go scrape the web and come back with insights, typedef and Fenic are not the way to do it. But if you want to implement that on top of your data warehouse data, then it's a great tool to use. It also makes it really fast to experiment, because it's very familiar to work with, and when you're ready to get into production, it removes all the boilerplate that someone would have to build and manage around the underlying infrastructure, making things much more efficient and more reliable to put into production, which is quite a big problem right now, and why many AI projects are failing.
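To ground the idea of semantic operators, here is a rough sketch of a natural-language predicate sitting next to ordinary dataframe operations. This is not typedef's or Fenic's actual API; it is plain PySpark with a stubbed-out model call, and every name in it is illustrative.

```python
# Rough sketch of the "semantic operator" idea on a familiar dataframe API.
# NOT typedef's or Fenic's actual API: plain PySpark plus a stand-in model
# call, showing how a natural-language predicate can sit next to a filter.

from pyspark.sql import SparkSession
from pyspark.sql.functions import udf
from pyspark.sql.types import BooleanType

def call_llm(prompt: str) -> str:
    """Stand-in for an inference client; replace with your model of choice."""
    return "yes" if "refund" in prompt.lower() else "no"  # toy heuristic for the sketch

def semantic_predicate(instruction: str):
    """Wrap a natural-language yes/no question as a boolean column function."""
    @udf(returnType=BooleanType())
    def _pred(text: str) -> bool:
        answer = call_llm(f"{instruction}\n\nText: {text}\n\nAnswer yes or no.")
        return answer.strip().lower().startswith("yes")
    return _pred

spark = SparkSession.builder.getOrCreate()
tickets = spark.createDataFrame(
    [(1, "Please refund my last invoice"), (2, "How do I rotate my API key?")],
    ["ticket_id", "body"],
)

# A regular filter and a "semantic" filter, side by side.
open_tickets = tickets.filter(tickets.ticket_id > 0)
billing = open_tickets.filter(semantic_predicate("Is this ticket about billing?")("body"))
billing.show()
```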
So I have a lot to digest.
Yeah, it's a lot, I know. But it's hard to talk about these things without using a lot of words.
Yeah, you left us speechless.
No, we were with you.
Yeah, can you go back to the semantic layer?
Yes.
I want to talk a little bit about the semantic layer, because this has been a really fascinating one for me, and because I like your point a lot around what we talked about earlier. Historically, you've got BI tools. Now we've maybe got agents as first-class citizens, or people equipped with AI tools.
It's kind of another class of people.
But back to the semantic layer. There's a startup whose journey I've followed, and I've talked to their founders a lot. And it's been interesting to follow them, where they were really hardcore on the semantic layer, like, it's not going to work at all without a semantic layer. And then, back to that comment on talent, it's like, well, how many companies are at a point where they have a mature enough warehouse, and they have all this organized in a modeling tool like dbt, and they have a mature semantic layer? Even that number is not super high. And so it's just interesting, because even they, I think, have gone back and thought, well, what if we did kind of go back to text-to-SQL and think about basically dynamically generated semantic layers, so there's not as much engineering involved? So I wonder how many of those reinventions will happen just pragmatically, right? Where it's like, okay, this is how it should work, this is how it works best, but we're going to have to go back and reinvent practically, because our TAM, our total addressable market, is not big enough.
Yeah. I mean, I think a lot of that stuff goes back to what we were saying about continuity, right? If you have a company that
has been operating a BI stack for a couple of years now, right? They probably have a code base of SQL that already exists there. And migrating that to a semantic layer, which, by the way, also needs to be populated and maintained, right? Yes, you do add an abstraction there that can probably make things better, assuming it has been curated, and, most importantly, that someone has actually created it, right? That's one of the reasons the traditional semantic layer is not something new. It has been around for a very long time. But it was primarily an enterprise thing, and it was an enterprise thing because the enterprise had the resources to go and build and maintain these things, right? Now, can an LLM help with that? Maybe, I don't know.
That's something for the semantic layer people to figure out. But at the end of the day, you come to a team that already spends probably 40, 50% of their time answering requests like, hey, I'm trying to calculate this thing. Do we already do it? And if yes, where? And can we update it to also add this new thing, because we have a new source that we want to track that's related to it? And you tell them, well, you can solve this if you go through a six-month project to build a semantic layer and also educate the whole company that they have to use whatever we put in there. It's super hard. Even if on paper it works, you have to both change the organization's behavior and invest in technology and resources that you don't already have. So it's a hard sell, right?
You need to, and I think this is more of a product opinion, fix the problems that already exist, what people carry from the past, and make the transition easy. If the transition to this new world you are promising is not easy, people won't do it; it's too much. And that's part of why we built typedef the way we did. If you have to educate people a lot, you put a lot of risk into what you are building. People don't have time, and you don't have the money to do it either. So it has to be something that is very familiar for people to use and that makes things easy.
So all the decisions we made were about familiar APIs for both humans and machines, right? PySpark has been out there for a long time. These models have been trained on it, so the API is kind of known. You can go and ask an LLM to build something on our framework, and it will probably succeed after one or two iterations, just because of this familiarity with the syntax. So we need to reduce the amount of effort that people have to put in to migrate into these new worlds.
Because at the end of the day, we kind of solve the same problems, in a better way. But if we want to make this reality happen fast, we have to help people migrate fast too. We can't just promise a dream that will take them six months of implementation before they can even taste it. And that's what we are trying to do with typedef: remove, as much as possible, everything that makes it incompatible with what people already know. The same way that you would build a pipeline in the past to process your data, you should be able to do the same thing using LLMs without having to learn new concepts. If we manage to do that with typedef, from a product perspective I'll call it a success. Whether it's going to be a commercial success is a different conversation. But that's kind of the goal, right? Do the things you were doing in the past, but in a much, much better way, because now, transparently, you can use the LLMs to do some of the stuff that was extremely hard to do before. But without compromising on how you put things into production, how you operate things, and how fast you can iterate on the problem you are trying to solve.
I love it.
Yeah, when you were giving me a demo earlier today, I think it was actually pretty surprising, because when we talked about what it would take to productionize this for the use case we were discussing, it didn't really feel that unfamiliar.
Yeah.
I mean, this feels very natural, right? Here are all the tables, you have a pipeline set up. So yeah, that's super interesting. I didn't even really think about that. My main thought was, oh, that sounds way easier than I thought it was going to sound. So hopefully that means commercial success.
Sure. Yeah, yeah. It's on the way.
A hundred percent. And I think a positive side effect of using familiar paradigms is that when things go wrong, and of course things will go wrong, it will be easier for people to reason about them, figure out the issues, and fix things. Again, I'll keep, I don't know, being boring, but it is engineering at the end of the day. We've spent so much time building these best practices, these ways of operating unreliable systems in a reliable way. We just need to use the same principles. And as you said, put AI in there, but the AI should feel almost magical. It shouldn't feel like, oh, now everything that I was doing is breaking because I'm trying to use this damn new thing and I don't know why it breaks.
And I think that goes back to what you were talking about with the use case.
Awesome.
Well, we are at the buzzer, as we like to say.
Brooks is telling us we're out of time.
So, Kostas, I would love to have you come back on for a round two.
And I want to do two things.
Let's talk about some use cases that you're implementing for your customers.
And then the other thing that we didn't talk about that I would love to cover, and this comes from me talking with some of our larger customers, is their restrictions on even using LLMs, especially as it relates to certain types of data. That's a huge challenge, right? And, I mean, in a startup, like you were saying, John, okay, this person is probably throwing SQL straight into GPT and, you know, sharing data, whatever, right? And it's like, okay, well, you cannot do that at a large company, right? There are a lot of legitimate security concerns and other things like that.
So I'd love to cover that too, Kostas,
because the types of workloads that you're running
that's clearly a concern.
Yeah, yeah, 100%.
I think a lot of that stuff is being addressed, and I think it's getting easier to find solutions, either by using, let's say, an open source model that only you run, or by using the big providers in very secure ways. With the big ones, like OpenAI and all these people, this is kind of a solved problem at this point, I would say.
And I would say that, like, most people probably end up using open source models,
not that much because of security, but more because of performance.
Interesting.
But that's, we can talk about that, yeah.
Okay, it's an interesting topic.
Love it.
Thank you so much, guys.
I loved it.
And I'm looking forward to coming back again.
Yeah, we'll do it soon.
The Data Stack Show is brought to you by RudderStack.
Learn more at rudderstack.com.