The Data Stack Show - The PRQL: Kaskada Serving as a Recommendation Engine with Davor Bonaci of DataStax
Episode Date: May 29, 2023
In this bonus episode, Eric and Kostas preview their upcoming conversation with Davor Bonaci of DataStax. ...
Transcript
Welcome to the Data Stack Show prequel, where we replay a snippet from the show we just
recorded.
Kostas, are you ready to give people a sneak peek?
I am, of course.
Let's do it.
Let's do it.
A fascinating conversation with Davor of Kaskada, which was acquired by DataStax. He told this
really interesting story about what they envision in terms of Kaskada being integrated into DataStax,
which, you know, sort of operates a lot of stuff on top of Cassandra. So lots of cool
stuff there, I think, for the future. But Kaskada is also open source, and it does a lot of interesting
things in terms of making it easier to not only discover interesting potential features
and data sets, but also deliver those and serve those, which is really interesting.
One of the things that I thought was fascinating
about this conversation was the decision to essentially create a new language
as part of the system. Because the system in and of itself is capable of doing some really
interesting, cool things. But they chose to sort of write a language that, and this is, you know, probably a really,
a really bad way to describe it, but it's almost a mix between SQL and Python, right? It's
declarative, but it's in the flavor of Python, which I thought was fascinating. And so it is,
it really does seem like they're kind of meeting in the middle of these two worlds of sort of the operational side and more of the statistical side.
So that, I don't know, that was a fascinating approach.
I'm certainly going to be thinking about that one.
What stuck out to you?
Yeah, 100%.
I think like the most, there are like two things I keep like from this conversation.
One has to do with building the technology itself
and how hard of a problem it is
and why it's not something that can be,
let's say, solved with just stitching together technologies.
But you really need to start thinking in first principles
and build a new system in a way, right?
That's one thing,
but that's, let's say,
the bread and butter of innovation
and technology, right?
What I found extremely interesting
is how important the user experience also is.
And that's what's the connection with what you're saying about the language.
The reason they ended up building a new language is because they were trying to figure out
what's the right way for our users, in this case ML engineers, to interact and work with
the data and somehow guardrail them into figuring out what's
the signal out of all this noise out there, right?
And exactly what you said, like, they had to find the good things from all the different
paradigms out there and put them together in a way that feels native to their user,
which is the ML engineer, right?
And the ML engineer, yeah, lives in Python land.
They use Python.
Like, you cannot change that.
All the libraries are in Python.
No matter, like, how they work with the data,
when they will have to do some processing with the data,
Python will be needed.
So it is important to build the right experiences there.
And we see that the need for these experiences
also drives innovation,
like building a new language
on top of the processing system that we have.
And that's something that I think we will see more and more of in the
data infrastructure space as we try to, like, democratize access to all these
technologies, which is probably something that will get even further accelerated
because of all the recent developments with AI and all that stuff.
So yeah, like that's what I keep.
And I'm looking forward to chat again
and see what comes out
from putting Cassandra together with Kaskada.
Absolutely.
Well, another good one in the books.
Thanks for listening to The Data Stack Show as always.
And we will catch you on the next one.