The Data Stack Show - The PRQL: Kaskada Serving as a Recommendation Engine with Davor Bonaci of DataStax

Episode Date: May 29, 2023

In this bonus episode, Eric and Kostas preview their upcoming conversation with Davor Bonaci of DataStax. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show prequel, where we replay a snippet from the show we just recorded. Kostas, are you ready to give people a sneak peek? I am, of course. Let's do it. Let's do it. A fascinating conversation with Davor of Kaskada, which was acquired by DataStax. There's a really interesting story about what they envision in terms of Kaskada being integrated into DataStax,
Starting point is 00:00:34 which, you know, operates a lot of stuff on top of Cassandra. So lots of cool stuff there, I think, for the future. But Kaskada is also open source, and it does a lot of interesting things to make it easier not only to discover potentially interesting features in data sets, but also to deliver and serve those features, which is really interesting. One of the things I found fascinating about this conversation was the decision to essentially create a new language as part of the system. The system in and of itself is capable of doing some really interesting things, but they chose to write a language that, and this is probably a really
Starting point is 00:01:26 bad way to describe it, is almost a mix between SQL and Python, right? It's declarative, but it has the flavor of Python, which I thought was fascinating. It really does seem like they're meeting in the middle of these two worlds: the operational side and the more statistical side. That was a fascinating approach, and I'm certainly going to be thinking about it. What stuck out to you? Yeah, 100%. There are two things I take away from this conversation.
Starting point is 00:02:02 One has to do with building the technology itself: how hard a problem it is, and why it's not something that can be solved just by stitching together existing technologies. You really need to start thinking from first principles and, in a way, build a new system, right? That's one thing, but that's, let's say,
Starting point is 00:02:32 the bread and butter of innovation in technology, right? What I found extremely interesting is how important the user experience is as well. And that's the connection to what you were saying about the language. The reason they ended up building a new language is that they were trying to figure out the right way for their users, in this case ML engineers, to interact and work with the data, and to somehow guardrail them into figuring out what's
Starting point is 00:03:06 the signal out of all this noise, right? And exactly as you said, they had to find the good parts of all the different paradigms out there and put them together in a way that feels native to their user, which is the ML engineer, right? And the ML engineer lives in Python land. They use Python. You cannot change that. All the libraries are in Python.
Starting point is 00:03:40 No matter how they work with the data, when they have to do some processing, Python will be needed. So it is important to build the right experiences there. And we see that the need for these experiences also drives innovation, like building a new language on top of the processing system they have.
Starting point is 00:04:02 And that's something I think we will see more and more of in the data infrastructure space as we try to democratize access to all these technologies, and it will probably be accelerated even further by all the recent developments in AI. So yeah, that's what I take away. And I'm looking forward to chatting again and seeing what comes out of putting Cassandra together with Kaskada.
Starting point is 00:04:39 Absolutely. Well, another good one in the books. Thanks for listening to The Data Stack Show as always. And we will catch you on the next one.
