The Data Stack Show - Shop Talk: Kostas Settles the Real-Time vs. Streaming Debate

Episode Date: November 18, 2022

In this bonus episode, Eric and Kostas talk shop around the topic of streaming. ...

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show Shop Talk. Kostas, we have talked with people who built amazing data technology at companies like Netflix, Uber, and LinkedIn. But you and I don't record our own talks about data very much, even though we talk about data together a ton. And so Brooks had this amazing idea of just recording some of the conversations that you and I have before and after the show about data and our opinions on it. And really, this has been one of my favorite things that we do. So welcome to Shop Talk. It is where Kostas and I share opinions and thoughts on a personal level about what we're seeing in
Starting point is 00:00:45 the data space. And it really is simple. We ask one another a question and the other one tries to answer it. So without further ado, here is Shop Talk. Welcome to the Data Stack Show Shop Talk, where Kostas and I talk shop about all things data and probably share too much of our personal opinions. Kostas, I believe that it's my turn this time. It is, yes. You can torture me as much as you want. Any question you might have, I'm all yours. Okay, I saw an interesting company launch.
Starting point is 00:01:21 This was on Hacker News recently, and they call themselves a real-time API data connector. They say sub-minute: you basically never have more than sub-minute drift between data sources. I looked at it, and this is just at a very, very high level. I haven't actually read the docs or anything like that. But just based on the architectural diagram
Starting point is 00:01:56 on their marketing site, which we all know is the definitive source of truth for every product's architecture. Yeah. It kind of seems like, you know, almost real-time APIs are almost like a streaming ETL, if you will, right? Like what you would
Starting point is 00:02:12 traditionally load in batch, you're now loading, you know, essentially in real-time with sub-minute latency to some sort of data store downstream. Anyways, this got me thinking, and I'm really interested to know your thoughts on this.
Starting point is 00:02:24 So there are multiple sort of streaming technologies in this sort of vein, right? You know, you have Materialize, right, which is sort of streaming SQL. You have a number of other technologies. And one thing I'm interested to know is, do you think that these will get super wide market adoption or is, say, sub-minute latency really only a problem for a certain subset of companies? Or actually, I have a third flavor. Okay. Do you think that it will become so cost-effective that it just doesn't matter? Right?
Starting point is 00:03:10 Like, why not stream in real time if you can, because it's cheap? Right now, part of the challenge is that the infrastructure to do that at scale, you know, tends to be pretty hefty. I mean, I don't know. I would disagree a little bit with that. I don't think that it's that expensive to start working with these systems, right? They scale up and down.
Starting point is 00:03:37 Yeah. If you have a lot of data, obviously it's going to be more expensive, but if there's more data, it means that there is a reason you have more data. Hopefully it also means that your business is generating more data. It's about your growth, right? Yeah, I guess maybe another way to say it would be, cost has
Starting point is 00:03:57 multiple vectors, right? Like if you just set up a set-it-and-forget-it 24-hour job that you never look at unless it fails, versus managing Kinesis, those are very different. Yeah, I mean, okay. I don't know exactly how it works from a quick look at the website. Oh yeah, Moving Lake is the tool. Yeah, I didn't even mention the tool.
Starting point is 00:04:23 Yeah. Okay. First of all, we are talking about a very early product, which means they are probably still trying to figure out product-market fit, right? Yeah. So we need to keep that in mind. They're going to be batch ETL, traditional batch ETL, in like three weeks when they pivot. I mean, yeah, I don't know. For example, if you go to the connectors and select
Starting point is 00:04:54 the Bank of America connector that they have, you will see that of the entities that they support, the three that they mention here, one of them is real-time and the other two are not, right? And that's because obviously Bank of America does not offer a live feed over accounts and sub-accounts, but it does over transactions. So that's what you can get in real time, right? I mean, there are, let's say, inherent limitations in what the systems are willing to expose in real time and what not, right?
Starting point is 00:05:24 Yep. And the way that I see it, from what I see here, yeah, obviously you can push data from your Bank of America API or account to a database, right? But I don't think you're going to be generating that much data. I mean, how many transactions a day can you have on a bank account, right? Usually they're thousands, maybe millions. So what I'm trying to say is that with real time, when we are approaching these questions, we need to always start from
Starting point is 00:06:07 the use case: what are we trying to achieve with these systems, and how are we going to be using the data from these systems? Yep. Right. Like, if I need the transactions to consolidate, let's say, all the transactions for my P&L at the end of the month or whatever, do I need to do that in real time? Probably not.
Starting point is 00:06:27 Do I need the transactions to create a notification so that, I don't know, a salesperson can do something as soon as possible? Yeah. Is this "as soon as possible" sub-millisecond? No, we are still working with humans. They are not going to react that fast, right? Do we use these transactions to do
Starting point is 00:06:49 HFT, high-frequency trading? Oh, yeah, but then again, we're talking about a completely different type of system, right? Yeah, yeah, that's it. Yep. So streaming is one thing,
Starting point is 00:07:05 real-time is another thing. Okay. Yeah. Streaming and... That's a great distinction. You have push versus pull and all that stuff. It provides different ergonomics around working with your data, right?
Starting point is 00:07:17 Real-time has to do with latency: how fast you have to react to any piece of information, right? Let's say you are the system that scans the sky for inbound nuclear warheads from the enemy. You probably would like to react pretty fast, right? And you want to guarantee that, right? So it's going to be fast every time. You don't want it to be fast one time and a little slower another time, you know? Right? So it heavily depends on what you are trying to do with the data that you have.
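The point that real-time is about a guaranteed latency, not just a fast average, can be illustrated with a minimal Python sketch. It is not from the episode: the function name `latency_report`, the 100 ms deadline, and the sample measurements are all hypothetical, chosen only to show that a pipeline with a better average can still break a real-time guarantee.

```python
from statistics import mean


def latency_report(latencies_ms, deadline_ms):
    """Summarize observed latencies against a hard deadline (both in milliseconds)."""
    worst = max(latencies_ms)
    return {
        "mean_ms": mean(latencies_ms),
        "worst_ms": worst,
        # Number of events that arrived after the deadline.
        "misses": sum(1 for value in latencies_ms if value > deadline_ms),
        # A real-time guarantee holds only if no event misses the deadline.
        "meets_deadline": worst <= deadline_ms,
    }


# Hypothetical measurements: one pipeline is steady, the other is usually
# faster but has a single slow event.
steady = latency_report([40, 42, 41, 43, 39], deadline_ms=100)
spiky = latency_report([5, 5, 5, 5, 150], deadline_ms=100)

print(steady["meets_deadline"], spiky["meets_deadline"])
```

Under these assumed numbers the spiky pipeline has the lower mean latency but still misses the deadline once, which is the "one time fast, another time a little slower" case described above.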
Starting point is 00:07:56 Right. And what are the notifications or the real-time dashboards that you are going to build, and who is consuming them? So my question to you as a marketer: one of the very standard go-to-market strategies when it comes to data was, oh, marketing needs real-time data. They need, I don't know, sub-minute latencies and stuff like that. Is it true? What do you think marketers need when it comes to data?
Starting point is 00:08:34 Well, I would start out by saying I think that the... I do think some of these technologies are really compelling because, you know, from a marketing perspective or even like a product perspective, you know, you could do the analytics. You've been able to do like the analytics thing, like real-time analytics,
Starting point is 00:09:02 say for quite some time, right? I mean, real-time web analytics or real-time product analytics. You know, you can sort of... Like, there are really great products out there that do that. But also, as it's becoming easier to get more data, you know, together
Starting point is 00:09:18 and sort of to basically compute interesting things with separate sets of data, some of the, like, infrastructure that actually allows you to compute some of this stuff, say in near real time, so that you get, instead of just observing a user behavior and then seeing that in a dashboard, sort of with direct lines like product analytics through whatever pipeline, you're actually doing some sort of compute along the way that includes additional data, which is really compelling, right? Because then you get a lot more insight downstream, even if it's in the, let's just say it's in the same dashboard that
Starting point is 00:09:55 you're looking at, right? But you have some sort of compute along the way. So that is very compelling because the amount of context and fidelity that you can get is way, way higher, potentially. Still pretty hard to do, actually, technologically, you know? I mean, it's not like the patterns are a mystery, but it's also a lot of pieces that you have to put together and run and, you know. So I would say I agree with you that it really depends on the use cases. Right. So let's take an example of a situation in the real world where real time, you know, or near real time or whatever.
Starting point is 00:10:40 Actually, we should probably discuss like the definition of real time because actually it's sort of at the root of the issue. Well, let's say you have some sort of app that, you know, like a ride sharing app or, you know, whatever, some sort of like transportation thing. Weather can be a really big influence on that, right? So if you think about like customer acquisition from a marketing standpoint, you know, or app activation, right? Like we want to increase usage or get people beyond their first ride or first interaction or whatever that is. Weather can be a big driver for that, right? So rain is coming, you know, go ahead and book your ride or schedule it or like whatever that is, right? You know, from that, so from that standpoint, you actually need to like pull a bunch of data in, run a bunch of computes, and in a pretty quick manner, send out a message to certain users in a certain location to try to get them to take that level of, you know, kind of, let's call it like creating like a personalized
Starting point is 00:11:46 experience based on a high level of context on those particular users, particular situation in a particular location. That also includes a lot of context around like their individual usage of your, you know, service or whatever. So those things, sure. I would argue though that the companies who will truly benefit from that level of detail and that level of infrastructure tend to be like really large companies with really large user bases. Right? Like that's not very common. Yeah.
Starting point is 00:12:29 Yeah, I agree with you. What I would like to add, especially because we started talking about this because of, what's the name, Moving Lake? Moving Lake. Yeah. And please, I'm not trying to say anything bad about them. Right. Let's just make this clear.
Starting point is 00:12:49 But sure. Well, all the Hacker News comments already did that for you. So there's probably nothing you can. Yeah. To be honest, I have huge respect for someone who is trying to build something like this today. Okay. It takes a lot of, how to say that, it's not exactly
Starting point is 00:13:06 an easy market to penetrate right now. It isn't. There are many solutions out there, right? So I have huge respect for people who are trying to do that. And you need to understand a little bit how you start a company, right? And how you start building a product. You have an idea in general, right? You know where you want to be, but at the same time,
Starting point is 00:13:37 you need to differentiate enough so you can have a starting point. Yep. Okay. So yes, you throw something out there, you try to create a new way of, let's say, solving a problem. And that starts, let's say, the conversation with the market. That's what you see here. It's a conversation starter. Like, hey, we are solving this problem. Is it important for you? Cool.
Starting point is 00:14:04 Come here. That's how we solve it. Maybe it's not the right way to do it. Maybe it is. We don't know. But sometimes you need to start. And that's what we see here in a company like Moving Lake. And again, huge respect for what they are doing, because this is the ugly part of building a company where everyone can easily have an opinion.
Starting point is 00:14:24 I can very easily say, this is going to fail. Yeah, obviously. It's easy to say at this point that it's going to fail. Right. But that's not the point here. What you're trying to do is start a dialogue with the market until you figure out what the real opportunity is, and where exactly the opportunity is in the market that you have
Starting point is 00:14:45 chosen, right? In this case, it's data management. So that's how we should see these things. And yeah, tomorrow, is it going to be real time? Is it going to be batch? Is it going to be something else? Maybe it's both, right? We'll see. I think it's going to be very, very interesting, now that I have a first impression of what Moving Lake is, to revisit that in six months. In another discussion, see where the product is six months from now, and try to understand what happened in between. Right? What generated the changes that hopefully we're going to observe.
Starting point is 00:15:26 We really should do that and we can like replay clips of this conversation. You know, and then they raise like a huge amount of money and are super successful and then...
Starting point is 00:15:35 Yeah, and hopefully, yeah, like guys, if you're going to do that, like let us know. Maybe, who knows, like we might find like angel investors. Yeah.
Starting point is 00:15:48 I'm not moving enough real-time transactions from my Bank of America account via API to write big checks. Yes.
Starting point is 00:15:55 It doesn't have to be a big check. It can be a small check. It's true. A check is still a check, right? That is true. That is very true.
Starting point is 00:16:03 I will say for one, I'm excited. I'm bullish on real-time stuff. I think as the experience gets better and better and more accessible, a lot of times,
Starting point is 00:16:15 even in my job, we don't need to know stuff in real-time, but it's really nice to. It's really convenient. I think it can go wrong. Looking at stuff too often, looking at numbers too often, can actually be unhealthy or a distraction. But when you think about things like campaigns that you're running or product launches or
Starting point is 00:16:37 other things like that, it is kind of cool to see, is there initial resonance? You know, it's kind of neat. I don't know. I'm excited. What I would add to this is that moving data around in real time is not that hard. What is much more complicated, and where it really gets hard to set, let's say, very strict SLAs, is when you have to process the data in real time. If you want to execute very complex queries where you have, I don't know,
Starting point is 00:17:12 joins between tens of tables and I don't know how many aggregations, blah, blah, blah, all these things, this is hard, right? That's where things start to get really, really hard. So yeah, moving the data around is one thing; processing the data and making it available for someone to consume is a completely different kind of problem. Because that's really where all the value is created though, actually, right? I mean, if you're trying to get some sort of insight that requires compute, that's actually where most of the value is created when the data lands downstream.
Starting point is 00:17:55 Yeah. I mean, it depends. If you're trying to build a service that is more like Zapier, let's say, right? Where you want to trigger something when something happens. That's one thing, right? You don't need to do any crazy kind of processing there, right?
Starting point is 00:18:11 It's more about how many requests, how much data per unit of time, you can process internally. Now, if you want to get the data and also run very complicated algorithms on the data, that's a different thing. That's why in the Lambda architecture you see you have the batch and the streaming or real-time parts of the architecture, where most of the use cases around real time have more to do with notifications, because in notifications, usually you don't have to go and process a lot of different data, or do a lot of wrangling around the data.
Starting point is 00:18:55 It's more about taking a look at the data and seeing, oh, is there something that I need to act upon because the temperature is higher than it should be. You know, yeah, I'm exaggerating a little bit, simplifying things too much, but I think for everyone who wants to understand better the distinction between the two paradigms, studying implementations of the Lambda architecture, and how companies did that and for what reason, is an excellent starting point. Totally agree.
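The split described above, with heavy processing on a batch path and lightweight notifications on a real-time path, can be sketched in a few lines of Python. This is a toy illustration under stated assumptions, not any company's implementation: the class `LambdaSketch`, its method names, the sensor name, and the temperature threshold are all hypothetical.

```python
from collections import defaultdict


class LambdaSketch:
    """Toy Lambda-style layout: append-only log, batch view, speed view."""

    def __init__(self, alert_threshold):
        self.alert_threshold = alert_threshold
        self.master_log = []   # append-only record of every event
        self.batch_view = {}   # sensor -> (count, total) as of the last batch run
        self.speed_view = defaultdict(lambda: [0, 0.0])  # events since last batch run
        self.alerts = []

    def ingest(self, sensor, temperature):
        self.master_log.append((sensor, temperature))
        # Speed layer: cheap per-event work only, here a threshold check.
        if temperature > self.alert_threshold:
            self.alerts.append((sensor, temperature))
        entry = self.speed_view[sensor]
        entry[0] += 1
        entry[1] += temperature

    def run_batch(self):
        # Batch layer: recompute the full view from the whole log.
        view = defaultdict(lambda: [0, 0.0])
        for sensor, temperature in self.master_log:
            view[sensor][0] += 1
            view[sensor][1] += temperature
        self.batch_view = {key: tuple(value) for key, value in view.items()}
        self.speed_view.clear()  # the batch view now covers these events

    def average(self, sensor):
        # Serving step: merge the batch view with the recent speed view.
        b_count, b_total = self.batch_view.get(sensor, (0, 0.0))
        s_count, s_total = self.speed_view.get(sensor, (0, 0.0))
        count = b_count + s_count
        return (b_total + s_total) / count if count else None


arch = LambdaSketch(alert_threshold=30.0)
arch.ingest("sensor-a", 20.0)
arch.ingest("sensor-a", 22.0)
arch.run_batch()
arch.ingest("sensor-a", 36.0)  # arrives after the batch run and trips the alert

print(arch.average("sensor-a"), arch.alerts)
```

In this sketch the alert fires at ingest time without touching historical data, while the average needs both the recomputed batch view and the events that arrived since, which mirrors the notification-versus-processing distinction made in the conversation.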
Starting point is 00:19:30 Totally agree. All right. Well, Brooks tells us we're at the buzzer. I could talk about this for a long time, but we have so many more Shop Talks to dig into. And next time it's your turn. So I can't wait to see what you ask. You know, Kostas, we learn so much from the data leaders that we talk to, but I learn so much from picking your brain, and actually your questions really make me think really hard.
Starting point is 00:19:55 So I appreciate Shop Talk. I think it makes me a sharper thinker. Well, it's fun. I think it's good to just sit and chat about the stuff that we experience. And yeah, I hope people enjoy it. That's why I'll keep asking for people to reach out. Please do this. Come on, folks.
Starting point is 00:20:18 Like, you can do that. Like, send an email. Yeah. And let us know how you feel and like, what are your opinions of like, your experience with the show. So, please do that
Starting point is 00:20:31 so me and Eric, we can keep being happy. Please. Of course. And of course, we try to take the same types of questions to, you know,
Starting point is 00:20:42 data leaders from all sorts of companies, large and small. So definitely subscribe to the main show if you haven't yet. Tons of really good episodes there and tons of really good thoughts from data leaders, you know, really around the world. So definitely subscribe if you haven't and we'll catch you on the next Shop Talk.
