The Data Stack Show - Data Debrief: What Open Source Data Projects Have Come Out of Facebook, Whoops, *Meta?

Episode Date: October 29, 2021

On this week's debrief, Kostas and Eric talk about the variety of open source projects that come from Facebook. ...

Transcript
Starting point is 00:00:00 Okay, welcome to the third edition of the Data Stack Show Debrief. Kostas and I were just commenting on the blurred background, and I actually think the fidelity is pretty good for both of us. My background's slightly moodier. Yours looks much more blurry than mine for some reason. I think it's lighting. Yeah, you're a more dark kind of person. I'm the light of the show.
Starting point is 00:00:37 That's right, yes. We're talking about, like, the darkness in my heart. Okay, so this is something that I couldn't wait to bring up in the debrief. We've talked with people from some really cool companies, like Uber and Netflix, and amazing technologies have emerged from those companies. A lot of cool technology has emerged from Facebook too, but it just doesn't seem to show up as much in conversation, at least on the show and among our audience. And is that because they're embroiled
Starting point is 00:01:19 in so much controversy? I mean, of course, Uber has controversy: there are some things from a leadership standpoint, and there are all the on-the-ground legal implications of actually delivering the service, and of employment. But Facebook is embroiled in all sorts of controversy around content, and they get deeply involved in politics because of the nature of what's being shared on the platform. And for me, being a little bit of an outsider to the data space, hearing that Presto came out of Facebook was a little bit novel, I guess,
Starting point is 00:01:52 because I've come to expect to hear that those technologies emerge from other companies. But Kostas, you have seen the space emerge. So what's your take on that? Yeah, okay. Presto is a very interesting piece of technology in terms of how it has matured. It's not new; it's been around for a while. And I mean, okay, I knew that it came from Facebook. What I didn't know was the whole story around Trino: how Trino came about, how the governance of the
Starting point is 00:02:27 project broke into two different projects, and all that stuff. So I would say that, on one side, Facebook probably didn't manage that part very well: the governance of the project and all these things that are very, very important when it comes to large-scale open source projects. On the other hand, I think one of the reasons we didn't hear that much about Presto until now is that it took a while for Presto to become,
Starting point is 00:03:00 how to say that, something that makes sense to use outside of very large enterprises. And I think that came up in the conversation we had with Justin. If you listen to what Justin was saying, we started with Hadoop, right? And we had to wait until today for the data lake to reach a level of maturity where Presto or Trino or Starburst can actually be used on top of it as the query engine that does the analytics. So I think because of the nature of the project itself, which was stateless in a way, like it
Starting point is 00:03:41 didn't have storage, it was never a complete database product, right, the way Snowflake was. It had to wait until all these data lake technologies matured enough to become much more available. And I think we will start hearing more and more about this project, especially as part of this data mesh movement, where it naturally fits because of the decentralized nature of the technology. Yeah, well, it's certainly something we're hearing more about among RudderStack customers, right? Presto is coming up as a requirement more and more. On the other side of the conversation, React is actually something that came out
Starting point is 00:04:21 of Facebook that has widespread adoption. It's less in the realm of the show in terms of data processing, et cetera, but it's certainly a widely adopted technology that came out of Facebook. Yeah, yeah, yeah. They have quite a few important open source projects. I think RocksDB also comes from them. And if someone goes to their open source repositories, there's a wealth of very interesting projects
Starting point is 00:04:51 that come from them. Okay, of course, not all of them are as relevant as Presto or React, which is probably the dominant framework for front-end development. But yeah, I think there was some kind of mismanagement around Presto. Yeah, well, I was reading, and of course there's the famous quote, someone at Facebook said the greatest minds of our generation are figuring out how to get someone to click on an ad,
Starting point is 00:05:19 which is certainly a drastic oversimplification, but great for a podcast debrief to bring up. But it is interesting to think about Presto and React and a number of other things that the world is really benefiting from in many ways as a result of, I guess, those great minds trying to get you to click on a Facebook ad. That's true. That's true. Okay. So, second question, because we're running up on time here on the debrief. I don't know if we have a time limit on these, Brooks, but probably for our mental health and other people's mental health, we should keep it to about five minutes. Data mesh: do you have any updated thoughts on data mesh? I mean, it's subjective,
Starting point is 00:06:02 somewhat controversial. I feel I have a little bit more clarity, but what's your take, Kostas? We probably need a full episode with Justin to discuss it. It's still a vague concept for me. At some point it would probably be interesting to find someone to chat with, outside of the vendors who are part, let's say, of this data mesh pattern, who has actually implemented a data mesh architecture, and get some insights from there. It's early. There is a reason that it exists, absolutely. I'm not saying anything against data mesh, but it's something that needs more clarity. And I think we should try on this show to bring some clarity to this concept.
Starting point is 00:06:50 Yeah, absolutely. Brooks, mark it down. We need someone who has implemented a data mesh. Kostas, any other takeaways? I mean, that was a fun show. Super interesting. Ah, yeah.
Starting point is 00:07:04 I mean, it's amazing when you get to chat with people who were doing data 10 years ago and are, okay, serial entrepreneurs in a way, because this is their second company. Sure. It's very interesting to hear the perspective of people who have lived through both eras, let's say, of this market. So it was a very, very interesting perspective.
Starting point is 00:07:32 And I really want to thank Justin for that. The description he gave of the landscape, talking about how you can think of Snowflake as the Teradata of today and Fivetran as the Informatica, makes sense. Yeah, yeah, for sure. Those are great ones. But it takes someone who has lived through all these iterations of the market to have this kind of insight. Yeah, for sure. One thing, actually, and we're just going to run this one long since it's only our third one: the Iterable and Snowflake connection that you mentioned
Starting point is 00:08:13 and Justin's insight that they're probably also running on Snowflake was really helpful for me. And I'm sure it'd be interesting to talk to Justin just about that subject. I think that dynamic will create a massive market in and of itself just within the Snowflake ecosystem.
Starting point is 00:08:35 Right. So if you think about all the companies that are using Snowflake as a data warehouse, and then you think about being able to integrate with other companies who are, however they're doing it, presumably running Snowflake on the back end. Yeah. So you have almost a marketplace or data mart that's readily accessible in your cloud data warehouse. I agree with Justin that it's not the end-all be-all in terms of the broader data stack when it comes to the complexity faced by Fortune 500 companies. But without a doubt, that's going to be a huge market in and of itself. And a hot take on the debrief: I think that's a huge way Snowflake grows significantly in the next five years, just because that type of functionality is pretty huge. Yeah, actually, I would suggest that our listeners go and read the first pages of the S-1 filing
Starting point is 00:09:37 of Snowflake, because that's exactly what they describe there. And the way they describe it is with the term network effects. That's exactly what they're trying to create with this data sharing mechanism. Because suddenly you get Iterable, which has its own customers, and those customers have a good reason to also use Snowflake. You create some very, very strong network effects there, and if you manage to build them, yeah, it's probably going to be, how to say that, much more impressive than what Teradata managed to do in the market.
Starting point is 00:10:18 Yeah. I don't think it's easy to do, especially when on the other side you have all the openness that comes with these open source projects. And of course companies, especially big companies, know exactly what vendor lock-in means, right? So it remains to be seen if they are going to succeed in this vision, but it's amazing to see how clear the vision is
Starting point is 00:10:46 and how they are executing on going from a data warehouse to a data cloud that has network effects, which is amazing. It's very, very impressive from a business strategy perspective. For sure. Frank Slootman. No wonder the RudderStack CEO went to work for him
Starting point is 00:11:04 out of doctorate school. One point on that, though, that I think is interesting, which is, in some sense, history repeating itself, or maybe the beginning signs of it: marketing and go-to-market data tend to be the tip of the spear when it comes to data technology, because the needs there create a significant amount of demand within an organization. We've talked about this with the concept of CDPs, where it's like, okay, well, the initial CDPs actually focused on marketing execution and customer journey engagement, when in reality the original intent was customer data across the entire stack.
Starting point is 00:11:43 And my sense is that you're starting to see that with the network effects Snowflake is trying to create with the marketplace concept. A lot of the big initial data set availability and integrations relate directly to marketing. I'm not surprised by that, but it's really interesting to see history repeating itself in that regard, in terms of marketing and go-to-market data being the tip of the spear. Maybe I'm just projecting my own experience onto that, so check me if that's not accurate. Yeah, yeah.
Starting point is 00:12:18 I think, and okay, this might sound a little bit controversial to some people, but when it comes to technological progress, there are two major drivers behind it that we humans don't like to talk about that much. One is marketing and the other is porn. Which, that Venn diagram overlaps a lot. You don't have to go into detail. Yeah, exactly.
Starting point is 00:12:46 So, actually, joking aside, what I'm saying is there's this very interesting fact about Betamax versus VHS in the 80s, and VHS winning because it was adopted by the porn industry.
Starting point is 00:13:01 Sure. So setting aside the moral issues or whatever of this conversation, there are some very strong drivers behind technology. And yeah, marketing is one of them for sure. It's the first reason that someone goes looking for data.
Starting point is 00:13:18 Yeah. All right. Well, you heard it here in this debrief. That was probably longer than five minutes, but great. Thanks for joining us. Subscribe to the Data Stack Show if you haven't yet; you can subscribe on your favorite podcast network. And of course, subscribe to us on YouTube, where you're watching this video. Lots more interesting content coming up.