The Data Stack Show - 196: Why Big Query Was a Big Deal, Observability AI, and How AI is Like a Guy at the Bar, Featuring David Wynn of Edge Delta
Episode Date: July 3, 2024

Highlights from this week's conversation include:
- David's Background and Career (0:49)
- Econometrics Work at UPS (3:14)
- Challenges with Time Series Data and Tools (7:15)
- Working at Google Cloud (11:28)
- BigQuery's Significance (13:51)
- Comparison of Data Warehouse Products (17:23)
- Learning different cloud platforms (20:17)
- Coherence in GCP (23:04)
- Observability and data analysis (32:44)
- Support for Iceberg format in BigQuery (36:31)
- AI in Observability (40:25)
- AI's Role in Observability (43:39)
- AI and Mental Models (46:04)
- Final thoughts and takeaways (48:32)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies. Welcome back to the show. We're here with Dave Wynn. Dave, welcome to the Data Stack Show. We're excited to chat with you.
Absolutely. Glad to be here.
All right. Well, we know you work for Edge Delta in observability, but give us the brief overview of where you came from before that.
Oh man.
Well, if we want to go back far enough, there was a cold and snowy night in
February of 1984 and a cry rang out at four in the morning, which is
unusual for that time of year.
But if we fast forward a little bit from there.
So I've been a lifelong geek and I bounced around a number of different places from doing
econometrics at UPS headquarters in Atlanta to hopping around a few startups in Silicon Valley
with some ETL software and some observability software. And then I was at Google Cloud for a
number of years doing all things there, both on the compute and on the data side.
Until finally I am here at a new startup, where we're doing observability. So a little bit of this, a little bit of that. Very cool. Nice. So one of the topics we talked about before the show
was your time at Google. And we talked a little bit about BigQuery. So I'm interested in digging
in a little bit more there, because you were at Google, I think, during some of the crucial years.
It's the best data warehouse product that exists in the clouds, because it's the closest you can get to a SQL API with really not having to worry about any of the back end, right? No knocks on Athena. Athena is great for what it does, don't get me wrong. But if I've got to open another tab and start managing all of the S3 stuff and have all my Parquet files in just the right format, I'm not having the best day if that's what's happening.
So BQ just makes it super easy to dump data in there at the appropriate time
in the right increment in the right spot.
And you can just go about and start querying it, whether you're talking at the gig scale or at the petabyte scale, it just doesn't matter.
So I found that really slick.
Awesome.
Well, tons to talk about.
Let's dive in.
Yeah, let's do it.
Dave, there are so many topics that we want to dig into.
I'm actually interested about your, so you studied economics and then right out of school,
you did econometrics work at UPS.
What did you do there?
And what does an economist hired by UPS figure out for them?
So it's a great question.
I joined, actually, I was hired to be part of one group and was transferred to a different
group before my first day.
Yeah, nice. But the way this worked is that
I joined the forecasting team. And this is 2008, where we were coming into a giant recession.
And the thing that this team was responsible for was predicting as far out into the future as
possible. How do we know how much we need to ship where and when? Great, so we need to maintain all these time series that give us forecasts around all these things.
In the past, the models didn't actually have to be
that sophisticated because UPS more or less tracked GDP,
period, within a percent or two.
So not that hard to predict.
You could just sort of hit the button.
Until 2008: wait a minute, suddenly some things aren't quite working right, the numbers look slightly different. So there was a guy who thought that maybe we could bring some more econometrics into the forecast to figure that out. They hired a PhD to implement some of the key ideas. And they hired a grunt to do all of the work in Excel to make scale happen. And I'm not going to say which role I served, except that I don't have a PhD. Let's go with that.
Wild. Okay. So, I mean, dig into that
a little bit more. What did you, I mean, what did you find? How did you address the problem of,
you know, sort of your core
metric that had been used to forecast the business changing? Oh my gosh. Well,
so we're almost thinking about it more on the operational side of what I was responsible for.
So we had a number of time series that we published as a team to the wider organization.
This is before we were using a database. Like, someone passed around a CD of SQL Server 2005 as a, whoa, maybe we should try this.
And we were before that.
And in that sense, we were sharing Excel files about the different series that we maintained. The PhD, who is still there and is doing tremendous work, as I understand it, her name is Juana Mazzara and she's great. She would literally read academic papers and books and things that were published and translate the different processes that were available into stuff that would matter for a time series.
And so that would work once.
And then I had to figure out how to do it more for all of them.
And this involved a lot of VBA, where I was very grateful for IntelliSense, whatever level of intelligence you would consider 2008 IntelliSense to have, just to help me keep moving and get going.
But a lot of these operations were very annoying, right?
So you can't just fill down across the whole row
because you've got month resets, right?
And you've got different moving averages
and you've got different,
all different like little operations that,
if I was doing it today,
I probably would have functionalized everything
and gone about it that route.
But I wasn't quite that smart.
But younger Dave, who had fewer of what I'm going to call these blonde hairs on my chin, didn't quite know that. So we were trying to do everything half in GUI land and half in VBA.
And that led to a lot of files with a lot of very particular changes.
And you know, it was a job.
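For illustration, here is a minimal sketch of the "functionalize everything" approach Dave says he would take today, assuming pandas and invented daily shipment data; it is not the actual UPS workbook logic.

```python
# A hedged sketch: each operation is a small, pure function applied
# uniformly across every series, instead of 250 hand-pasted workbooks.
import pandas as pd

def monthly_reset_cumsum(series: pd.Series) -> pd.Series:
    """Cumulative volume that resets at each month boundary."""
    return series.groupby([series.index.year, series.index.month]).cumsum()

def trailing_average(series: pd.Series, window: int = 7) -> pd.Series:
    """Simple trailing moving average over `window` observations."""
    return series.rolling(window=window, min_periods=1).mean()

# Hypothetical daily shipment volumes for one of many series.
idx = pd.date_range("2008-01-01", periods=90, freq="D")
volume = pd.Series(range(90), index=idx, dtype=float)

# The same functions run over every series, no copy-paste per workbook.
features = pd.DataFrame({
    "month_to_date": monthly_reset_cumsum(volume),
    "7d_avg": trailing_average(volume, 7),
})
print(features.head())
```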
Yeah.
Yeah.
Okay.
Just out of curiosity,
because I want to jump actually to the,
there's so much to talk about with Google,
and I know John has a ton of questions,
but time series data at that scale is very interesting, right?
And so the tool set that you're talking about
sounds, you know, particularly painful.
What stack would you use today, right?
I mean, there are, like, time series databases like Influx and other tools like that
that are really good. Maybe that's not the right tool for what you were doing,
but what stack would you put together today to do that?
And before you answer that, what percentage of the time did you spend waiting for Excel
to become responsive again after you made a critical change and you didn't
hit save?
So I'll start with the latter question first.
None at all, because I was managing 250 Excel books.
Why would I put these all in one workbook
when I could do this copy and paste 250 times?
Got it.
So performance problem solved.
Yes.
That is the original brute force.
Wow.
Okay.
Yeah.
Wasn't expecting that.
To answer your former question: this was actually true with a different issue that we had when I was there, where we had a Microsoft Access application that wasn't idempotent and it needed to be run on a scheduled basis. And there was a problem with it that we didn't notice, because the non-idempotency had messed it up, and it wasn't until a different consumer, one that was rather important, noticed that our forecasts were the same week to week for a few weeks running. And we were like, oh. So redoing that now, I've become much more on the idempotent train, much more on the functional train, where I would be trying to bake in as much of that as we can. Storage is cheap, which was true enough even then, but it's much more true now. And so why would we bother trying to mutate state in place when we could just have a much clearer lineage about how these things get transformed from place to place? So it's all of the really cool and interesting stuff we talk about, like organizing a project correctly and making sure that your tables are well named, and all that other good stuff where you don't just hope that someone else can read VBA and has your brain.
I often wonder if weather forecasts ever work that way, where somebody somewhere is like, oh, the weather forecast is the same as it was last week, and somebody didn't run something.
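As an illustration of that idempotent, functional style, here is a minimal sketch with an invented Forecast type; it is not the actual UPS or Access application logic.

```python
# A hedged sketch of the idempotent, functional style described above:
# never mutate a forecast in place; derive each stage from immutable
# inputs so re-running a scheduled job yields the same output, and stale
# results (the "same forecast for weeks" bug) become easy to detect.
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)  # frozen=True forbids in-place mutation
class Forecast:
    as_of: date
    values: tuple  # immutable, e.g. weekly volumes

def adjust_for_seasonality(fc: Forecast, factor: float) -> Forecast:
    # Pure function: returns a NEW Forecast, the input is untouched.
    return Forecast(fc.as_of, tuple(v * factor for v in fc.values))

raw = Forecast(date(2008, 10, 6), (100.0, 102.0, 101.0))
adjusted = adjust_for_seasonality(raw, 1.05)

# Idempotent: running the job twice with the same inputs gives the same
# result, so a scheduler can safely retry it.
assert adjust_for_seasonality(raw, 1.05) == adjusted
```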
Yeah.
You know?
I absolutely would believe that.
At a local level, that's gotta be possible.
For sure.
I often wonder if the precision is cut off because there's like moss on the thermometer and it's just like, sorry, we can only get to one decimal point. Hopefully that's good enough.
Yeah, it's like literally an instrumentation problem.
Yeah, it could be. It's probably why they did it at airports, because, you know, they have to keep those instruments clean.
Yeah, good point.
Yeah, I mean, you've got to think about the meteorologist, you know, where it's raining outside and the forecast comes in and it's different than the actual.
Absolutely. Have you guys read that article? Gosh, it was on Hacker News the other week about crazy real life bugs. And the bug was the Wi-Fi works when it's raining and not when it's not.
No.
Fantastic.
So it's worth looking up to hear the whole story. But basically, there was a guy who came back in
from college, and he was always tech support first thing, but his dad was also very capable. And he was like, yeah, I don't know why, but the Wi-Fi only works when it rains. I haven't looked into it yet. Right. Which, depending on where you live, is like, well, you're not logging onto the internet that much.
For sure. Well, and also it's backwards from what you would expect, and a whole host of things.
So the long story short is that they got their internet from a microwave beam from an opposing house. And what had happened 20 years
ago was that someone had planted a tree. And so when it rained, it weighed the leaves down enough
to be clear. And when it stopped raining, it was just enough to block most of the
signal. That's awesome. I love
that. So when it snows, it's
just like perfect internet.
I would imagine so.
As I recall, the article didn't cover wintertime, so I can't speak to that, but I would imagine.
Man, that is so great.
Okay, well, just a reminder to
the listeners, we're here with David Wynn from Edge Delta, and we're chatting about Wi-Fi signals, VBA, forecasting, and breaking up Excel workbooks. But John, you had a bunch of Google questions. I have some too. But David, just give us a brief overview of your time at Google. Because you worked on all sorts of stuff, but how long were you there
and what were the sort of the biggies?
Yeah, I was there for about seven years.
I started when there were few enough customer engineers across the entire world to fit in one training room, which we did once, and we didn't have enough training material to last the entirety of two days.
So half of a day was scheduled for five-minute lightning talks
from every person in the room,
which was fascinating
because it could be on any topic that you wanted.
And so that was the vibe.
It was young and it was fun.
We also didn't have all of the enterprise things that people routinely demand when I joined, like being able to directly peer to Google. That was a pretty big one. That didn't exist at that time.
This was also before Kubernetes and before GKE and before various other things.
So I was out there talking to people directly about architecture, how they could migrate to the cloud, how they could re-architect so that things might be more effective across the entire suite of products.
I like to joke that the job was not that hard.
All you had to do was know the some 300 products that we had,
know the some 400 products that AWS had,
know some 500 open source offerings,
and how they all fit together in every conceivable scenario.
It's not that big a deal.
But that, you know, that led to an interest in
basically all different flavors of stuff.
Because at some point I was territorial, where I would cover the entirety of the West Coast, because that's how territories go when you're early, down to smaller and smaller territories.
And then I started focusing on an industry because I have tried to quit video games several times.
And I'm sure I'll succeed one of these days. But I figured maybe I should make that more of a job thing, because we had several notable gaming customers at GCP. Niantic, makers of Pokémon Go, was probably the biggest one early on, but there have been others, like Unity and Apex Legends, and various other things have also used different degrees of Google Cloud, which I may or may not have had a hand in.
And so, yeah, like,
that's where I was for most of that time doing the customer-facing architecture side,
and then also doing a little bit of partner stuff as well.
Nice. So, Google Cloud, I mean, if I picked, like, seven years to be at Google Cloud, it seems like those were some of the most transformational years.
Totally.
And I would argue today is pretty... well, we'll see, we'll see.
Yes. Oh, we could probably have a chat about that for sure. But yes, it was definitely big times.
Yeah, for sure. So we want to talk about BigQuery, but were there any others? You know, in your years there, was there a product that comes out, you guys are introducing a new product, and you're like, wow, this is going to be incredible? Or maybe we just talk BigQuery, if there's no other product that you felt that way about inside the ecosystem.
I don't think so. I think BQ is really the one that I'm the most enamored with, just because it delivers so well on the core promise and solves so much. Whereas something like Dataflow was too complicated, and I tried to understand it for a couple of halves there.
I had it on my OKRs to try and figure this thing out. And usually when I had trouble,
I would go ask the team and they'd be like, go read the source code. And I'm like,
the last Java that I saw was at J.L. Mann High School in computer science AB, and I cannot read that. That was back in, like, Java 5 or something, when they didn't have decorators, and I don't know what any of the syntax means anymore, so no, thank you. But I also knew enough about it to advise people on what architecture patterns they needed and what the common pitfalls were. But even the Python SDK that they built, I think, was just a little bit beyond what's pretty reasonable for people to get. So I think that BQ hits the right thing. I think VMs
are very commoditized. I think GKE is great and is definitely probably the best Kubernetes platform,
but I mean, that's borderline commoditized as well because everybody's doing Kubernetes.
Well, but I think you bring up a good point that I think a lot of companies struggle with
is like they can have a brilliant solution to a problem
that is not accessible enough to enough people
to make a difference.
For sure.
Right?
And it seems like you're saying that the BigQuery
kind of hit that like brilliant solution to a problem
and very accessible to a large number of people.
Yeah.
The only challenge that you would really have is migrating the data and getting it in there.
That was really the only one. Because if you have a petabyte
capable system, your next problem is
getting a petabyte of data in there. Sure. In order to make use of it.
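For illustration, a hedged sketch of that "just dump data in" workflow using the official google-cloud-bigquery Python client; the project, dataset, table, and bucket names below are invented.

```python
# A minimal sketch of loading staged Parquet files into BigQuery.
from google.cloud import bigquery

client = bigquery.Client()  # uses application-default credentials

table_id = "my-project.my_dataset.shipments"  # hypothetical table
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.PARQUET,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

# Load Parquet files already staged in Cloud Storage.
load_job = client.load_table_from_uri(
    "gs://my-bucket/shipments/*.parquet",  # hypothetical path
    table_id,
    job_config=job_config,
)
load_job.result()  # block until the load completes

print(client.get_table(table_id).num_rows, "rows loaded")
```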
Yeah. Sure. What I'm interested in, actually, interestingly enough, we have not talked about BigQuery much on the show. Which, I love that in recent shows we've covered a bunch of topics.
We got into the other day, we got into the details of SAP HANA.
Yeah.
Details of that.
Yeah.
You know, that was great.
Yeah.
Got some hardware.
Yeah, totally.
That was awesome.
That's good stuff oh yeah totally
it was great
but in terms of
you know when you
I think there's sort of this perception of like
you know you have Snowflake
you have Databricks
and then BigQuery is
you know the third one on the list
but all the headlines
go towards Snowflake and Databricks
and, you know, I mean, part of that could be because Snowflake and Databricks, that's sort of the main thing they do, whereas Alphabet is gigantic and Google Cloud is, you know, a sprawling list of products only to be dethroned by the AWS, you know, portfolio of products. But in terms of sort of the Snowflake, Databricks, BigQuery,
give your perspective on that.
I'd be interested.
And one other thing here,
like think about what we're talking about.
We should be talking about Microsoft Azure SQL,
AWS Redshift, Athena, whatever, and BigQuery.
That should be the conversation.
Man, I need to get out of the
social data sphere
and stop reading the headlines
about
Battle Royale.
I think it's a really significant conversation, and that's clearly who should be in it. And then Oracle. We just skipped over Oracle. Those are the four people that should be in this conversation. Only one of them is, which is a big deal. Like, you know, Snowflake and Databricks are great too, but it's a big deal that BigQuery is in that conversation. And I'd be interested if you have any thoughts on why. How did that team win? How did that team beat out, you know, all these other products that should be just as viable, theoretically?
So first, I'm not going to dunk on Databricks or Snowflake. Those are both great products.
Oh, yeah.
And I've lost to, I'm not going to say which one, but I've lost to one of them more than
I would care to admit when I was studying BigQuery.
Sure.
The challenge that I think comes with it is, especially when you're talking about a hyperscaler,
there's a question of how much do I have to commit
in terms of getting a return on what this is, right?
Because if you're running most of your application
in AWS or in Azure,
you probably will just use whatever they have
and kind of suck it up and deal with it, right?
And most people, for better or worse, I would argue worse, but that's not what we're here to talk about, will not have GCP as their default, and so they'll miss out on what goodness this could provide. So I think that really is what holds people back. Whereas you look at Snowflake, you look at Databricks, a big core value prop is multi-cloud: do the whole thing, it doesn't matter where. It's like, yep, that's not a thing that BigQuery
could for a long time talk about. They just recently
got towards that in the last couple of years I was there with federated queries and stuff.
But even then, now you have even less of a tie to this platform that I don't know if I want to have to go learn and figure all out. And I have to give some empathy to that, because here's a little bit of humble pie that I'm going to go ahead and talk about eating.
I was in Google Cloud for like a bunch of years.
I think I'm a pretty sharp guy, mostly.
I thought I understood what cloud was.
I hadn't really dabbled with AWS until about a month and a half ago, too terribly much. And I was very humbled at how different these two things were
in so many respects.
And I can see, perhaps, a lot of the architectural decisions
they've made of like, oh, I see how they got there.
I don't understand why I have to open so many browser tabs.
And I don't understand why all of the instructions are out of order.
I don't know if you guys already know AWS,
but trying to learn AWS in 2024 is insane
because there is no from ground zero tutorial out there
that is up to date.
They're all half old with APIs and stuff.
It's a monster.
It's an absolute monster.
All I have to say is that at GCP, that doesn't really exist.
Someone is in charge of making sure it all
works together.
Boy, is that a change. But I
recognize that other people don't
want to take on what is
they expect to be that madness times
two, if not more so.
I hear it. This was almost 10 years ago, but I had to buy, like, Pluralsight classes to get through some of it. I did a bunch of digital modernization efforts, you know, almost 10 years ago now, and even then the documentation was either fairly inaccessible or, just like you said, I don't know. So I just got a Pluralsight subscription and walked through some of the classes.
ChatGPT definitely failed me in terms of trying to get me up to speed on AWS, because it was some number of versions behind.
You should have asked Alexa.
I don't own an Alexa device.
Or
one of the Anthropic models maybe would be better.
Maybe they train those on the Amazon manual.
I should really sign up for Claude.
I understand that one to be a bit more linguistically advanced,
though not technically advanced.
Yeah, that's what I've heard.
It is.
I mean, one thing, just to return to what you said about the system working together, and then also considering what you said about, I don't remember the specific name, but Google's implementation of Beam.
We don't have a lot of people on naming. They're not very good at it.
I mean, that's a very difficult thing to get really good at, especially with that level of product catalog. But I mean, there is a lot to be said for, okay, this is a combined platform. And if I were just going to go to market, and I could buy anything I wanted, and, you know, build this perfect thing, that's great. But the reality for a lot of people is like, whoa, these things work together. And so even if it's not ideal, it's just a pipeline, right?
It's going to run. And so did you see that dynamic a lot where it's like the advantage of a connected
ecosystem can outweigh the challenge maybe or like the rough edges of an individual product. Yeah. I think a lot
of GCP customers can testify
to that for sure. And I
think that has to do with the different development approaches
that the different hyperscalers have. So
AWS famously built
on two-pizza teams, right? You've got features and stuff that need to be shipped by relatively small teams. What that means is that your interface boundaries grow a lot, and what we see in 2024, if again you're coming to this new, is there are so many checkboxes, they're so out of order, they have such different expectations around all of these, because this team built that checkbox, this team built that checkbox, this one did this, and you can feel it. Whereas in GCP, someone is in charge of the console and the flow of it, and it just makes so much more top-down, coherent sense. So whether or not the dashboarding solution inside of GCP is, like, the greatest thing since sliced bread, it definitely works, and it definitely plugs straight into BigQuery and takes advantage of a ton of optimizations that they have under the hood that keeps everything fresh in a way that is harder to do when you're not.
Shout out to all of the dashboarding solutions that do great stuff,
not trying to knock any of them,
but there's just more cohesion that you can take from that perspective.
Yeah.
Okay, one more question for me on Google Cloud.
Do you think that Google's,
and this is a, how do I want to ask this? So Google's different business units, you know,
at least I've never worked for Google, but just from my experience, you know, sort of building
some technology on Google in a previous life, like even product, like individual products can
have like parts of them that are pretty disconnected,
not to the level of the Amazon sort of two pizza checkboxes are out of order.
But one interesting thing, at least as a user of BigQuery, I use the Google Cloud to scaffold
out a bunch of personal projects.
And it is very approachable.
Just even using their different APIs and other things is very approachable.
And so you can build out a project really quickly, right? And just everything works
with BigQuery and it is super nice. Do you think that comes from Google's competency
in consumer-facing products, right? I mean, that's really where they came from was
deeply consumer-facing. Like Gmail.
Yeah, exactly. You know, search Gmail
where there's this significant emphasis on,
you know, sort of emphasizing like simplicity and flow.
Or is that disconnected?
Because you could tell me either way
and I wouldn't necessarily be surprised,
but I'm curious.
Obligatory disclaimer here
that all opinions announced here in this podcast
are solely the property of David Wynn and not of any particular entity. This does not constitute investment, legal, or technical advice. Please consider everything I say stupid.
I don't think so. I think, because here's the really interesting thing about Google
Cloud. What was Google Cloud's first product? Do you guys remember?
Storage, but I actually don't know.
So that's S3 you're thinking of.
S3 was the first product for AWS,
which was released in 2004.
But not for Google Buckets.
But not for Google, no.
Google's first product was App Engine,
which is the entire development platform built in one.
Now, the reason for that is
that is how Google developers work internally.
And so the idea was, down to the part where they actually run it on infrastructure inside of Borg, inside of the thing that runs Google.
This is the development model that we use here. Everyone should use the development model here.
That didn't catch on for a lot of reasons, partly at least because people would have
to rewrite a lot of applications they didn't want to rewrite.
So, oh, okay.
Maybe if we want to go get this market faster and more directly. I think AWS had a much better approach to that, where it's like: let us give you exactly what you are familiar with, IT teams, and we will slice it up for you and charge you by the slice and have a nice little thing right here. Whereas Google tried to bring a bit more of the Google way of doing things.
So when we talk about projects,
which I do think is a meaningful boundary that they drew in GCP early on,
I think that was more of a happy accident
from the way that App Engine was structured.
Because I do think it's a much more coherent way to organize stuff
than, I mean, does AWS have a project boundary now? I feel like you
can do some things with ACLs and stuff like
that, but mostly it's still just like, I hope you logged in
with the right account because here it goes.
I don't know.
Azure has resources that are
kind of like projects.
Google has the most clear boundary.
I mean, you can tag things
in AWS and you can have different accounts
and you can have a unified
account with sub accounts. But beyond that, I don't know. Yeah. It is really nice in Google,
though. Like the other day I had this, you know, 250-page PDF exported from a note-taking app on an iPad. And I was like, I don't know why, but I wanted to experiment with OCR stuff. And Google has some really very cool products around that. And I mean, spinning up a project is required because those are pretty heavy, and so they require you to add billing. Well, I mean, we could discuss why they require you to add billing. That one makes sense, because if you really hammer the system. But I was like, this is unbelievable, you know? I ran a test in a couple minutes. It was super cool.
Yeah, it's good stuff. Highly recommended, especially if you like light blue. It's got a pretty tight theme on there.
Yeah, it does. I will go on record, and I assume Sundar is listening to this. Sundar, I'm going to go ahead and tell you something
that I didn't get a chance to tell you in person,
which is that the old logo for Google Cloud
was better with the rivets.
It should come back.
I recognize it didn't have all four colors
and that maybe is branding standards
and like as a thing, but it felt nice.
Anyway, that's my high horse.
I'll just step up and step off real quick.
And Sundar, the Data Stack Show has a message for you.
We would love for you to come on the show and talk about data at Google.
The Google logo,
you know,
and the Google logo.
Really?
You set the agenda.
It's great.
Yeah.
Yes.
Yes.
Uh,
okay.
That was great.
That was great.
Just as a reminder to the listeners who were driving and trying to look at maps
and Twitter at the same time,
we are here with David Wynn from Edge Delta,
and we're talking about all things Google Cloud.
What was next on the list, though?
Well, we got to talk some about observability, for sure.
Yes. I want to put this in here because I'm curious about your take on BigQuery. So, open source table formats, specifically Iceberg, are making a lot of splashes, big splashes. And the concept is great, right? Like, you can have this open storage concept that can be in S3 or GCP, like whatever storage you want, and then you're less locked into all these products. So then that pushes all the battles
up to the compute layer, right? So you got a Snowflake engine, you got a Databricks engine.
It's good for the consumer.
It's good for the consumer, allegedly. So where does GCP stand with that? Do you think this could be a thing today? Can you use
GCP today to access data
in Iceberg? You know, that's
a great question, but I'm afraid
Iceberg came along a little bit
after I left GCP, so I
am not sure that I'm
really equipped to answer it.
I'm asking Google.
Okay, perfect. This is obviously where we have to ask Gemini.
I mean, it seems like, directionally, right, that would be: okay, Amazon, Oracle, Microsoft, this is your chance.
Have a query engine that basically is just compute and accesses data in Iceberg. Like, ready, go.
Do you think any of them will do it?
I mean, functionally, that's what Athena
and BigQuery already do, right?
They have separate compute stacks
on top of some either proprietary
or non-proprietary formats
that they can spin up at will.
But embracing a new open source standard
is really the question.
Obviously, they're capable technologically, but will they play in the Iceberg world?
Yeah, I mean, they took on Parquet. I don't see any reason they wouldn't take on Iceberg.
Yeah.
Right, like it's inevitable.
The much more interesting questions to me are
as we evolve our understanding and our practice
of what we need to do as data analysts and data engineers,
how does that change what we need to do?
Right?
Because I just came back from Monitorama in Portland last week.
A shout out to the organizers.
It was a great conference, much appreciated. One of the dominant themes there was this: in observability, we've got
this concept of observability data, but it's not in tables, it's not an open
format, it's not in anything like that.
Like there is this concept of open telemetry, which does standardize the
line protocol a little bit and has an agent associated with it, but mostly
there's a whole bunch of all kinds of stuff floating around here from log
lines to time series data to trace information, which is sort of logs, but
with parent IDs and back and forth and stuff. The old approach was what I call the Patrick Star model: why don't we take all of the data over here and put it over here, so that at least then I don't have to go to 80,000 machines if I want to see if something went wrong? That's like a level one improvement, for sure, and it was viable at gigs of data per day. But now we've got terabytes of data per day.
Now we've got hundreds of terabytes of data per day.
Now we've got some of the big organizations
are generating a petabyte of observability data per day.
And it's like, we've got to take this one step back
and think about, okay, what are we doing here?
Yeah.
Because you just can't move that all across the wire fast enough to matter. And so Edge Delta is obviously helping to push this forward, but there are other people in the same vein of: we need to push that distribution down, as far toward where the data is created as possible, so we can do aggregations and filtering and routing and stuff where all of that data is created. That's one method to think about it. But, you know, we might even have to think about what kind of data it is we're making and how we use it. Because, man, we've got to tie the dots on these things.
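A toy illustration of that push-the-work-down idea: filter and pre-aggregate log lines where they are produced and forward only a compact summary. This is a generic sketch, not Edge Delta's actual pipeline.

```python
# Instead of shipping every raw log line to a central store, an edge
# process aggregates locally and forwards only a small summary object.
from collections import Counter

def summarize_logs(lines):
    """Aggregate raw log lines into per-level counts plus error samples."""
    counts = Counter()
    error_samples = []
    for line in lines:
        level = line.split(" ", 1)[0]  # assume "LEVEL message" format
        counts[level] += 1
        if level == "ERROR" and len(error_samples) < 5:
            error_samples.append(line)  # keep a few raw examples
    return {"counts": dict(counts), "error_samples": error_samples}

raw = [
    "INFO request handled in 12ms",
    "ERROR upstream timeout talking to billing",
    "INFO request handled in 9ms",
]
# Only this small summary crosses the wire, not every line.
print(summarize_logs(raw))
```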
Can I talk about my worst meeting at UPS? You don't have to say yes, but maybe I'll go ahead and say it anyway.
When I was at UPS, one of the things that inclined me to get out of data
analysis was I had been given a charge and put together a dashboard and an
analytical report on, I honestly don't even remember what, I remember working
on it for, I think it was a week or something, you know, good chunk of time,
particularly as a young guy who didn't know what I was doing. And then I walk into the meeting and
I start talking and I can see the guy's face change right away. And within 30 seconds,
he stops me and he says, David, this looks great, but I wanted to let you know, this isn't what we're looking for.
I wanted to see this and this and this.
And I was like, huh, that was a week's worth of effort for a whole host of things that
I thought were interesting bubble up style that I'm now being told to do a little bit
more top down style in a different direction.
And it's like, huh.
Something about this went wrong that I
wasted a week here. Did he just ask for hard-coded
values? He's like, can you just code these values?
I want it to be up and to the right.
Man, I really wish I remembered
the specifics, but I mostly remember
his face.
Yeah, it was like, you know,
if you go in with data to
present something, and in the first 30 seconds, that big blinking red light on your internal dashboard is like, we've lost the audience here.
Yes.
There's such a thing. I'm always interested in ways that we can tie this type of stuff closer together. And I feel like as analysts
and engineers, we can get a little bit caught up in the properties of what this is without thinking
enough about how it ties back to the greater objectives of what we have that's actually going
on here, right? And I think the next wave you'll see in observability, but honestly a little bit from the analytical side too, as we start taking more and more control over our data via open formats or what have you, is that this needs to line up with the thing that we all need to do here. So how do we tie those things together so that we don't burn any cycles we don't need to? Yeah.
For those keeping score at home, by the way, talking about open table formats, straight from Gemini: yes, you can query Apache Iceberg data in Google Cloud using BigQuery. BigQuery supports the Iceberg format through BigLake Metastore.
BigLake Metastore. They're upping their naming game.
Big Lake. Yeah.
That's kind of cool, though. Yeah, I kind of like it. I like that they made sure you knew it was big.
Yes. And lake. They got the data lake. Big, it's a lake.
What's the largest lake? The number of customers that wanted to change the name of data lake,
they're like, we don't want a data lake.
We want a data ocean.
We want a data galaxy.
And you're just like, yeah, man, absolutely.
Keep going.
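For illustration, a hedged sketch of what that Gemini answer can look like in practice: defining a BigLake external table over Iceberg metadata in Cloud Storage and querying it with ordinary SQL from the BigQuery Python client. The project, connection, and bucket names are invented, and the exact DDL options may vary by BigQuery version.

```python
# A minimal sketch, assuming a BigLake connection already exists.
from google.cloud import bigquery

client = bigquery.Client()

ddl = """
CREATE EXTERNAL TABLE IF NOT EXISTS my_dataset.iceberg_events
WITH CONNECTION `my-project.us.my-biglake-connection`
OPTIONS (
  format = 'ICEBERG',
  uris = ['gs://my-bucket/warehouse/events/metadata/v3.metadata.json']
)
"""
client.query(ddl).result()  # create the external table

# Query the Iceberg data exactly like a native BigQuery table.
rows = client.query("SELECT COUNT(*) AS n FROM my_dataset.iceberg_events")
for row in rows.result():
    print(row.n)
```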
So our conversation about observability reminds me of something
that we've talked about with
RudderStack and AutoTrack.
You remember?
Oh, AutoTrack.
Yeah.
Right.
So there's this problem.
I'll let Eric describe it.
But it's what you're saying.
We're like, hey, let's just like go in and collect everything.
Right.
And you have this decoupled, like, technical team that's like, I don't know what's useful, I'll collect and store everything. And then downstream, you know, a business team that's like, I don't care about any of this stuff. And it's not, I might care about some of it; it's, I literally will never care about this piece of it. So every moment wasted engineering, collecting, tracking, storing, retrieving is complete waste.
Well, the interesting thing about that, so the context is, you know, RudderStack collects
user behavioral data, you know, so telemetry from like your website or app, etc. And early on in the
life of the company, they had experimented with an auto track feature, which is basically you
install the script on your site, and it just tracks every change in the DOM on your website,
and just sends that as a payload. Sounds noisy.
So noisy.
Now, something really interesting.
I don't know if you listened to this show,
but we had someone from the analytics company Heap
on the Data Stack Show.
And Heap's, one of their big differentiators
was AutoTrack.
And they stuck with it and actually ended up figuring out how to make it work.
But listen to this.
This astounded me.
It took their engineers, because they, like our model is very different.
We send everything, you know, to the warehouse or whatever.
We don't actually store any of the data.
We have, you know, sort of standardized schemas or whatever. But Heap is an analytics tool. So not only do they do collection, but they actually provide an analytics visualization, you know, layer or whatever. But I think the guy said it took their engineering team like five or six years to build a system that had reasonable SaaS COGS on AutoTrack.
Wow.
And then they did an immense amount of work
to reduce the noise.
And now it allowed them to do some very interesting things
because if you can actually solve those two problems,
then you do have an interesting data set to work with.
Right.
But that was astounding, right?
And it was actually, I mean, I seem to remember,
I can't remember the exact details of the conversation,
but the founders had to be extremely opinionated
both with operators and investors to say,
we are going to have really bad cogs
until we figure this problem out.
And so not only is there noise,
but the infrastructure impact
and underutilization of what is required
under the hood to even process that is significant.
Same with observability.
Absolutely.
Well, I mean, you're basically talking about
a different form of observability, right?
When you're zeroing in on user behavior, that's not what we traditionally
think of in observability because we're looking at the application and its actions.
Yeah, sure.
Presumably, usually users initiate those actions.
Yeah, a lot of timestamped messages. Okay, so let's talk about, we can't not talk about AI when we're talking about petabytes of disparate data.
I was just going to say, can you imagine how disappointed everyone's going to be that we've gotten this far without putting A and I together?
It's like a game every week, honestly.
We're like, how long can we push this conversation without talking about AI?
We do pretty good.
Although we did disregard Microsoft and Oracle in favor of, you know, the darlings of the valley.
So we at least checked that box, you know, and we talked about Iceberg.
And we talked about Iceberg.
You know, so, okay.
But legitimate question.
I mean, when you think about petabytes of data, petabytes of different types of data in a context of observability, like, of course,
you go to, I mean, of course, the default is, can AI solve that problem, right? But it's a
machine learning application, right? Where you're looking for, you're looking for anomalies in like
a giant, you know, stream of data.
But how do you think about that at Edge Delta?
Yeah.
So we are using a very hybrid approach of traditional approaches with sort of your standard type of alerts and search and various other things that go on that people would expect,
as well as some machine learning driven sort of dynamic behavior.
But it just makes alerting a little bit easier, because we're re-baselining everything for you. So in that sense, we're letting the model handle that. It is very hard for a constantly changing application to have fixed alerts that make sense over a long enough period of time, because you get drift. There's just not a great way to do it.
And currently, we're
solving it by the fact that SREs hopefully
remember what alerts they have. And if they
have gotten quiet for too long, they'll go in and check
them. Or if they've gotten too noisy,
they will fix them and
not just send them straight to spam.
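A minimal sketch of re-baselining in the spirit described here: score each new point against a rolling window of recent behavior instead of a fixed threshold, so the baseline drifts along with the application. This is a generic z-score illustration, not Edge Delta's actual model.

```python
# Flag a value as anomalous relative to the last `window` observations.
import statistics

def is_anomalous(history, value, window=60, z_threshold=3.0):
    """Compare `value` against the mean/stdev of the last `window` points."""
    recent = history[-window:]
    if len(recent) < 2:
        return False  # not enough data to establish a baseline yet
    mean = statistics.fmean(recent)
    stdev = statistics.stdev(recent)
    if stdev == 0:
        return value != mean
    return abs(value - mean) / stdev > z_threshold

latencies = [100 + (i % 5) for i in range(120)]  # steady-ish baseline
print(is_anomalous(latencies, 103))   # False: within normal drift
print(is_anomalous(latencies, 400))   # True: way outside the baseline
```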
We've
recently experimented as
well with putting some LLM AI
on top of our anomaly detection.
So that is some very high signal to noise type stuff.
I call it almost like the 2 a.m. checklist: if you get paged at three in the morning because something has gone wrong, and you are like, oh my God, why did I ever keep this monitor this bright? I just want to do a thing and get back to sleep. So we've added a little LLM in there to just give you,
hey, maybe you want to look at this.
Maybe you want to look at this.
Yeah, yeah, yeah.
It doesn't auto-do anything, on purpose. Because, sure, there are people that think that's the way forward.
Not at 2 a.m.
It is.
Oh.
If it gets the alert to go away,
there are absolutely SREs that would push the button and do that. If there's no rollback
button, but there's an AI fix button, they would
absolutely do that. I know, that's the reason
it's not the solution.
So it just makes suggestions for you along the lines
of, hey, you might want to look at this, you might want to look at this,
based on the anomaly and the information that we could all
correlate across the different information.
So that's the direction that we are taking with it.
I personally am on record as thinking that AI is not going to, quote unquote, fix observability, which I liken to: hey, we've got a petabyte of data, let's dump it in there. And it's like, yeah, how are you going to train that model? Are you ready to spend all that? And that's just the COGS side.
Even more importantly, if developers are good at one thing, first of all, any
developer listening to the podcast right now is amazing and never
makes these errors, but other developers, right?
All the other ones.
If you've met any other developers, they're really good at creating
new ways for software to screw up.
And the idea that you will have a dataset that has all of the errors that you could
want to track in the future is comical.
Yeah.
And I've told the story several times of my favorite database error. When I was working at the ETL company, I got a database error that said: you haven't paid us. And I was like, what? And it turns out we were using a Salesforce syncing tool to go from local database writes. You basically circumvented Force.com, because you could write to your local database and they would handle the syncing into Salesforce for you. But we forgot to pay them. So, I was like, I've never seen that error again. Is it useful to have an LLM train on that? No.
And that analog holds to the infinite ways that we can combine bits together.
So I'm very skeptical of the idea that AI is coming to fix observability in
particular. And similarly, I'm a little bit skeptical of it sort of in the broad too,
though that's a bit more of an open question. So the idea just in this one example would be like,
okay, we've got an AI in place. It is going to be able to, never having seen this before,
read this error message that says something very generic, like you haven't paid us, to know that there's some vendor out there that you're using to sync from A to B, and to prompt somebody like, hey, you need to go pay them.
Yeah, that makes sense that that wouldn't work. Insane context requirements.
Yeah, exactly.
Well, I really think one of the big challenges we have is that if we're not directly at the frontier of research, we're getting a lot of second-degree assessments of what AI can do and what it can't do. And so I think what we really need, even for people in our position, and I can't speak to how familiar you guys are with it as well, I'm not making any disparagements, is better mental models of what it's like. And so the one that I give to everybody is this.
An LLM, our current understanding of LLMs
as of time of recording,
is it's a bit like a guy in a bar
who has overheard 10,000 hours of conversations
about motorcycles.
So he's never seen one, he's never touched one,
he's never ridden one,
but if you ask him any question about a motorcycle, he probably knows the answer,
but occasionally he might compliment how your torque smells.
And so you've got to...
He doesn't know the difference.
It's not his fault.
And the way they work, at least based on our current understandings, again, at time of recording, is they don't reason, but they're associative. The reason that chain of thought works is that it snaps the words into a reasoning-looking object. And so when a lot of AI products and startups pitch themselves on the
idea that AI will be able to think or reason or do the decision-making part, I'm pretty skeptical. But if it can do
some of the things that computers are very good at, like computers never get tired. So I think
they're very good at brainstorming, pulling together different associative ideas. You know,
a lot of baseline stuff that can help clear the blank page problem. I think AI will be great for
like a hundred different paper cuts
in normal everyday life, just like data was,
you know, 20 years ago or something.
Like data is going to change everything.
It hasn't knocked out the economy.
It's just made all of the little things that we do a little bit different.
And I think we'll see that too.
And of the 10,000 hours of the guy listening at the bar,
he was drunk for several hundred of them,
but we don't know which ones, right?
Yeah.
He's not so
sure about...
He's got some hazy gaps, right?
Or he wasn't paying attention to who was drunk and who wasn't.
Well, yeah.
He keyed into every conversation about
motorcycles, including the one where he's like,
guys, I just had the most amazing day.
And it's like, okay, that guy sounded like he had fun. I'm going to remember that.
Yes.
Training on Reddit data is the analogy there.
It's too good. All right. Well, we're at the buzzer. I think one of my big takeaways is that being an SRE is like having children in that, you know, that really bad gut feeling that you get when you're like,
our house is way too quiet.
Something's wrong.
This is an excellent analogy.
Unless there is a type of, maybe a type of pet owner, that has a very defined cage and fence, where they're okay with silence because they know exactly where everything is. You can have those SREs out there, but not too many.
All right.
Well, Dave, thanks so much for joining us on the show.
It was an absolute blast.
And we'd love to have you back sometime soon.
Absolutely.
We'll do it.
Take care, guys.
The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.