The Data Stack Show - Data Council Week (Ep 5): A Primer on Spatial Data With Gabriel Hidalgo of Carto

Episode Date: April 29, 2022

Highlights from this week’s conversation include:
How Gabriel got into data (1:54)
What Carto is (5:28)
Location data vs spatial data (6:37)
Time data vs space data (7:50)
System support for spatial data (9:50)
Explaining “spatial functions” (14:19)
Who uses Carto and why (15:52)
What’s coming for Carto (19:15)
What Gabriel does at Carto (22:22)
The coolest things Carto’s done (23:52)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. First of all, Kostas and Brooks, it's nice to have you back. I was flying blind there for a little bit on the last episode, but the team is back together. Today we are going to talk with Gabriel from Carto, who's here on site at Data Council. And Kostas, I'm just excited. I don't know if I have an individual
Starting point is 00:00:46 burning question other than that I just want to learn about the world of location data. This is the first guest we've had on the podcast from a company that deals with this sort of data, so I have tons of questions. How is this world different? How is it the same? I'm just excited to learn. How about you? Yeah, I have some technical questions. Traditionally, the algorithms and the way you approach and query geographical or spatial data are a little bit different. Typical database systems need some extensions to the SQL language itself, and also to how they represent and store the data, to handle it. So I want to understand a little better how Carto complements data warehouse solutions
Starting point is 00:01:31 and their capabilities for querying this data, and how they work together. That's definitely something I want to explore in this conversation. All right, well, let's dig in and talk with Gabriel. Let's do it. Gabriel, welcome to the Data Stack Show. We are so excited to talk with you about Carto and learn more about you. So how did you get into data originally? Originally, I was abroad in the Peace Corps, and I was working on just collecting
Starting point is 00:02:03 any data I could in the place I was stationed. That meant things like counting cars on the side of a street and collecting it all in a spreadsheet, and through that I got more into the open source community, because that's what it's for: helping people who don't have all these tools and web apps get the bare minimum they need to do their work. From there, I started to learn to code, and I really tried to get more into understanding how to get data and how to make it better. What can I build with it to empower people or make things better? Yeah, very cool. And just out of curiosity,
Starting point is 00:02:38 what were you counting cars for? What was the use case in the Peace Corps? Yeah. Before this, I was actually an urban planner, and I was stationed as an urban planner in Albania. There, we were trying to figure out which roads needed improvement. To do that, you need data: how many cars are traveling on this road? Then you can build an analysis on that. I think it was funny because everyone's like,
Starting point is 00:03:02 everyone knows which roads get traveled the most. We're a small town; everyone knows. But I was like, let's just check. Why don't we just check to make sure? And it was actually really interesting. They were like, oh, wow, we didn't know this road gets used this much. They started to get more of an understanding, like, oh, we should start keeping track of this. And I helped them get more into that kind of thinking.
Starting point is 00:03:24 Very cool. Okay. So you got more into data after that. Did you then start working in data? Did you go to work for a SaaS company, or how did that happen? It's kind of funny. My career has a lot of variation, because for me it's more about what I want to learn and what I want to do. After I came back to New York, the Uber crisis was happening there, where a lot of taxi drivers were losing a lot of fares. I saw that problem and thought, oh, that's super interesting, what's going on here? So I put those data analytics tools to work on it. Then I got hired at the Taxi and Limousine Commission as a policy analyst, and I helped them build up their data team and get more of an understanding of that.
Starting point is 00:04:08 So I guess for me, it's less about the job title and more about what's happening, and then I really try to help solve that problem, or some part of it. Yeah, fascinating. I love that. This may be the wrong term, but it's almost like an activist approach to solving problems with data, which is pretty cool. Yeah. But to bring it back to how I originally got to Carto: I was working for a mobility app company called Citymapper, and there I became more of a project manager for all these analysts.
Starting point is 00:04:42 And I realized I was missing the technical side of things. I also realized that I wanted to work on cities more broadly instead of just mobility. I was becoming a mobility specialist, and I wanted to expand beyond that and get more technical at the same time. And I had always known about Carto. In New York City,
Starting point is 00:05:00 Carto is a very well-known product. I've been to their events, so I've always had an interest in Carto. And actually, I found out they were hiring while I was looking for a tool at Citymapper for my analysts. I thought, oh, why don't we just try out Carto? And then, oh, they're hiring too. Why don't I just try it out? And here I am. Very cool. Okay. So for our listeners who don't know what Carto is, what is it? I would say at our core, we help you handle spatial data. I keep it this broad because we provide a UI that can quickly create a map.
Starting point is 00:05:38 With one click you can upload data, create a map, and style it in real time. On the other side of things, we create our own spatial functions that let you do advanced analytics on spatial data. On top of that, we provide a backend that you can connect to your cloud data warehouses, because we are cloud native, and then build your own apps on top of it with frameworks like React or Angular. So I think it's less about these individual products. It's more that we deal with
Starting point is 00:06:05 spatial data, we try to understand what you need from it, and we build tools around that goal. Fascinating. Okay. I'll hand it to Kostas, but one more question. When we were chatting before we hit record, I said location data, because, you know, I think about maps, right? You just kind of think about location data, and you corrected me, which I found very helpful. You said, well, it's location data, sure, but it's really spatial data. Help us understand the difference between location data and spatial data. I think location data is what people link with movement itself, like going from A to B, and that is one type of spatial data. Like coordinates.
Starting point is 00:06:48 Exactly. I think that's what people link it with. But spatial data also covers shapes and areas. You have polygons, you have radii, and how they interact with each other, which is another field of data analytics. For example, say you have a polygon of a country and there's a volcano erupting, and you want to see who is affected and how many people. You draw an area of influence around it, you know, where we see the lava going, because
Starting point is 00:07:19 this is actually an example we've done before, for La Palma. Then you're able to see which buildings are the most affected or could possibly be affected. Interesting. So when you look at that buffer, it's not a location, it's a space that we're answering questions about. Right.
Starting point is 00:07:34 Right. So it's adding a dimension, as opposed to just a lat-long to a lat-long. Right. Exactly. Which is, you know, basically a line. Fascinating. Okay.
Starting point is 00:07:43 Kostas, I could keep going, but... Yeah. My first question: over the last couple of years we keep hearing a lot about time series, right? We've got time series data and time series databases, so we have specialized technology built for data where time is the dimension you care about. Why don't we hear the same for spatial? Or is there technology out there that we just don't know about?
Starting point is 00:08:15 But why do you think there's so much more interest in time than in space? Is it because spatial data is harder to work with, or because the use cases are not there yet? What's your feeling about it? For time, I would say it's not really that it's hard. It's just that there's more variance. I think there's still a discussion about timestamp versus datetime, and in those conversions you lose a lot of data, with
Starting point is 00:08:45 things like, how do I say it, daylight saving time. All these factors have to be taken into account, and all these data warehouses are trying to figure that out. On our side of things, we worry more about the spatial side. And I would disagree a bit that the two aren't connected, because in mobility data, time and space are essential. They're the lifeblood of those datasets. So I think it's more about what most people are focusing on at the moment. And I think you're right that time series is not really the thing here, because it usually deals with real-time data,
Starting point is 00:09:25 right? You get a point and you want to know whether it happened on a given day or is happening right now. With these types of things, it's less about data analytics, I'd say. It's more about an app that you're using in real time. That's why I don't think it's gotten that much focus: it's more related to mobility apps and their functionality, whereas at a company you don't need that much time recording. Yeah. Yeah.
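Gabriel's point about losing information when converting between timestamps and datetimes around daylight saving time can be made concrete. The minimal Python sketch below is an illustration only, not anything Carto or the warehouses ship: it hard-codes US Eastern offsets around the November 7, 2021 clock change (a real system would use a time zone database) and shows two instants an hour apart that collapse to the same local wall-clock time once the offset is dropped.

```python
from datetime import datetime, timedelta, timezone

# Fixed UTC offsets for US Eastern time on either side of the 2021-11-07 "fall back".
# Hard-coded for illustration; real code would look these up in a time zone database.
EDT = timezone(timedelta(hours=-4), "EDT")
EST = timezone(timedelta(hours=-5), "EST")

# Two distinct instants, one hour apart in UTC...
first = datetime(2021, 11, 7, 5, 30, tzinfo=timezone.utc).astimezone(EDT)
second = datetime(2021, 11, 7, 6, 30, tzinfo=timezone.utc).astimezone(EST)

# ...that share the same local wall-clock time (01:30) once the offset is dropped,
# as happens when an aware timestamp is stored in a column type without a time zone.
naive_first = first.replace(tzinfo=None)
naive_second = second.replace(tzinfo=None)

print(first.isoformat(), second.isoformat())
print(naive_first == naive_second, second - first)
```

Once only the naive values are kept, nothing distinguishes the two readings, which is the kind of silent loss the conversation refers to.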
Starting point is 00:09:49 A hundred percent. How do you feel about the support that modern data management systems, like data warehouses, have for spatial data? Do you see things missing there in terms of capabilities? Is there work to be done? Yeah, so we partner with all of them. We have great partners at each of these cloud data warehouses: Snowflake, Redshift, BigQuery, Databricks.
Starting point is 00:10:20 And they're doing a great job with scaling. That's a problem everyone has, and they're handling it well. On that side of things, we don't expect them to handle the geometry part of this; that's not their focus. This is where Carto fits in: we're able to bring that spatial function
Starting point is 00:10:38 to their data warehouses. We have what we call the Carto Analytics Toolbox, where you can install all these spatial functions and models into your warehouse and run them there. So we're the layer they're missing. Again, we want people to keep using their warehouses, but we want to
Starting point is 00:11:00 be the spatial go-to for them to rely on. How does this work? Let's say I'm a Snowflake user, right? At some point I want to add spatial capabilities to my instance. Do I copy the data to your system, do the analytics, and then move it back? How does the product work? Yeah, yeah. There are two ways to do it. I'll start with the Carto Analytics Toolbox.
Starting point is 00:11:28 Once you have an account, you contact us, and we send you something to install within Snowflake. Our documentation shows how to install it really quickly. Our customers have been really great about doing it, and they've been really vocal: they want those functions as soon as possible. Whenever we release them, they're like, send it over to us right away.
Starting point is 00:11:50 We really depend on this. Early adopters. Yeah. Because it just makes their lives so much easier. We've had people, and I think Snowflake is an example of exactly what you described before, who stored their data in Snowflake, brought it out to run spatial functions somewhere else, and then brought it back in. Or we had a case where a user had a notebook of different SQL functions just to do very simple spatial operations.
Starting point is 00:12:16 And we cut that down to very simple functions. Buffer is a good example: if you've ever made a buffer, getting the distance right in kilometers or specific meters is insane. So we made it simple: choose kilometers or meters, put in the number, and that's it. That's how it should work, but SQL isn't that straightforward. The other way is connecting to Snowflake directly, which is as easy as making a connection. We're actually interacting with these data warehouses: whenever you upload data, we run a SQL query in there. So it's like running SQL within your own data warehouse, but from within Carto. And so again,
Starting point is 00:12:58 we say there's no limitation, because it's your data warehouse. We're not managing it; we're empowering it, and you can use it through our platform, which helps you get more out of it. For example, you can import data straight into your cloud data warehouse. That's one way we're trying to improve these services as much as we can for our users.
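To make the "pick a unit, type a number" buffer idea concrete, here is a minimal Python sketch of a distance-based buffer filter. It assumes a spherical Earth (mean radius 6,371 km) and haversine great-circle distance; the names `haversine_km`, `within_buffer`, and the `units` parameter are hypothetical illustrations, not Carto's Analytics Toolbox API. Instead of building a buffer polygon, it tests whether each point lies within the radius, which selects the same points a circular buffer would.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius; treats the Earth as a sphere

# Hypothetical unit table so callers can say "500 meters" or "2 kilometers".
UNIT_TO_KM = {"kilometers": 1.0, "meters": 0.001}


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in kilometers between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_buffer(center, points, distance, units="kilometers"):
    """Return the (lat, lon) points inside a circular buffer of `distance` `units`
    around `center`. Hypothetical helper, not Carto's API."""
    radius_km = distance * UNIT_TO_KM[units]  # normalize the unit once
    lat0, lon0 = center
    return [p for p in points if haversine_km(lat0, lon0, p[0], p[1]) <= radius_km]
```

Warehouses expose comparable SQL predicates (BigQuery's `ST_DWITHIN`, for instance, takes a distance in meters); the sketch only shows the unit handling and distance test that such a simplified function hides.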
Starting point is 00:13:18 Wow, that's awesome. Do you see differences between the data warehouses? Are there things you can do on Snowflake that you cannot do in BigQuery, for example? What are the different data warehouses like? This is why we partner with them: we're trying to make sure there is no difference between them, right?
Starting point is 00:13:41 Whatever you can do in Snowflake, you can do in Databricks. That's why we partner with them. Of course, they have different systems and work differently, but it's our job to figure that out and make it as seamless as possible. So when you go into Carto, you just see your connections, and you can use the same functions throughout. It's that seamless. A question from me, and I think for some of our listeners: when you say spatial functions, could you give a specific example? I think I know what that means,
Starting point is 00:14:13 but for those of us who aren't as familiar with spatial data, give an example of a spatial function. I think the buffer one is a good example, or an intersection. Say you have a bunch of points on a map and a polygon. You want to be able to get whichever points are within that polygon, or remove them, right? You want to quickly get the points that are in certain polygons or not, or filter by certain values. Or you want to run a regression model where you
Starting point is 00:14:45 are figuring out, hey, there's a point, and using math that we can't really get into right now because it gets really dense, you're able to see that it's influenced by the polygons around it, and in which categories there's a pattern. Then we create geoms based on the dataset you give it. So it ranges from something that deep to something as simple as checking whether a point is inside or outside a polygon. Got it.
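The simplest function described here, keeping or dropping the points that fall inside a polygon, can be sketched with the classic even-odd ray-casting test. This is a minimal planar Python illustration under stated assumptions: coordinates are treated as flat x/y pairs, and points lying exactly on an edge are not handled specially. In a warehouse this would usually be a SQL predicate such as BigQuery's `ST_CONTAINS`; the helper names `point_in_polygon` and `split_by_polygon` are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # (x, y), e.g. (lon, lat) treated as planar coordinates


def point_in_polygon(pt: Point, polygon: List[Point]) -> bool:
    """Even-odd ray casting: cast a ray to the right of `pt` and count edge crossings.

    An odd number of crossings means the point is inside. Points exactly on an
    edge may go either way; production libraries handle that case explicitly.
    """
    x, y = pt
    inside = False
    n = len(polygon)
    for i in range(n):
        x1, y1 = polygon[i]
        x2, y2 = polygon[(i + 1) % n]  # wrap around to close the ring
        # Only edges that straddle the horizontal line through y can be crossed.
        if (y1 > y) != (y2 > y):
            # x-coordinate where this edge meets that horizontal line
            x_cross = x1 + (y - y1) * (x2 - x1) / (y2 - y1)
            if x < x_cross:
                inside = not inside
    return inside


def split_by_polygon(points: List[Point], polygon: List[Point]):
    """Partition points into (inside, outside) lists, i.e. 'keep' versus 'remove'."""
    inside = [p for p in points if point_in_polygon(p, polygon)]
    outside = [p for p in points if not point_in_polygon(p, polygon)]
    return inside, outside
```

Ray casting works for concave shapes too, which matters for real boundaries such as districts or coastlines; on geographic data at large scale, warehouse functions on a geography type are the better fit.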
Starting point is 00:15:12 These are the functions. Yeah, super interesting. Okay. And how does Carto execute those functions? Are those SQL in the warehouse? Yes. And they also call on our own data warehouse.
Starting point is 00:15:26 Because, again, we have our own; we're trying to be as cloud native as possible, so we use the same approach. You call on our own cloud data warehouses where these functions live, so you're calling us too. Oh, sure. Yeah, yeah.
Starting point is 00:15:38 Makes total sense. Sorry, Kostas, I had to interrupt there for some clarification. And what are the users like? Who are the people using Carto today? I would say they sit at two extremes. On one end,
Starting point is 00:15:56 the user just wants to create a map. They have their dataset and they want to quickly create a map so they can see their points and share it with their users. It's literally a one-click button: create a map, style it, and then do these advanced analytics,
Starting point is 00:16:13 but they're just clicking buttons in a UI. On the other side of things, we have very, very technical users who run servers and build their own front-end apps, and in that case we help them use Carto in the way that best fits them. I think that's similar to how Carto itself is shaped. We have a group that is very focused on what the customer wants, and on the other side we have highly technical people who actually build it out. So I think our customer base is kind of in
Starting point is 00:16:44 a similar shape: very technical on one end, and on the other end we try to make it as easy as possible. Yeah. Makes sense. Can you give us a couple of typical use cases you see out there? I guess someone needs to build a map on some front end, but why?
Starting point is 00:17:06 Is it mostly government? What are some examples of incorporating Carto into either an application or an analytics use case? Yeah. For example, for internal use in a company, you want to see which stores are performing the best. Then you can layer data on top of that. Maybe these stores are too close together. You can make a buffer and use some routing information to see, oh, here's how people walk around this area.
Starting point is 00:17:42 From this store, here is roughly how far people can get in 10 minutes of walking. Then you get an idea that, okay, only the people around here care about this store. And is that how you would define the buffer? As a term, is it like there's a 10-minute walking buffer around the point? Exactly. Okay, got it.
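A 10-minute walking buffer, and the question of whether two stores' buffers overlap, can be approximated in a few lines. This hypothetical Python sketch replaces real routing with a straight-line circle, assumes an average walking speed of 5 km/h (an assumption, not a figure from the conversation), and takes store positions as x/y meters in a projected coordinate system. Two circular buffers overlap when the distance between their centers is less than the sum of their radii.

```python
import math
from itertools import combinations

WALKING_SPEED_KMH = 5.0  # assumed average walking speed; not from the conversation


def walking_radius_m(minutes: float, speed_kmh: float = WALKING_SPEED_KMH) -> float:
    """Straight-line radius in meters reachable in `minutes` at `speed_kmh`."""
    return speed_kmh * 1000.0 * minutes / 60.0


def buffers_overlap(a, b, radius_a: float, radius_b: float) -> bool:
    """Circular buffers overlap when their centers are closer than the sum of radii.

    `a` and `b` are (x, y) positions in meters in a projected coordinate system.
    """
    return math.hypot(a[0] - b[0], a[1] - b[1]) < radius_a + radius_b


def overlapping_store_pairs(stores: dict, minutes: float = 10.0):
    """Return pairs of (hypothetical) store names whose walking buffers overlap."""
    r = walking_radius_m(minutes)
    return [
        (name_a, name_b)
        for (name_a, pos_a), (name_b, pos_b) in combinations(stores.items(), 2)
        if buffers_overlap(pos_a, pos_b, r, r)
    ]
```

A routing-based buffer (an isochrone) follows the street network and is usually smaller and irregular, so the straight-line circle overestimates how far people can walk; the overlap idea stays the same, but the test becomes a polygon intersection.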
Starting point is 00:18:02 Yeah, yeah. So if those 10-minute walking buffers interlock, you know, overlap each other, you're like, oh, maybe these stores are too close to each other. That's an example of a use case on the side of just creating a map for internal use. On the other side are more technical people who have their own internal systems: giant servers, and data in cloud warehouses with trillions of rows. They just need to be able to create apps really easily. They also have security requirements, so we offer what we call self-hosted, where they have a tenant and can keep their information on their own servers, and they just use Carto as kind
Starting point is 00:18:50 of a front end for it. In that way, it's like another app in their system that they can use internally. That's another use case. Cool. Okay, last question from me. I don't believe you. What's something exciting that's coming in the future?
Starting point is 00:19:12 Yeah, actually, to be honest with you, I think I came to Carto at a very, very lucky time. This wasn't planned, I just like Carto overall, but we're actually moving over to a new platform. This whole cloud-native approach is very new. We've been building this platform for a while and it's completely new: new accounts, new everything. We're moving people over, and that's super exciting. Our current platform is great, but we see the trends, with people moving to data warehouses, and we see people wanting more control over their data.
Starting point is 00:19:52 So we want to help enable that. With the new platform, we're creating features that we've always wanted. One of them we call the lasso tool. In the old Carto, you could zoom in to see your data, but people wanted to select a certain area, only that area, instead of zooming in and figuring it out from there. Now we can just do that. There are so many new features coming, and it's not just a matter of waiting for the new platform to catch up to the old one.
Starting point is 00:20:25 No, this platform is going to be a lot better than the other one. So it's super exciting to see that change, how we prioritize, and all the new features coming out that haven't existed before. Oh, that's awesome. So, question: do you have customers who basically white-label maps through Carto and serve those to their users? Yes, we do. I guess white labeling would need to be explained a little more, but it's more that we have password
Starting point is 00:21:05 protection for a certain map. When you create a map, you can make it completely public, or you can put it behind a password if you only want certain people to see it, or you can take the map and embed it as an iframe on a website. Oh, interesting. Yeah, so that's one way. Or, again, people can have their own Carto internally and decide how to share the map. Yeah, yeah. Super. Because it sounds like it's less of a situation where, okay, I'm going to start a consumer mobile app, there's a mapping component to it, so I need this interactive map.
Starting point is 00:21:43 Right. It sounds much more geared toward using specific data to create specific maps, or to solve really specific questions related to spatial data inside of companies. Yeah, exactly, in some cases. Yeah. Super interesting. And how long has the company been around? I'm not really sure, to be honest with you. I would say about 10 years.
Starting point is 00:22:09 Oh, wow. Okay, cool. And I dropped the ball: I didn't ask you this at the beginning, but what do you do at Carto? Sorry, Gabriel. No problem, no problem. I'm a support engineer at Carto.
Starting point is 00:22:24 The role is pretty unique, because I know when people hear support engineer, it's a very broad term, like solutions engineer, so it really depends on the company. The idea of support within Carto is actually support within the company. We are a full-stack team. We work on front end, back end, servers,
Starting point is 00:22:41 everything possible to make the app run and to improve it. But we also support users, from very technical to not-so-technical, so they get the most out of the platform. And internally we use the same system: when people within the company come to us with questions, it goes through the same system we use for our customers, because we want to make sure the quality we provide internally is the same quality customers are getting. Oh, interesting. Okay. So you treat internal customers the same way you treat external customers. Exactly. Fascinating. That is super interesting. And it's a very highly technical team, which is also
Starting point is 00:23:20 very interesting. Oh, sure. I'm sure that makes your job super interesting. Okay. I'd love to know, especially because it sounds like you've approached a lot of things in your career by observing an interesting problem or opportunity: what are one or two of the coolest things you've seen your customers do with Carto, or the coolest problems they've solved with it? Yeah, I think one of the coolest things for me is utility. I've seen people make a huge impact on people's lives. Actually, the simplest maps are the most interesting to me because of the data coming in and the impact it has, like elections and voting. Those are really interesting because you see the impact of the dataset itself.
Starting point is 00:24:15 On the technical side, funny enough, it's how people use the product. You see some people using the UI in such interesting ways. We had one customer where you zoom in and see the congestion number on a road, and when you zoom out, they show a different one. We were like, why would you want this? And they explained that at each zoom level you see a different part of the area, so they wanted to make that distinction. And we were like, oh, well, if you're using it this way, then it's meant for that, but we've never thought of using it that way
Starting point is 00:24:54 before. To be honest with you, I think most of the really interesting features we think of actually come from customers, because they just want the solution. They explain it to us and we're like, oh, okay, this is why it makes sense, and we should actually add that. Yeah, totally. Super interesting. Well, Gabriel, this has been such an interesting show.
Starting point is 00:25:19 I learned so much about spatial data, not location data. Well, some about location data that I didn't know before. So thanks for taking some time out to chat with us and best of luck on your journey as you help people solve spatial data problems. No, thank you so much for the conversation.
Starting point is 00:25:34 It was great. Well, obviously I had a lot to learn about spatial data and how it differs from location data, which was really educational for me. It makes so much sense now that we've talked about it, but I did make a generalization there, and for a product like Carto it's a really important distinction. So I love learning those things. I think my big takeaway is that Gabriel really brought a human element to thinking about this, with his story about working in the Peace Corps and standing on a sidewalk, you know, with a
Starting point is 00:26:06 clicker, collecting traffic data for, I think it was, a city in Albania. In some ways it was really heartwarming, right? It's so easy for us to get caught up in the technology, and I really loved hearing about him making a real impact in this city with really analog data collection and processing. It made a big difference in the city, which is really fun. And I think you can tell that the way he approaches his work at Carto is influenced by that very human element, that visceral experience of capturing and using spatial data. Yeah, yeah, 100%. I think it's also so refreshing to meet and connect with people
Starting point is 00:26:53 who are so excited about what they're doing and have a very genuine way of connecting with their work, feeling that they're doing things that are important and that deliver value. Of course, to do that, you have to add this human dimension to the technology itself, right? So, yeah, I really enjoyed the conversation we had.
Starting point is 00:27:18 I think it's important to have more conversations with companies that work with less traditional types of data, let's say. As we said before, we tend to forget that data is not just tabular data, the very well-typed data that a database holds. Outside of tabular data, we will see more and more of a need to work with these different types of data. There are companies out there doing that, and I think we should reach
Starting point is 00:27:52 out to them and make sure we get them on the show to see what they're doing and why. I totally agree, and I think Brooks is already on it. All right, well, thank you for joining us. This has been an awesome week. This is our last episode from our week on site at Data Council, and we'll be back to our regularly scheduled programming on the show next week. Thanks for joining. Subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric@datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
