The Data Stack Show - 80: Is Reverse-ETL Just Another Data Pipeline? With Census, Hightouch, & Workato
Episode Date: March 23, 2022

Highlights from this week's conversation include:
Panel introductions (2:23)
What is driving the trend behind Reverse ETL? (5:24)
The obstacles to building an internal Reverse ETL tool at scale (15:34)
How to decide system management vs. user flexibility (20:14)
Why previous products failed in creating this category (29:12)
Increased demand and democratization of data stack skills via SaaS (42:03)
Broader applications for Reverse ETL (47:29)
Limitations of Reverse ETL (55:05)
How user technical ability affects design and build roadmaps (58:14)
What do you anticipate comes next for Reverse ETL? (1:02:45)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Discussion (0)
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, one platform for all your customer data pipelines.
Learn more at rudderstack.com.
And don't forget,
we're hiring for all sorts of roles. Welcome to the Data Stack Show. This episode you're
about to hear is actually, it was originally recorded as a live stream, and we collected
some of the top minds working in the reverse ETL space. And we just wanted to pick their brains about this technology that,
you know, has probably been built internally for a long time by companies, but is now being
turned into SaaS and is doing some interesting things. Costas, I'm really interested to ask this
panel about some of the technical challenges of building these things at scale. A lot of times,
you know, I think if these were internal builds, you know,
maybe like a one-to-one connection, you're sort of dealing with you know,
a pretty simple pipeline, but doing this at scale across integrations is hard.
So I want to hear about what technical challenges they're dealing with.
How about you?
Yeah.
I want to ask them, when are we finally going to get a proper
name for this technology?
Yeah.
This reverse ETL thing needs to stop.
Yes.
It's still wrong.
So, yeah, I'll try to see what they're thinking about that
and what's the timeline to get a better name.
Great. Let's dig in.
That's a marketing problem, so we're going into pretty uncharted territory.
But I love it.
All right, let's dig in.
Let's do it.
Welcome to the second Data Stack Show live stream. This is super fun. We did this once before,
and we like to collect some of the best minds in the industry around certain topics and just pick everyone's brains. And the topic for this live stream is reverse ETL, which is kind of a new term
in the industry, but actually something that people have been doing for a while,
which we'll talk about. And we have some people who I'm just so excited to have on the show,
names that I've followed for a long time personally. I know Costas has as well.
So let's just do some quick intros. Tejas, do you want to start off and give a quick intro?
Yeah, sure.
So hey, everyone, I'm Tejas, one of the founders of Hightouch.
We're one of the players in the reverse ETL space and data activation, basically helping
companies take data from the data warehouse and use it across all the operational processes
and SaaS processes in their business.
Before founding Hightouch, I was actually an early engineer at Segment.
So my experience sort of in the data vendor space dates back to like seven, eight years ago before
terms like CDP and stuff like that existed and kind of saw the rise of cloud data warehouses
there and realized that there was an opportunity to bridge some of the challenges we were solving
at Segment and what was happening in the data warehousing space with companies building a
source of truth in the warehouse. So super excited to be on the show today. Obviously follow all the companies
in here super closely and excited to have a live coffee chat. Great. It's going to be great. All
right, Boris, you're next in the window on Zoom. So take it away. Cool. Hey, I'm Boris. I'm the
founder of the company called Census. We started building what we now call reverse ETL back in
2018 when there was no name for this.
And we've always wanted to help companies get the most out of their data.
And a lot of it tends to be locked away in analytics and warehouses, which is what we were trying to solve.
So get that in the hands of salespeople, marketing people, support people, finance people, all those kinds of things.
And data pipelines are the way to do that.
So yeah, before that, I've always been a tool builder. I used to work at Microsoft, and between Census and Microsoft, I started another company that was
kind of tangentially related, called Meldium. Very cool. All right. Triti.
Hi, everyone. I go by Triti, and I lead the product-led growth team at Workato.
If you're unfamiliar with Workato,
it's an enterprise automation platform.
We have been in the business for over eight years.
We have over 7,000 customers that use us for
automating various business processes
by connecting their cloud and on-prem stacks.
But a very interesting pattern that happens there is reverse ETL.
As a matter of fact, we released a Work Automation Index report last year.
And in addition to all the traditional processes like order to cash,
employee onboarding, procure to pay, record to report,
lead management, and others, reverse ETL was trending up. It was in the top 10.
And reverse ETL means very many things to very many people. I was very excited to join this forum
with Tejas and Boris to learn more, and also to learn more from the questions that we get from the audience.
Thanks for adding me, Eric.
Yeah, of course.
So I'll start it off with a question.
Triti, you brought this up when we were prepping for the episode, but I'd love to hear from each of you.
What has been driving the trend behind what we call reverse ETL? And maybe we'll get to whether
reverse ETL is the proper term for it, you know, because, you know, we've had some conversations
with Boris about whether that encompasses, you know, sort of the spectrum of problems that this
technology solves. But, you know, we sort of have like the event stream, you know, technology solved
the behavioral data issue and sort of syndicated that to the stack.
ETL allowed you to collect all the data from all the disparate parts of the stack.
And really, a lot of those drove one-off triggers or the main use case was BI. We're just trying to
understand how users are behaving and how the business is performing. And so why don't we start off by like, what are you seeing?
Like what's pulling reverse ETL technology from your product teams
in terms of use cases on the ground?
So why don't we just, let's just go in the order that we did intros with.
So Tejas, what are you seeing?
Cool. Yeah.
So I would say across use cases, honestly, it's pretty exciting
because we're seeing use cases across pretty much all business teams in an organization and far more use cases in terms of breadth than we imagined when we actually founded the product. And it turns out sales and marketing and go-to-market is still probably about like 70% of our use cases in the market. But we're also serving finance teams who need
rich data in their ERP systems to close out the books faster and not pass around CSVs across the
organization or product teams that need information from your analytics stack to be able to power
certain personalized customer experiences inside of their applications. But overall, I would say the most exciting part about reverse ETL and data
activation as a whole, when I think about the category, is that we're oftentimes not just
replacing scripts written by engineers or automation built by engineers, but we're actually
unlocking brand new business use cases, brand new value, and brand new growth and revenue
opportunities for companies using the wealth of data that they had already in their data warehouse.
And that's really what I think has caught the attention of the market and excited companies
to jump right in and see what can they do with the resources and data they already have
to drive growth.
Great.
Boris?
Yeah, I think the breadth of scenarios has always been the kind of most exciting thing here.
You know, when we envision the platform, we kind of thought about it as something very horizontal.
You know, I tend to think about the fact that the way people wire data together shouldn't be piecemeal.
And they should think about where can they centralize as much
data as possible and get a source of truth and then federate that to as many kind of ends of
the organization as possible. And to me, that's the story of, that's actually the goal of SaaS
going back 20 years, which is to empower every individual in a company. And so whether that's
finance or sales, you want the right data.
You want data you can trust.
And you want that in the operational tool where you do your work, right?
Rather than having to open up five tabs.
And so this idea, to what I've seen over the last few years working on this is that analytics,
by virtue of a lot of other kind of trends and behaviors on the data team,
has become host to the best data in the company, right? The most complete data, the most
trustworthy data. It's the data that, I mean, ultimately you're going to use to report to
Wall Street to some degree, right? And so it probably has the highest level of scrutiny. Exactly, exactly. And so the
ability to operationalize that data, right, to take that data and make it kind of available to every
part of the company, has been super exciting and continues to grow. And so, funny enough, we didn't
start with kind of a marketing bent back in 2018. We actually started with product, like growth,
and just kind of thinking about software.
Yeah, when you have software as your core kind of asset,
the way you take it to the market is just different, right?
And I don't know, it was personal frustration back then about salespeople not knowing what users are doing in the product.
And I think, funny enough, I think Segment had done a great job
of connecting marketers to the engineering side, but sales was left behind.
So our early scenarios were all on the sales side.
And then that has since expanded.
That's so interesting.
Yeah, and that has since expanded to literally, I don't know, you can't even, I don't know if I could summarize it in any kind of set, right?
It's like from support to finance, product to marketing, it's like everyone kind of
wants to depend on this data. Interesting. And data organizations want to get more out of the
asset that they've invested in, right? And so that to me is the exciting story. I'm a tool builder,
right? And so you're trying to make someone else a more amazing version of themselves. And data
teams have a lot to offer. And it was locked away in charts.
And, you know, the idea was like,
let's get this into operational tools.
Very cool.
All right, Triti.
I mean, like what Tejas and Boris summarized
catches a lot.
I'll just say,
maybe add to that in a different way.
It's essentially the trends we see
like over the last several decades,
like ETL had been a way
to collect data
from various sources
like business application,
business systems,
and move it into a single repository
and create a source of truth
that you can rely on, right?
And there's always been,
like if you ask any company,
any individual,
right? How do you make decisions? "We are data-driven." And the way
to be data-driven was always, and to a great degree still is,
to put a BI tool on top of this phenomenal repository of data, and run visualizations and
reports. And that supposedly makes it data-driven. And nothing could be further from the truth.
Just looking at the data, everyone has their own interpretation of what that is.
And what's changing now is people want access to that data from
the tools they already use, whether it be Salesforce, Marketo, whether
it be Pendo for product analytics, and so on.
And so one is that people want access to insights from where they are working,
rather than having to leave them and go to another tool to download those reports.
So that's one.
The second trend that we see is access to information more in real time, rather
than a weekly report or, you know, digest reports and such. As things happen, when
a customer's churn risk changes, people want to take action, like the CSM wants
to reach out to them and say, hey, what's happening? You know, if there's a drop in activity, the AEs want to reach out, you know, the GTM teams want to reach out.
If there's a change in upsell propensity scores, you want to trigger a campaign.
And that's driving some of these patterns around how you become truly data-driven rather than just looking
at visualizations, right? So those are some of the trends. And if you apply those
trends, which business function would not want to act on these things in real time? It's not just
the GTM teams, it's also finance. And then the third, maybe one of the most important ones,
is the elevation of the data warehouse or the data lake to the same level as any other business application.
It's no longer the black box.
Oh, sure.
Yeah.
That you need to put a prism on top of it
to see what it looks like. It's been elevated to a business application as important
as, or in some cases more important than, a CRM, right?
Uh, so that is the other trend.
So how do you make that data accessible in real time across all business functions
to make them truly data-driven, rather than relying on what has been traditionally
business intelligence?
And that's what's driving these trends from what we see with our customers.
Yeah.
That's so interesting.
I mean, data-driven is such a loaded term, right? It's almost become hollow because it's used so much in marketing terminology.
For decades now, right?
For decades.
It probably goes back to the information superhighway.
Right. And I like, I mean, maybe this isn't okay,
but I'm going to take a little bit of a dig at like the big consultancies, right?
Because it's like digital transformation, you know, it's like, man,
the billions of dollars that people have made, like just trying to
connect some pipes to like help companies become more data-driven.
Yeah. But digital transformation is such a catch-all, right? And data-driven is just as much of a catch-all.
That's true, I suppose you're right. But digital, like, the new digital, is even bigger. That
one feels even more all-encompassing, because it means computers, right? It just means,
like, with computers. And I think that, yeah, I think that could be sheets of paper.
So I guess it could be even bigger, I suppose.
No, yeah, I agree.
I think maybe, I mean,
I think at least one big underpinning
of digital transformation
is sort of the move from on-prem to cloud,
you know, which is certainly non-trivial,
especially in the enterprise.
But actually on that note,
what I'd love to do here is
I'd love to get a little bit technical.
So, you know, I remember when, you know, there were companies sending data out of Redshift into
like SaaS applications, right? Like a good while ago, the idea of sort of getting data
out of a warehouse and into some sort of SaaS application isn't new, right?
And I think we all would agree that like...
I mean, data integration has been going on.
Exactly.
Data integration has been a thing people have been doing for decades.
Sure.
Yeah, yeah.
So it's not like reverse ETL is like someone invented a completely novel way of sending
data from point A to point B, right?
Like it's been happening.
But it's painful, right? And so like we're building SaaS around that, which is super exciting, but there are still a lot of companies struggling with the pain of like trying to get the data out
of the warehouse and into SaaS applications. And what I'd love to know is, because I think a lot
of our listeners are, you know, either data engineers or on data teams who have experienced the pain of trying to build that themselves, experienced not having the budget or the bandwidth to build that themselves, or grew up in an age where the SaaS just wasn't available to make that easy for them, right?
And so that's sort of just painful, right?
We're going to deal with it.
And downstream teams are going to be annoyed. But if anyone's built anything like that, generally, I think it would be ad hoc inside
of a company.
So you sort of have a bespoke pipeline, probably one-to-one or one-to-a-few.
But you're building really robust pipelines that are taking tables or data in the warehouse,
and then you're fanning them out to
like a huge number of tools and you're doing this at scale in a cloud SaaS format. Right. And so
I'm genuinely curious, like, like what are the problems that you're facing trying to do that?
Especially, you know, if anyone's done this, you know, sort of ad hoc or bespoke, like in a company,
like help them understand what does it take to do this at scale?
I mean, there's a bunch of things you have to factor in if you're going to do this yourself.
And I think we're all going to probably talk about some similar things here. But
the first thing you got to deal with is errors, right?
Just like things fail way more than you might predict, right?
So the great fallacy of APIs going back, again, 20, 30 years is like,
oh, they just work.
Nope, they don't just work.
So things fail.
And building in recovery is significantly more difficult,
I would say, than simply writing the code to sync data, right? So that's one.
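As a rough illustration of why the recovery logic dwarfs the sync logic, here is a minimal retry-with-backoff sketch. The `send_batch` callable and its failure modes are hypothetical stand-ins, not any vendor's actual API; a real pipeline would also distinguish retryable errors (rate limits, timeouts) from permanent ones and dead-letter the latter.

```python
import random
import time

def sync_with_retry(send_batch, batch, max_attempts=5, base_delay=1.0):
    """Send one batch to a destination API, retrying transient failures.

    `send_batch` is a stand-in for whatever call pushes rows to the
    destination; it is assumed to raise an exception on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_batch(batch)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the error for alerting
            # exponential backoff with jitter to avoid hammering the API
            time.sleep(base_delay * (2 ** (attempt - 1) + random.random()))
```

Even this toy version shows the asymmetry being described: the happy path is one line, and everything else is recovery.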
Two is like scale, right? So size. So dealing with 10 rows is totally different than a thousand,
totally different than a million, different than a billion, right? And people need to sync large
amounts of data. Our users, like our companies have like on the order of 500 plus million users,
right? So you have to be able to do this at scale
and with destinations that don't handle scale
particularly well.
And does it always work there?
I mean, it depends, right?
Some are really, really good.
Do you know which product is unbelievably good at scale?
Facebook.
Facebook will happily eat like hundreds of millions
of records in like a snap, right?
But Marketo?
Marketo, other end of the spectrum, other end
of the spectrum. I like to joke about Facebook because it's like, you don't think about it,
but it's like, the reason it's so fast is like they already have all the data. So they're just
going, check. They're just going, yep, we know who you're talking about. But anyway.
Other podcasts, another podcast.
Yeah, yeah, yeah. So scale is, you know, how you sync data incrementally into a system,
how you do it in the right kind of order while minimizing API usage, all these kinds of things is probably the second thing that if you're going to do this yourself, you have to think about.
Third is probably monitoring all this.
Things break.
Your stuff will break. Now, I think things have improved in our market broadly.
You can use orchestration tools that have good, you know, some alerting
for you, but you have to be monitorable, right?
And that's really not a trivial amount of work.
It's the same reason engineers don't tend to build New Relic or Datadog themselves,
right?
That in itself is expensive.
And so that's a huge part of our software as well, right?
Because you want these things to be alertable, monitorable, et cetera.
And then last I'd say is, I think, I don't know if you all have
seen something different, but like most internal versions of this are, you know, not manageable by
anyone other than the person who wrote it. Whereas the whole point, right? You talked about this,
right? You talked about the democratization of data and analytics and people want to be able
to access these things. And if you're going to build this yourself, are you going to build the UI to make it easily mappable, so that people can modify these things without having to call you?
That is probably all the things you would have to build to do this well yourself.
Yep. Love it. Okay. I'm going to slightly modify the question as I pass
it on to Tejas and then Triti. But okay, so here's a slight modification. How do you decide
what you manage and then what you hand off to the user and or where the compromise is there,
So if you think about incremental syncs, are there decisions that you need to make on behalf of the user? Or are there use cases where you make that decision for them?
Those are actually fairly challenging when you think about data at scale. So yeah, I would just
love to hear your perspective on that. Yeah, it's a great question. So one thing that I think has
been really powerful about the tools in the reverse ETL ecosystem is giving the users a lot of
flexibility, but also a lot of guardrails at the same time.
So one thing that we handle out of the box, which has also been touched on, is
diffing.
So I think typically when companies build a script like this in-house, they'll just
kind of build a loop over the data in the warehouse and call an API to go update it
or upsert it into a destination.
And it remains pretty basic.
And then a challenge comes up: the destination API can only accept data at a certain rate,
and you need to only send updated data, but you don't have a clear updated_at timestamp
in, say, your data warehouse or something like that.
So one thing that we've handled out of the box is diffing for our customers, where
Hightouch can actually automatically only send changes
to some of these downstream, to all these downstream destinations
instead of sending all the data every time.
And with diffing, there's a ton of nuances.
So we support multiple mechanisms for diffing.
One that we support is diffing inside of the customer's warehouse,
where data that's being synced over is actually written back to the warehouse
and joined against in the process of syncing to a downstream. Oh, interesting. Okay. But it's not the best approach for
all data warehouses. I mean, for certain databases that our customers connect with,
writing back to it isn't as favorable as a cloud data warehouse, like a Google BigQuery or
Snowflake, where storage is separate from compute. So if you're thinking about a Redshift,
this may not be the most favorable approach. Or even more of a transactional or production database,
like an Elasticsearch, or even a production Postgres, you know, that might
not be something that a customer is okay with. So we also do support other
mechanisms of powering diffing, like writing the data back to a
customer's S3 bucket, for example.
And even depending on the data warehouse you use, we support even more options,
like, for example, leveraging timestamp partition keys in something like a Redshift or Google BigQuery to automatically do more intelligent, faster diffing for stuff like event forwarding use cases.
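In miniature, the diffing idea looks like this: an in-memory comparison standing in for the warehouse-side join described above. The `diff_rows` helper and its dict-shaped rows are illustrative assumptions, not any product's actual implementation.

```python
def diff_rows(current, previous, key="id"):
    """Return only the rows that are new or changed since the last sync.

    `current` is the query result for this sync run; `previous` is the
    snapshot of what was last sent, which in a real reverse ETL system
    would live in a table written back to the warehouse (or an S3 bucket)
    and be compared via a SQL join rather than a Python loop.
    """
    prev_by_key = {row[key]: row for row in previous}
    # a row is emitted if its key is new or any of its fields changed
    return [row for row in current if prev_by_key.get(row[key]) != row]
```

Sending only this diff, rather than the full table on every run, is what keeps rate-limited destination APIs workable at warehouse scale.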
So one thing I would say is like with building reverse ETL platforms, we have a lot of features kind of built out of the box where companies don't have to implement this stuff, but then still allow them to kind of see more and dial in and control how it works if they need to
for their use case. So I think sane defaults with a lot of customizability is a general approach
that we've been taking to building our software and one that companies have really appreciated
versus say other players in the market with like CDPs and whatnot. Yeah, super interesting. Okay,
Triti, you're going to have the last word here, but you can only answer this in two sentences, just because I just have a quick diversion here.
I said I wasn't going to get technical, but of course I'm trying to get more philosophical.
Maybe you want me to try two sentences?
But here's the question though. Does the user care? And the reason I ask that is because
how you build your product-
It depends on who you're talking to, the user.
No, this is going to be a
big rabbit hole, but those are really complicated things that you're
discussing, right? Like diffing across, okay, like, warehouse, right? That's a general term, a general tool.
When we talk about diffing, it's very specific, right? Like, very specific product problems. And genuinely, I'm interested in, like,
do your users like care about that? You know, sort of like the tuning question, right? Like,
yeah, you can get software running, but like tuning, it's like a different skill set.
We talked about the various use cases, like who's, you know, like the traditional ETL,
the team was always centralized, right?
So we talked about this discussion around the jurisdiction, like who's owning these, if I can call them reverse ETL pipelines, right?
Like if it's the GTM team that owns it, like, do they really care about these control and the, you know, the extensibility and such?
Probably not as much, right?
But there are other teams, like maybe the product or data engineering teams
that need a lot more control and flexibility.
So the answer is, depending on what this reverse ETL pipeline serves,
the needs will be different.
The personas that we're using in building these are different,
and they will require, like, you know... In some places, the out-of-the-box
sync and things work just fine.
Like, you know, it's not just for reverse ETL; take the example of Salesforce, Marketo integrations,
where out-of-box integrations just do fine.
But then there are some cases where you need to dedupe.
There are some cases you need to, like, do some lookup of some third-party application,
like, depending on the nature of transformation,
where you need a lot more control, right?
So those are things, and then you need to also create
some reusable components that you can apply
and standardize across multiple pipelines.
And in those cases,
you will care about more control and flexibility.
But I just wanted to add,
I think Boris touched upon a few things
that are very important when you ask this question.
What should the product do and what should the user do?
Yeah.
And for the era we live in, right,
at least how we believe the philosophy should be,
the product has to do more
so the users can get more done.
What does that mean?
That's not just a soundbite.
What does that mean?
One is, when you're looking at whether it be reverse ETL or any form of data movement,
the number of sources the product can connect to, like the breadth of connectivity, right?
Both from the source side, right? So the product has to take care of those things. On the destination side, Salesforce offers a bulk API
to ingest much faster, right?
You can push 10 million rows.
Marketo, not as much. NetSuite, not as much, right?
So the product has to do more to do that buffering and the queuing and sizing.
So the user doesn't have to worry about those things.
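The buffering-and-queuing point can be sketched as a simple chunking helper. The batch sizes here are illustrative; the real per-connector limits (a bulk API accepting thousands of records per call versus a slower endpoint topping out at a few hundred) would be configuration the product owns, not the user.

```python
from itertools import islice

def batched(rows, batch_size):
    """Yield rows in destination-sized chunks.

    The product, not the user, should know that one destination happily
    ingests thousands of records per call while another tops out at a
    few hundred, and queue the data accordingly.
    """
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        yield chunk
```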
So that's one very important part, the breadth of connectors,
like on both sides, the source system, source databases, and the destination.
The other point that Boris brought up: like with any pipeline, bad things happen.
Errors happen.
And if the product doesn't provide the ability to see them, recover, and, you
know, the pre-built monitoring tools to troubleshoot, or even not troubleshoot but
auto-correct in some ways,
it puts a lot of burden on the developer, right? And then it requires specialists to come in.
So those are the things the product needs to do more of.
What should the user focus on is more the business logic.
Like what is the outcome that we want to drive, right?
And like, I need to move this set of records for an upsell campaign,
I need to look at this data in this table,
like monitor for upsell signals and whatnot.
And then, you know, take that list out
and move it into a marketing campaign.
They should focus just on the business logic
and how quickly they can configure.
The second part: you know, business processes
change dynamically every week, every month,
so the more they're able to iterate, the better.
So it should not be brittle, right?
The ability to iterate and be agile about it is also something
the product should support.
So I'll put it this way.
So, all the products that we represent here, and the
modern ones that are coming up, they have to have parity in terms of experience with what these end
users are using. What I mean by that is it's more configuration-driven and click-driven
than code-driven, right? So, like Salesforce or Marketo, you can do most of the things through
clicks rather than have to write any code. But it also has to provide extensibility, you know, whether it be
some Python scripting, some third-party scripts that you may want to use, you're
able to pull that in so it doesn't put you in a box.
So that, those are things that drive adoption of solutions like these.
Guys, I have a question that is related with something that was mentioned a little bit
earlier, that nothing is like extremely new, right?
Like it's not like the first time that the market out there had like to move data from
point A to point B and even like push the data back to the downstream applications.
But I would like to ask all three of you about like two specific cases of products.
And I will start with Tejas because he's coming from Segment.
So the two products that I want to ask you about, one is Looker Actions
and the other one is Segment Personas, right?
And the reason that I'm focusing on these two is because these two products are not
that, I mean, they were not created that far back in the past,
like compared to when you started, right? But why don't we
hear about them? Like, in a way, why didn't they succeed in
creating the category, or leading the category, let's say?
So, Tejas, you first. Yeah.
Then I'll ask the rest of the... Cool, yeah, I can kick it off.
So, this is a super timely question,
because I was actually one of the first engineers
working on Segment Personas
with my co-founder and CTO at Hightouch.
So, first, I'll take Looker Actions.
Honestly, Looker Actions had a pretty brilliant idea,
which was, you know, I think one of the first
offerings to the market that started evangelizing this idea of reverse
ETLing, which was, we're analyzing stuff in Looker and there should be a way to take these
insights and put it into the other tools that the rest of the business team look at and
not just have the business teams have to look at a Looker dashboard or a Looker report every
single time.
You should be able to use that information more live.
That is really the concept behind reverse ETL today.
I would say there's a couple of reasons that didn't really pan out.
I mean, one, honestly, I would say it's just like resource allocation.
Like if you take a look at the looker action destinations, you just have like a lot of
limitations.
Like I think the Braze destination, for example, can only handle like results of 200 rows.
They don't really do diffing in their infrastructure.
They don't really have much visibility or observability.
The kind of sync mapping interface that customers expect for like a more modern
reverse ETL platform is just like not there in Looker Actions.
So I think really the reason it didn't take off is because activation, like data
activation is just a separate technical problem and a separate technical space than data analytics.
And I don't think that the team working on Looker Actions really treated it as such and invested in building Looker Actions to the same product perfection and degree of thoughtfulness that kind of best of breed solutions have come out to the market with, like Hightouch and Census, for example.
So that's the reason I think Looker Actions didn't pan out.
There's also some parts about it, which is that tons of people don't use Looker and want
to tap into data in their data warehouse.
But I actually think even tons of our customers do use Looker.
And the real reason it didn't get very far was just product quality and product design
at the end of the day.
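[Editor's note: the "diffing" Tejas mentions above is a core piece of reverse ETL infrastructure: rather than re-pushing every row on every sync, the tool compares the current warehouse query result against a snapshot of the previous run and only sends adds, changes, and deletes downstream. Here's a minimal, hypothetical sketch of that idea; it is not any vendor's actual implementation:]

```python
# Minimal reverse-ETL diff: compare the current warehouse query result
# against the previous sync's snapshot, keyed by primary key, and emit
# only the rows that actually need to be pushed to the destination.

def diff_rows(previous, current, key="id"):
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    added = [curr[k] for k in curr.keys() - prev.keys()]
    removed = [prev[k] for k in prev.keys() - curr.keys()]
    changed = [curr[k] for k in curr.keys() & prev.keys() if curr[k] != prev[k]]
    return added, changed, removed

prev_run = [{"id": 1, "ltv": 100}, {"id": 2, "ltv": 50}]
this_run = [{"id": 1, "ltv": 120}, {"id": 3, "ltv": 10}]
added, changed, removed = diff_rows(prev_run, this_run)
# added   -> [{'id': 3, 'ltv': 10}]
# changed -> [{'id': 1, 'ltv': 120}]
# removed -> [{'id': 2, 'ltv': 50}]
```

[Without this step, a destination with a small request limit, like the 200-row case mentioned above, gets overwhelmed by unchanged rows on every run.]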
When I think about segment personas, it's actually different.
Segment Personas, for anyone who doesn't know, basically it says, okay, you're tracking all this event data in Segment. It's being forwarded to all
these different downstream tools, but we want to provide marketing teams and growth teams and
teams like this a central place inside of the segment product where all the user data is
aggregated into these profiles that you can then build upon in a WYSIWYG way. So add some
computed traits like number of orders in the last month, or LTV,
and then also build audiences on top of these profiles
and sync them out to different tools.
So really, if you think about it,
Segment Personas was almost building its own source of truth
off segment data within the segment products.
And I think what the market has really realized
is that the source of truth is not going to be in any sort of proprietary vendor or any sort of SaaS
application or follow any sort of spec of what a user should look like in segment or what an event
should look like or what a shopping cart should look like. It's going to be in the data warehouse
where companies are able to get all the data into it via numerous different ETL vendors,
where there's a standard that all
software is kind of integrating on top of, transform it freely using software like DBT,
for example, in the ELT stack. And then once they know what a customer 360 view kind of looks like
in the data warehouse, sync it out to all the different downstream destinations. So honestly,
I would say the reason Segment Personas primarily didn't pan out, I would say is just because it was
built on the wrong source of truth, right?
It was built directly on top of segment as a source of truth with the warehouse as kind of like a side afterthought.
Whereas what I think has really become clear in the last five to seven years is that companies want to use the data warehouse as a source of truth.
That's where all the data will be.
And that's where the best data will be, kind of as Boris mentioned earlier.
And that's really the trend that reverse ETL and data activation is riding on.
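[Editor's note: in the warehouse-as-source-of-truth model Tejas describes, a computed trait like "number of orders in the last month" or LTV is just a SQL transformation over raw order data, maintained with tools like dbt and then synced out by reverse ETL. A hedged sketch using SQLite as a stand-in warehouse; the table and column names are hypothetical:]

```python
import sqlite3

# In-memory stand-in for a data warehouse with a raw orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id TEXT, amount REAL, created_at TEXT);
INSERT INTO orders VALUES
  ('u1', 40.0, '2022-03-01'),
  ('u1', 60.0, '2022-02-15'),
  ('u2', 25.0, '2021-12-01');
""")

# Customer-360-style model: one row per user with computed traits,
# ready to be synced out to downstream tools by a reverse-ETL job.
rows = conn.execute("""
SELECT user_id,
       SUM(amount) AS ltv,
       SUM(CASE WHEN created_at >= '2022-02-23' THEN 1 ELSE 0 END)
           AS orders_last_month
FROM orders
GROUP BY user_id
ORDER BY user_id
""").fetchall()
# rows -> [('u1', 100.0, 1), ('u2', 25.0, 0)]
```

[The point of the warehouse-first approach is that this transformation lives in open SQL on the company's own data, not inside a vendor's proprietary profile store.]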
That's very interesting.
And actually based on your experience at Segment, because this is something that
like I've been thinking like from time to time, do you think that the way that
personas like were implemented based on the, this single source of truth that
was like the Segment itself was also like a result of, let's say, timing, like when segment
actually started as a company.
Entirely.
Yeah.
I say this time and time again.
So I think, you know, the approach that CDP solutions like segment took, you
know, back when I worked there seven, eight years ago when the solutions
started to be designed, was not wrong for the time.
If you looked at data warehouse usage at the time, I mean, companies like Snowflake just
had less than 100 customers when I joined Segment, honestly.
And unless you were in the enterprise, you weren't really heavily using the data warehouse,
BI culture and solutions like Looker were just popping up.
If you went to a company and said, hey, we're building reverse ETL, we're going to allow
you to take data from your data warehouse and feed it all into these SaaS tools to solve problem A you have on marketing or problem B you have in sales.
Technically, that works.
The software would work just as well then as it did today in a lot of sense as a technical
solution.
But when you think of the fit, like the product market fit for companies, they just didn't
have the data in the warehouse in the first place.
They weren't building the kind of models of what it means to be a customer.
How much are they paying us?
Are they a high value or low value user?
Just, you know, the, all the prerequisite steps weren't done yet.
So it just didn't make sense for that to be the way that companies solve data
activation problems all the way back five, six, seven years ago.
So I don't think the way CDPs approach the problem was incorrect at all.
I think it's just a different approach for a different time. And now that companies have made this massive investment in data
warehousing and the modern data stack, everyone's looking for how can I drive more value from it?
How can I use all the data I have and all the data models I've built to drive growth? And
reverse ETL and data activation is really the answer to that that makes sense for businesses
at this time. I could not agree more with that.
Like most of these things end up with good decisions in their context, right?
Even I would say, since you talked about nothing is new under the sun, right?
Long before any of those products, like people were integrating data and it made sense to
do it, you know, from A to B without a warehouse.
Like it would have been an incorrect decision to kind of design with
a warehouse bias, right?
Like we did something in 2018 that was like almost weird for its time, which is like we
put all of our products capabilities inside the warehouse, right?
Which was unheard of for a SaaS product at the time.
So it's like, you can cut the cord on Census.
All our data is actually sitting in your warehouse.
Because I felt like,
you know, there's a secular trend
towards owning your data,
which Tejas kind of mentioned.
I think those are much larger trends
than even just a data stack trend, right?
Yep.
You're from Greece.
Like, Europe has led the way,
but there's a general trend
towards owning your data,
making sure it's not locked away
in a proprietary platform, right?
And the data warehouses have just been
this perfect piece of infrastructure for that.
And then if you think about,
I tend to think about the humans involved a lot,
as well as just the tech, right?
I know it's weird for technologists,
but Segment was a brilliant bridge
between engineering and marketing.
Right, Tejas?
Would you say that?
I would say that's accurate.
Like product engineering in particular is the big differentiation.
Right, right.
And when we started, we were not trying to be a bridge between product engineering and
marketing.
We're trying to be a bridge between the data team writ large.
And we started in sales, but eventually all teams.
But it was really about putting the data team at the center, right?
And, you know, Looker, of course, cared about the data team, obviously, but it cared primarily about this like batch analysis, you know, explore some reports about what happened last quarter. And this idea of taking the data team
and making them a central pillar of the company, that they're operationalizing their work, that
they are driving in the truest sense, Eric, right? Like driving the business, that is a different
relationship. And if you had tried to build that relationship 10 years ago or seven years ago,
the data team was too small, didn't
have enough tools, wouldn't have had the buy-in from the C-suite to own this part of the company.
Well, and the data wasn't actually centralized, really.
Yeah, but all these things build on each other, right? But I think it's not just the data
centralized now, it's that data teams and I think CEOs around the world are realizing,
like, I need to give this team more influence in my company because good things happen when I do
that, right? So you needed a new bridge between them and everybody else, right? So that's kind
of like why we have, you know, kind of, we talk a lot about the word analytics because it's like,
that's kind of the lingua franca of data teams is the word analytics. And it's like,
let's operationalize that, right?
Yeah, I think outside of the core data team as well, it's just that, you know, the data enabled personas in organizations just have a much more powerful tool set than they did
10 years ago.
Like, obviously, you know, marketing operations analysts, marketing analysts, sales analysts,
those roles existed 10 years ago as well.
But if you look at the tools they were using, they were using Google Analytics, Omniture, Excel, tools like that, Salesforce reports.
They didn't have the power of the data warehouse.
They weren't leveraging BI.
They didn't have knowledge of or even access to SQL queries. That's changed. It's like, you know, it's a lot easier for any business user to find someone who sits nearby them in the office that can write SQL or that can use a BI tool than someone who can code. And that wasn't really true 10 years ago, I'd say.
Costas, I was talking to someone the other day: how many people do you think have SQL in their skills, but no other programming languages?
On LinkedIn?
That's a great question.
That's a good question.
I would assume.
You have to listen.
I'm going to tell you, but if you listen to my next published podcast, you will discover it in that. Like as a percentage or like?
No, no, no.
Just number of humans.
Number of humans on LinkedIn who state SQL as a skill, but not what the rest of us here would probably call a programming language.
A number of humans.
This is great.
This is like wits and wagers.
Wits and wagers is a fun game.
It's a great game.
Well, the question was for you, Costas.
I'm trying to answer it.
Yeah, I don't know.
I would assume that.
Just give us a number.
That's way more fun if you try to give a number.
Give a number?
I don't know. Like, I mean, a number.
You're failing the interview. At least give a number.
I'd just say 2% of all LinkedIn users. So if LinkedIn has like 700 million users, like 14 million or so.
Whoa, nice. That's high.
That's high. I like that. I think that's high too. I was going to say like...
I don't know anything though.
Two to four million was my guess.
But that's, I think, Triti, your math...
I was thinking six figures, to be honest.
So, I think I would have guessed the same as you,
but it's on the order of five million.
Wow.
That's great.
Pretty great, right?
Great for us.
Great for us.
I'm not the guy, but I stand by it. I'll just go back to your question. The need for, like, you know, this pattern has existed long before it started getting branded as reverse ETL. The difference is how it has been fulfilled in the past, right? It has been fulfilled with CSV exports and things like that, right?
And who was able to do
that in the past?
It's like these centralized
data teams or people who are very
competent with databases
and such, right?
Not only for the reason
of knowing SQL,
just from a compliance and security standpoint, you didn't have access to these systems of record, right? So what has changed,
like, and you know, Segment, Looker, however you categorize them, any product that doesn't do anything with data other than, like, visualization. But Segment, it was a good example.
But what has changed is the demand for these requests.
It's coming from, you know, we already talked about what set of use cases.
Like if Segment solved for like 5% or 10% of use cases for GTM teams,
there's a whole large number of use cases that go unmet by any tool.
And that's why these products exist, right?
So there's a need for ownership of these processes outside of the data team as well.
And Boris, you can speak to this, like your buying centers will be different
from the traditional data teams, right?
I mean, compared to some of the traditional ETL products. Yeah, yeah. I think your point, both of you
are saying like, there's this democratization occurring, right? Of skill, of skill. And
listen, you talk about buying, right, Triti? I think the journey of SaaS for 20 years now is this empowerment of individuals and teams that are not...
You're talking about data teams. It used to be that all your software was bought by the CIO,
right? Period. And deployed by your CIO and in the office, in a physical office somewhere
from that, right? And people used to call it shadow IT and all these things. But broadly
speaking, it's about having more choice, more autonomy,
and different sets of teams being able to make decisions
about what tools they want to use.
And I, you know,
this is where, you know,
like I've been at this
for technically a decade,
if you factor in my previous company,
which was all about
kind of democratizing access to SaaS.
I think this is the journey
we're still on as an industry
is letting individuals and teams make decisions about software and using it to the best of their ability.
In other words, with the best data from the trusted source, right?
But our job has to be to create the right, you know, let's call it guardrails and availability of that data, not to prevent individual teams, whether that's a sales team or a content marketing team, doesn't
matter, to make choices about what tools they want to use, right? And, you know, the analogy I like
to use about this, now I'm going to really frame myself as a child of the 80s, but like,
video games used to work like this too. So in the eighties, like video games were not purchased by the children who played them.
They were selected and purchased by parents.
And,
and therefore they were marketed to parents,
things that people don't remember this,
but they were marketed to like mom and dad as like safe,
fun games.
And that all changed in the nineties and into the two thousands where we
now,
you know,
have, you know, more violent games, more sports-like games that are more for, let's say, the user. But the reason we
could do that, there are these necessary pieces that had to come into existence, like ESRB
ratings, and app stores, and controls from the game makers so that you couldn't just install
whatever game on your console.
And so those were the building blocks.
And so SaaS is, to me, a similar journey
just for the worker in the IT world.
And so, yeah, Triti, to answer your question,
like, yeah, the buyer's not going to be
this centralized, massive team.
The data team just has to have the right visibility,
observability in our platforms
so that they can let everybody else kind of select and do what they want.
That's kind of how I'm doing it.
And, you know, building on that analogy, you know, with those, like, even though the kid purchases, there's still a presence of some supervision, right?
Exactly.
Exactly.
Well, there's trust.
It's governance, yeah.
Exactly.
It's trust but verify.
So governance will, I think we're at the early days of that, but I think, yeah, that's going to become key to all of our platforms is to make sure there's
reasonable governance. Yeah. And I think something really interesting here is something that Triti
actually brought up earlier, which is that when you said, what does the product have to do versus
what does the user have to do? I think about it like a little differently, like almost an extension
of that, where the product is also the infrastructure that the company is building, right?
So it's not just what it is that your product has to do, but it's what does
the data warehouse have to do?
What is that upstream versus what does the user have to do?
And I think that balance, like striking that balance in the application is like, you know,
the winning formula to enabling business teams to be able to leverage this data.
So as much as possible, if reverse ETL and data activation platforms can,
you know, tap into tools like in the observability space or, you know, leverage models from the
transformation space or do a lot of things outside of their product that taps into the
overall infrastructure that kind of the technical teams, the data teams are putting forth in an
organization, then that makes it a lot easier for business teams to come in and solve these cases in a self-service capacity without actually building more product features in the
reverse ETL tools itself. So I think that's a really interesting trend that we're seeing.
One thing is with the CDP players, everything was kind of in a proprietary ecosystem where
let's say you wanted a data transformation feature. CDP had to build a data transformation
feature in it. Let's say you wanted observability on data ingestion.
CDP had to build observability into its platform.
With reverse ETL or the data warehouse sort of first approach,
these can be solved by the ecosystem of players
that all interop and build on top of the data warehouse
instead of necessarily one vendor.
And a lot of these problems that could be solved by the product
can now be solved by the kind of technical infrastructure
and analytics infrastructure that a company has in place, which I think is just super powerful.
So business users don't have to think about any of that stuff.
I just have a question on that.
You know, you touched upon this and it seems to be something that reverse ETL somehow is
to be tied to the data warehouse, the data lake as a store.
Sure.
I think it's broader than that, that it can go beyond like any
centralized or federated store of data.
Where do you, you know... and so, okay, I just want you to talk some more.
Now, I'm very conflicted.
Uh, Costas, your thoughts maybe. Uh, when we say reverse ETL, like, yeah,
because ETL has been traditionally tied with the data warehouse, it may, uh,
indicate that reverse ETL always has to have the data warehouse.
It can be an MDM, right?
It's like a customer data hub or a data hub as the source.
Yeah.
Yeah.
And I think all of our products support lots of different sources, right?
But I think the goal is,
I think if as an industry,
we end up with a variety of sources
and a variety of destinations
and no central cleaning and deduplication
and kind of unification in the core somewhere,
we're going to make great companies
who make lots of money,
but we will not actually have moved the industry forward. And I think this is, to me, where we need to land in the end,
right? Is that you have, remember what I said at the beginning about the goal is to have data,
the best data, data you can trust, right? And the tools that you want to use. And I think you
should be able to use any tool you want, but the data you can trust is key.
And if you don't have some amount of centralization
somewhere in the company,
then this, I don't know how to make that happen.
Like to me, you get trust through central,
some centralization and some federation, right?
That's just how, that's always,
that's why our product is called Census, by the way.
Like it's because exactly that was the intent.
Boris, completely agree.
But the data you can trust from a business analytics standpoint,
maybe that's the data warehouse, but for, for example, MDM,
or customer data, product data, right?
That may be the system of truth.
It's not the data warehouse.
The same goes for...
I hear you.
I hear you.
I think
every SaaS product
I've ever interacted with,
and I think Tejas is smiling
because he's had the same reaction,
is like
every SaaS product
I've ever interacted with
in some form on their website
says something about
the system of record for X.
Take your pick.
I think Drift once said, we're the system of record for chats or something like that.
And I was like, what?
I don't even know what that means.
I swear.
I think it said something like that once.
And so I think, Triti, I am totally on board with using a source that your company has fully bought into.
Like, this is the truth, right?
Then it's great. Then it's great.
Then it's great.
But in my experience,
the reason people tend to gravitate to the warehouse
and why we made early on a pretty hard decision
to like bias towards these kinds of platforms,
not to the exclusion of others,
but to like as our primary bias
is that they have infinite storage
and infinite join capability, right?
And like that, to Tejas's point,
you can use the ecosystem for that.
You're not tied to a single vendor
making sure that it supports every source, right?
And so I think that if you can get that
out of something else, then great.
Like, you know, we'll support that as a source too, right?
But that to me is the important part
is that you can join all data somewhere that matters.
I agree with that fully.
But I would also add that I think the even more important part is that you have data at rest somewhere in an organization that your business teams simply aren't using.
I think data that you can trust is definitely a huge part of the value of these things.
But the biggest thing is that before solutions like Hightouch or before like data activation,
before reverse ETL,
people just weren't using the data at all, right?
There's so much, you know,
such a wealth of data
and a lot of the companies we work with,
it wasn't being used.
I think that,
but the central premise, right,
Tejas, between how we approach this is like,
you're not connecting Zendesk to Salesforce right now.
For sure.
Neither are we.
And so I think,
and Triti, like,
I think there's tons of data in Zendesk
that can go into Salesforce and it does. And I think that's great, but potentially keeps you
away from coalescing on something that is more trustworthy. I don't think that's wrong for some
use cases. Correct. Agreed. True tangents. Of course. Of course. That goes without saying.
That goes without saying. Yeah. We were actually talking about that.
Like a pipeline that doesn't make sense for someone to build or really for like...
Like at all? Well, no, no, no. I mean,
just like an example, it's like, okay, you have leads in Salesforce and you want to sync those
to Google Ads because there's certain data, right? And it's like, okay, well, no one
wants to manage that pipeline. Great. Like Google and Salesforce built it. So you can just reverse
all the data points in there and then great. Right. So there are like point to point connections
where it's like, this is awesome because like no one has to manage this. These two enterprise
companies like built in integration. And this is awesome.
Like great.
Like connect the tools
and the data teams like
and the actual operational teams
don't have to deal with it.
And like that is very convenient.
And if every app,
if every app on earth
was perfectly connected
to every other app on earth.
Sure.
Right.
We're talking about Salesforce
and Google Ads, right?
Like, yeah, they should.
But those are,
but Salesforce and Google Ads
is great.
What happens with Facebook?
Well, but even, even if we only focus on Salesforce and Google Ads,
that integration has all sorts of limitations.
It can only sync, I think, the last 90 days.
Like, they all have limitations, right?
Yeah, yeah.
Agreed.
To Tejas's point from way earlier.
Like, do you think the staff, senior staff engineers at Salesforce
are working on that problem?
I don't think so.
Right, but it is, like, totally a cost benefit for the data teams working inside a company where it's
like, great, we're just going to offload that, right? Like we can accept the limitations.
I mean, our goal as software vendors and data integration should be able to make it
as easy to do that as you can do it in the Salesforce UI, but perfectly on top of the
data warehouse. I really think that's possible. Absolutely. Totally agree. Yeah. And there's also a matter of like expressivity, to be honest.
Like you can move data from something like Zendesk to Salesforce, right?
Like you can do it with Zapier.
You can do it with Zendesk natively.
You don't even need it.
Yeah, yeah, yeah.
Yeah, exactly.
But the whole point of, like, working with data is how we can take whatever data points we have, which probably store, let's say, an almost infinite amount of implicit information in there, and make it so we can push it and use it somehow. And to do that, you need a processing environment, right?
And these processing environments,
like humanity so far has decided that's going to be like a database system. Like they are built for this reason, right? Yeah. Separation of concerns.
Yep. So unless the things that we have to do are like super trivial, like, okay, someone
signed up. Okay. Let's send this somewhere. Okay, fine.
But anything more than that,
that requires some kind of like business logic
to be built there and be executed on top of the data
in order to derive something,
it needs to happen somewhere.
And it's not the pipeline that will do that, right?
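[Editor's note: Costas's point, that anything beyond trivial forwarding needs business logic executed over the data rather than a pipe, can be made concrete. The audience definition, names, and thresholds below are hypothetical, just to show the kind of derivation a plain event-forwarding pipeline cannot express:]

```python
from datetime import date

# Raw events: a trivial point-to-point pipeline can relay each of these
# one at a time, but it cannot answer a question *about* them.
events = [
    {"user": "u1", "type": "pricing_view", "day": date(2022, 1, 5)},
    {"user": "u1", "type": "pricing_view", "day": date(2022, 1, 9)},
    {"user": "u2", "type": "signup", "day": date(2022, 3, 20)},
]

# Business logic that needs a processing environment: users with 2+
# pricing views but no activity in the last 60 days ("dormant, engaged"),
# derived by aggregating over the full history.
def dormant_engaged(events, today, min_views=2, dormant_days=60):
    by_user = {}
    for e in events:
        s = by_user.setdefault(e["user"], {"views": 0, "last": e["day"]})
        if e["type"] == "pricing_view":
            s["views"] += 1
        s["last"] = max(s["last"], e["day"])
    return [u for u, s in by_user.items()
            if s["views"] >= min_views and (today - s["last"]).days > dormant_days]

dormant_engaged(events, today=date(2022, 3, 23))  # -> ["u1"]
```

[In practice this aggregation would live as SQL in the warehouse; the sketch just shows why "someone signed up, send it somewhere" and "derive something from the data" are different classes of problem.]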
That's a great point.
I mean, I guess once the context is in point to point,
the context is decided for you, right?
Yeah. Well, it's context, but also like, I think we all know history here, right?
Once upon a time, you had to put the logic into the pipe because of literal computing
constraints.
Like, like going back to, you know, we, we actually had limited ability to, to, to move
all the data around.
So luckily we now live, that's a genuine shift technologically, right?
Like now we no longer have to pre-compute,
like compute on the fly or as we move, right?
So that is one thing where we can clearly show
before and after where compute costs went sufficiently down
that we could just store everything
and then compute after.
But you're right, Costas.
Eventually you're going to need to compute in some form.
People might not realize they're computing.
I found that people who use Excel don't realize that they're programmers,
when in reality, Excel is the world's most popular functional language by far.
Sure.
You know, for all the Haskell developers out there, like actually Excel.
Those are, that's a good one.
Yeah.
I saw this, you know, the key point.
I just want to be very clear about this.
And I wish you'd mentioned Workato instead of Zapier.
That would be nice, but I'll get to the point.
It's like the Zendesk
and the Salesforce integration.
But the example
that it brought about,
like let's say,
you know, I go to the website
and register as a user,
as a lead, right?
And there is a, like, the reverse ETL
is not a catch-all for everything.
The lead getting routed in real time,
it's a very different flow that requires
some integration and automation, which may not even touch like any data warehouse, right? It
needs to happen in real time because somebody needs to respond to me in less than five
minutes, right? We handle that with our production database. We actually do that with a replica of
our... Yeah, yeah, yeah. I'm just saying that's where, you know, they're a different nature.
The other part is like, once you've collected all these leads and say you wanted to do a
re-attribution analysis, right?
And that data is in the data warehouse and you say, hey, let's reach out to these leads that
were interested at some point in time, but it never went anywhere.
And you need to move that data into, you know, like a marketing automation tool kind of thing, that's when reverse ETL comes in.
So there's a place for both.
There's a place for both.
And again, the tool of choice will depend on, you know,
what the user is trying to solve for.
But there's a place for both, which is why point-to-point integrations need to happen,
regardless of whether it's the data warehouse
as the most reliable source of customer data
or not.
That's a great point.
We're close to the buzzer here, so
we didn't get to talk about
a number of things that I would
have loved to, but we need to do a Q&A
and wrap it up here.
We'll just do a couple of quick questions here.
The first one, which
is super interesting,
is there was discussion around,
I'll give a little context here,
or I'll read some context into the question.
There was discussion around the change in technology, right?
So products like personas were built
before the warehouse had sort of come of age,
as it were, right?
So the question is,
the technical ability
of roles has changed, right?
So a marketer, you know, 10 years ago
was far less technical
or most of them were far less technical than today, right?
And even salespeople, right?
And sort of the appetite for like
different interesting types of data
that help them do their job.
How does that influence the way you're building your product?
Not only has the tech changed, but the users, marketers are very data-centric.
Salespeople are becoming more data-centric.
How's that influencing the way that you're building your products?
Yeah, it's a tremendous responsibility.
No, I mean that.
We get to see and foster basically, you know, an upgrade in skill. And to me, you know, I think a lot about how you don't learn computer science just by learning, you know, like how to write a git commit, right? There's theory related to that. And I've always framed Census to our users as not a data pipeline, but more of a data deployment
tool, right? Where I'm trying to teach you certain aspects of software engineering without calling it
that. And so to your point, marketers, people on data teams are all becoming dramatically more
savvy. DBT has led the way in terms of teaching people how to check in their SQL models,
like that's, people think that that's like, no big deal, but it's actually huge, right? And we're at
the infancy of that. We're at the point, you know, of those 5 million LinkedIn SQL people, we're probably
at a teeny tiny fraction who know about version control, right? So I think of it as
it's a super exciting, and it's like, it's kind of a responsibility. I feel like we're teachers,
as well as like engaging
with them in these ways. So there's a lot of, you know, kind of integration. Like we
long ago integrated with, you know, the Airflows and Prefects of the world, right? So that
we can help marketers or analysts or data teams who want to plug into their,
you know, kind of modern infrastructure, right? So yeah, I think it totally informs how we think about our whole
philosophy of everything.
I was going to say Triti, and then Tejas, and then we'll do one more question to end it out.
Yeah, I'll just add to it. Like Boris said, it's a tremendous responsibility. From a Workato standpoint, it has always been about recognizing the fact that people are changing. Their skills and their needs are not static, right? They're changing, and they want to do more. But at the same time, it's about taking the technology barriers, the skills gap, the friction to learn, out of the way and making them successful faster, right? That's what we focus on. And then the second part is to not put them in a box. If they want to do more, the platform should give them the ability to do more, right? So it's a balance. It's a very hard balance to strike. But it's an important one: the role of empowering more people to do things, and also providing the right controls so they do it responsibly.
Yeah, totally agree with everything that's been said. I think there's a balancing act between two trains of thought in our product organization, and we're pushing on both axes, and they balance each other out. One of them is: how do we empower more people in an organization to actually perform reverse ETL? A decade ago, engineers were the only ones building the scripts to move data from the data warehouse, or people brought in a MuleSoft or something super technical to move data from the data warehouse into all these different systems. Now we're allowing data analysts to do it. Next, we're going to want marketing ops to do it. Next, we might even want some marketers on the team, depending on their technical level, to be able to do so. And on that train of thought, we ship new features like audiences that allow marketers to come in and build segments on top of the data warehouse and sync those out to different tools, basically performing reverse ETL without necessarily knowing all the ins and outs of SQL.
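As a rough illustration of that idea, a segment builder can be thought of as something that compiles simple field/operator/value rules into SQL behind the scenes. This is a hypothetical sketch, not Hightouch's actual Audiences implementation:

```python
# Sketch: a marketer-friendly audience definition compiled to SQL.
# All names here are hypothetical, for illustration only.

def compile_audience(table, conditions):
    """Turn simple (field, op, value) rules into a SQL query string."""
    ops = {"eq": "=", "gt": ">", "lt": "<"}
    where = " AND ".join(
        f"{field} {ops[op]} {value!r}" for field, op, value in conditions
    )
    return f"SELECT * FROM {table} WHERE {where}"

# A marketer picks these rules in a UI; no SQL knowledge required.
sql = compile_audience(
    "users",
    [("plan", "eq", "pro"), ("logins_30d", "gt", 5)],
)
```

The point is the separation of concerns: the marketer works with rules, and the tool owns the SQL that runs against the warehouse.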
On the other hand, the other train of thought that we're pushing, which balances out the first one of empowering business users, is really this philosophy of taking all the principles and all the tribal knowledge that software engineers have, and the processes that they have: version control, observability, visibility, staging environments, pushing to staging before production. Our goal at Hightouch is to think: if the best software engineering team ever was to build a script, or a platform, for moving data from the data warehouse into something like Salesforce, what would they build? They'd have all of those things as part of their twelve principles of deployment or whatever it is. And how do we make all those aspects of a really strong data pipeline available to less technical users, whether it's a data analyst, marketing ops, or a marketer, without them having to know all about it?
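One way to picture those software-engineering guardrails applied to a sync is a plan/apply step, like Terraform's: compute what would change before anything is written to the destination. This is a hypothetical sketch, not any vendor's actual implementation:

```python
# Sketch: "staging before production" applied to a reverse-ETL sync.
# Hypothetical example; field names and shapes are illustrative.

def plan_sync(warehouse_rows, destination_rows, key="email"):
    """Compute a dry-run plan instead of writing straight to the destination."""
    src = {r[key]: r for r in warehouse_rows}
    dst = {r[key]: r for r in destination_rows}
    return {
        "create": [r for k, r in src.items() if k not in dst],
        "update": [r for k, r in src.items() if k in dst and r != dst[k]],
        "delete": [r for k, r in dst.items() if k not in src],
    }

plan = plan_sync(
    [{"email": "a@x.com", "tier": "pro"}, {"email": "b@x.com", "tier": "free"}],
    [{"email": "a@x.com", "tier": "free"}],
)
# An operator can review this plan before anything touches Salesforce.
```

The less technical user never has to know the diffing logic exists; they just see a preview of what the sync will do.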
And I think if you look at our product, application features like GitSync, the ability to just use the Hightouch product as usual and have everything you're doing, all the configuration, bidirectionally synced with a GitHub repo, are a really powerful step in that direction. You don't have to understand it all to start, but now you can start seeing the commits you're making. And if you need to make a bulk change, you can do that in code as well, but you can also just use the application as is.
So, all right, last question, guys. And hopefully I can make you promise that we will do this again in the future, because we have more stuff to chat about and we need more time. But I'd like to hear from all of you: one thing that you anticipate coming in this category that makes you really, really excited.
So let's start with Boris.
What's coming that I'm really excited about.
I mean, there's so much.
So I'm assuming we'll talk more about what's happening in our ecosystem rather than just in our products. But I think the warehouses keep getting better, right? And that just enables more possibility. So, you know, if you think back to when we started the company, we liked the warehouse because it had effectively near-infinite storage.
And we could use it as both a source and a destination.
We could actually write our information into the warehouse.
That way we could, you know, kind of do diffs.
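That diffing trick, using the warehouse to hold the last-synced snapshot and only shipping the rows that changed, can be sketched roughly like this (hypothetical names, not Census's actual code):

```python
# Sketch: diff-based incremental sync. The "last synced" snapshot lives
# in the warehouse alongside the source data, so the delta is cheap to
# compute there. Hypothetical shapes, for illustration only.

def incremental_rows(current, last_synced, key="id"):
    """Return only rows that are new or changed since the last sync."""
    prev = {row[key]: row for row in last_synced}
    return [row for row in current if prev.get(row[key]) != row]

current = [{"id": 1, "score": 90}, {"id": 2, "score": 40}, {"id": 3, "score": 70}]
last_synced = [{"id": 1, "score": 90}, {"id": 2, "score": 35}]

delta = incremental_rows(current, last_synced)  # id 2 changed, id 3 is new
```

Only the delta goes to the downstream tool, which is what keeps syncs fast and API quotas intact.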
Like, I think incremental syncs are a table-stakes thing. That shouldn't be some fancy feature. But we can do so much more given the capabilities of the warehouse, right? And again, when you have separated workloads and infinite storage, there's just so much you can do in terms of creating more observability, more kinds of transforms. There are new SQL functions that still come out, right? That are kind of really fun for people. I hope to teach people about certain approximation functions that are actually kind of neat, but that's a story for another day.
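For a flavor of those approximation functions, warehouses ship things like approximate distinct counts (for example, `APPROX_COUNT_DISTINCT`-style functions). One classic approach is a KMV (k-minimum-values) sketch; this toy version is illustrative only, not any warehouse's actual implementation:

```python
# Toy KMV sketch: hash each item to a point in [0, 1) and keep the k
# smallest hashes. The k-th smallest value estimates how densely the
# distinct items cover the unit interval, which gives their count.
import hashlib

def approx_distinct(items, k=64):
    """Estimate the number of distinct items using a KMV sketch."""
    mins = sorted(
        {int(hashlib.md5(str(x).encode()).hexdigest(), 16) / 2**128 for x in items}
    )[:k]
    if len(mins) < k:
        return len(mins)  # fewer than k distinct items: count is exact
    return int((k - 1) / mins[-1])

est = approx_distinct(f"user-{i}" for i in range(1000))  # roughly 1000
```

The appeal is that the sketch uses constant memory no matter how many rows you feed it, trading a few percent of accuracy for that.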
And of course, they're all getting more real-time,
more centralized, more merged across the lake,
the warehouse, the real-time systems.
And I don't think they're ever going to perfectly intersect, right?
But the beauty is that most of business, save for a very small set of things, can really be handled by what I would call our version of real-world real time, which is not computer real time: it's seconds, not microseconds. And I think the warehouses are really getting there. That, I think, will unlock so many scenarios. You even gave that example, Costas, right? You said there are some things where it's like, oh, they signed up, right? But other things where you need to do some computation. You should not have to make that trade-off. Anything that happens, you want to be able to compute on it and operationalize it, and you should be able to do that. And that's why I think we're just in a fun era of warehousing, and, let's call it, data storage and computation just getting better every year.
And so I think it's just a fun time to be in our space.
Yeah.
And more accessible actually.
Yeah.
Accessibility.
I think is a great way to think about it.
Yep.
Yep.
That's great.
Triti, your turn.
And I'd love to hear also like from your perspective, because you are coming
from the more enterprise space.
So what do you see there?
Yeah, ETL became a thing not because people romanticized ETL as such a cool technology. What drove the rise of ETL, or now the trend toward ELT, is a rabid appetite for consuming data: the data-driven decisions, the business intelligence side of things. So the exciting thing with the reverse ETL trend, what will propel it to what ETL has been for the last 30, 40 years, is the trend that we see in enterprises going from big data, which drove ETL, to big ops.
Everything that we are talking about, you know, reverse ETL, is just a way to move data from one place to another. But at the end of it, the GTM teams are trying to convert faster, launch campaigns, more effective campaigns, right? Product teams are trying to drive product-led growth and, you know, drive better experiences. Customer experience teams are doing the same things using data, right? So big ops is the next big thing, and reverse ETL will play a big role in that. I'll leave it at that. And that's a trend that we see in enterprises.
Yeah, that's very interesting. That's a very interesting term, big ops. So that sounds great. So, Tejas, your turn.
Cool. Yeah.
I mean, honestly, on the technical front, I think Boris and I are thinking alike here: I'm really excited for the data warehouses to just get better and better. I think streaming data warehouses are something we've always been excited about. There are players like Materialize that are building the ability to give you a view that's defined in SQL, and as the data comes in, it's incrementally processed, so that a system like Hightouch, for example, could just subscribe to that and automatically forward what cohort or what audience a user is in, based on the SQL formula, to all these downstream tools. And while that's not innovation in Hightouch itself, it unlocks massive potential for Hightouch to be used for use cases like on-site personalization in real time, which is harder to use Hightouch for today, as Triti kind of mentioned.
But on the reverse ETL product front, what I'm really excited about is more the design aspect of things, actually. I think a big bottleneck to making reverse ETL and data activation accessible is the experience when someone first lands in the product. I can't wait till they have a problem, like, oh, I wish I had this data point in this tool, or I wish I could grab users that meet this criteria, and they can walk into a data activation tool that they have never used before in the organization, and it quickly gets connected to the resources that exist around the company, pulls in metadata from all those different systems, and helps guide that user through actually solving their business-level use case. And I think a lot of the innovation in the space, outside of the technical front, will actually just be on the design of the products: making them really accessible and separating technical concerns from business concerns, so that business people who identify a problem can just solve that problem. That's where we spend a lot of our headspace, honestly. And I think it's a function of both marketing, because reverse ETL is not the most accessible term to everyone, as well as product and partnerships. And that's what I'm most excited about.
Yeah, yeah, 100%. And I think, I mean, we don't
have time today, but there are a couple of things that we didn't manage to touch on and discuss today. And I think one of the most interesting is the users: who is mainly affected by reverse ETL? What are the personas? What do you see there? And Boris, you mentioned these 5 million people who know SQL but are not technical, right? What's the journey to enable these people to do more with less technology?
They don't just hire us for moving bits, right? They're hiring us as a piece of software, but they're trying to increase their impact, do more with their data.
That's exactly it. Yeah.
You know, a couple of things were just mentioned that triggered some thoughts on this.
He mentioned one thing that was very important: it's not just that you bring in a tool, it's how the tool fits into your overall data strategy, how you're thinking about data in the company. And the second part: you just mentioned the ability to publish and subscribe to events, right? That's a trend we see in enterprises, this event-driven architecture, right? It's not just connecting your cloud data warehouse to the business applications, but the ability to stream events to a bus and then consume them across various subscribers.
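The publish/subscribe pattern being described can be sketched with a toy in-memory bus; real enterprise systems would use something like Kafka or a managed event bus, and all names here are hypothetical:

```python
# Sketch: event-driven architecture. Producers publish to a topic on a
# bus; any number of subscribers consume each event independently.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
seen = []
# Two downstream systems subscribe to the same signup event.
bus.subscribe("user.signed_up", lambda e: seen.append(("crm", e["email"])))
bus.subscribe("user.signed_up", lambda e: seen.append(("email_tool", e["email"])))
bus.publish("user.signed_up", {"email": "a@x.com"})
```

The key property is decoupling: the producer doesn't know or care how many systems consume the event, which is what makes the architecture easy to extend.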
On top of that architecture, that's a big trend as well. There's no doubt. And again, the maturity level in enterprises varies, but that's definitely a trend.
Yep. Yep. Makes
a lot of sense.
Someone tried to say something and I think I interrupted.
I was going to say, I think it's time... we're way over time, Eric.
We're way over time, but I think, you know, I'll try to summarize at least one of my primary takeaways.
What's so interesting to me is that reverse ETL is almost a misnomer in that it's just sort of moving the data.
It's moving the data, right?
It's a pipeline and it sort of describes a flow of data.
And what we're talking about here is far, far deeper than that, you know, and impacts
the organization as a whole.
And I think reflects changes in the industry as represented by both technology and then,
you know, the changing skill sets of people.
And so this has been a true treat to hear about how everyone's thinking about that.
So thank you again.
Thank you for going long.
And let's do this again
and dig into users.
And then, of course,
we didn't get to synthetic events,
my favorite topic
when it comes to reverse ETL.
So we'll do it again
in another couple months.
Thank you, everyone.
Thank you.
Thank you.
Thanks for having us.
Yeah, it's a lot of fun.
Thank you, guys.
We get to talk to some super smart people,
Costas, which is honestly
like maybe one of the highlights of my week,
just being able to ask good questions, or at least what I think are good questions,
or at least my own curiosities to these brilliant people building stuff.
I think what was so interesting was we really didn't... If you went back and listened to that
conversation and you didn't have the context for it, you might not necessarily think it was solely centered on reverse ETL.
And we actually didn't talk about the actual technical flow of data from a row in a warehouse table to a field in some sort of downstream tool. And I mentioned this at the end: reverse ETL is a strange term in that way, right?
Because the way that they're thinking about this problem is so much more comprehensive
than, you know, just sort of a basic pipeline that's moving data from A to B.
So yeah, it made me even, it made me think even more about, you know, your point that
the name for this maybe is not a great name.
What'd you take away?
Yeah, it's not a great name.
My main takeaway is that we really need to spend more time with these folks, discussing not just reverse ETL but the whole, let's say, transformation that data infrastructure is going through right now. For example, you saw that one of the most exciting things they talked about was the latest developments in data warehousing, right? And what does this mean? Or what does it mean to have so many people out there who know, or say they know, SQL but are not technical, right? We still have so many people who are technically doing functional programming through Excel sheets, but are not using all these amazing technologies we're talking about. So the potential out there is obviously huge, and we are still very early. And what I'd like to add to what you said about speaking with very smart people: what I find extremely fascinating is that they're not just smart people, they are also highly motivated people. That's what makes things even more interesting, because we have people who are trying to change the way we work with data, and it's very early. So I don't know, I find it, how to say it, an amazing opportunity to take a glimpse into the future when we can get all these people together and chat with them. So hopefully we'll be able to do it more often in the future.
I agree.
All right.
Well, thanks for joining us.
Lots of great recordings coming up.
So make sure to subscribe and we will catch you on the next Data Stack Show.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com. Thank you.