The Data Stack Show - 80: Is Reverse-ETL Just Another Data Pipeline? With Census, Hightouch, & Workato
Episode Date: March 23, 2022

Highlights from this week's conversation include:
Panel introductions (2:23)
What is driving the trend behind Reverse ETL? (5:24)
The obstacles to building an internal Reverse ETL tool at scale (15:34)
How to decide system management vs. user flexibility (20:14)
Why previous products failed in creating this category (29:12)
Increased demand and democratization of data stack skills via SaaS (42:03)
Broader applications for Reverse ETL (47:29)
Limitations of Reverse ETL (55:05)
How user technical ability affects design and build roadmaps (58:14)
What do you anticipate comes next for Reverse ETL? (1:02:45)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.
Transcript
Discussion (0)
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by Rudderstack, one platform for all your customer data pipelines.
Learn more at rudderstack.com.
And don't forget,
we're hiring for all sorts of roles. Welcome to the Data Stack Show. This episode you're
about to hear is actually, it was originally recorded as a live stream, and we collected
some of the top minds working in the reverse ETL space. And we just wanted to pick their brains about this technology that,
you know, has probably been built internally for a long time by companies, but is now being
turned into SaaS and is doing some interesting things. Costas, I'm really interested to ask this
panel about some of the technical challenges of building these things at scale. A lot of times,
you know, I think if these were internal builds, you know,
maybe like a one-to-one connection, you're sort of dealing with you know,
a pretty simple pipeline, but doing this at scale across integrations is hard.
So I want to hear about what technical challenges they're dealing with.
How about you?
Yeah.
I want to ask them, when are we finally going to get a proper
name for this technology?
Yeah.
This reverse ETL thing needs to stop.
Yes.
It's still wrong.
So, yeah, I'll try to see what they're thinking about that
and what's the timeline to get a better name.
Great. Let's dig in.
That's a marketing problem, so we're going into pretty uncharted territory.
But I love it.
All right, let's dig in.
Let's do it.
Welcome to the second Data Stack Show live stream. This is super fun. We did this once before,
and we like to collect some of the best minds in the industry around certain topics and just pick everyone's brains. And the topic for this live stream is reverse ETL, which is kind of a new term
in the industry, but actually something that people have been doing for a while,
which we'll talk about. And we have some people who I'm just so excited to have on the show,
names that I've followed for a long time personally. I know Costas has as well.
So let's just do some quick intros. Tejas, do you want to start off and give a quick intro?
Yeah, sure.
So hey, everyone, I'm Tejas, one of the founders of Hightouch.
We're one of the players in the reverse ETL space and data activation, basically helping
companies take data from the data warehouse and use it across all the operational processes
and SaaS processes in their business.
Before founding Hightouch, I was actually an early engineer at Segment.
So my experience sort of in the data vendor space dates back to like seven, eight years ago before
terms like CDP and stuff like that existed and kind of saw the rise of cloud data warehouses
there and realized that there was an opportunity to bridge some of the challenges we were solving
at Segment and what was happening in the data warehousing space with companies building a
source of truth in the warehouse. So super excited to be on the show today. Obviously follow all the companies
in here super closely and excited to have a live coffee chat. Great. It's going to be great. All
right, Boris, you're next in the window on Zoom. So take it away. Cool. Hey, I'm Boris. I'm the
founder of the company called Census. We started building what we now call reverse ETL back in
2018 when there was no name for this.
And we've always wanted to help companies get the most out of their data.
And a lot of it tends to be locked away in analytics and warehouses, which is what we were trying to solve.
So get that in the hands of salespeople, marketing people, support people, finance people, all those kinds of things.
And data pipelines are the way to do that.
So yeah, before that, I've always been a tool builder. I used to work at Microsoft, and between Census and Microsoft, I started another company that was
kind of tangentially related, called Meldium. Very cool. All right. Triti.
Hi, everyone. I go by Triti, and I lead the product-led growth team at Workato.
If you're unfamiliar with Workato,
it's an enterprise automation platform.
We have been in the business for over eight years.
We have over 7,000 customers that use us for
automating various business processes
by connecting their cloud and on-prem stacks.
But a very interesting pattern that happens there is reverse ETL.
As a matter of fact, we released a Work Automation Index report last year.
And in addition to all the traditional processes like order to cash,
employee onboarding, procure to pay, record to report,
lead management, and others, reverse ETL was trending up. It was in the top 10.
And reverse ETL means very many things to very many people. I was very excited to join this forum
with Tejas and Boris to learn more, and also to learn more from the questions that we get from the audience.
Thanks for adding me, Eric.
Yeah, of course.
So I'll start it off with a question.
Triti, you brought this up when we were prepping for the episode, but I'd love to hear from each of you.
What has been driving the trend behind what we call reverse ETL? And maybe we'll get to whether
reverse ETL is the proper term for it, you know, because, you know, we've had some conversations
with Boris about whether that encompasses, you know, sort of the spectrum of problems that this
technology solves. But, you know, we sort of have like the event stream, you know, technology solved
the behavioral data issue and sort of syndicated that to the stack.
ETL allowed you to collect all the data from all the disparate parts of the stack.
And really, a lot of those drove one-off triggers or the main use case was BI. We're just trying to
understand how users are behaving and how the business is performing. And so why don't we start off by like, what are you seeing?
Like what's pulling reverse ETL technology from your product teams
in terms of use cases on the ground?
So why don't we just, let's just go in the order that we did intros with.
So Tejas, what are you seeing?
Cool. Yeah.
So I would say across use cases, honestly, it's pretty exciting
because we're seeing use cases across pretty much all business teams in an organization and far more use cases in terms of breadth than we imagined when we actually founded the product. And it turns out sales and marketing and go-to-market is still probably about like 70% of our use cases in the market. But we're also serving finance teams who need
rich data in their ERP systems to close out the books faster and not pass around CSVs across the
organization or product teams that need information from your analytics stack to be able to power
certain personalized customer experiences inside of their applications. But overall, I would say the most exciting part about reverse ETL and data
activation as a whole, when I think about the category, is that we're oftentimes not just
replacing scripts written by engineers or automation built by engineers, but we're actually
unlocking brand new business use cases, brand new value, and brand new growth and revenue
opportunities for companies using the wealth of data that they had already in their data warehouse.
And that's really what I think has caught the attention of the market and excited companies
to jump right in and see what can they do with the resources and data they already have
to drive growth.
Great.
Boris?
Yeah, I think the breadth of scenarios has always been the kind of most exciting thing here.
You know, when we envision the platform, we kind of thought about it as something very horizontal.
You know, I tend to think about the fact that the way people wire data together shouldn't be piecemeal.
And they should think about where can they centralize as much
data as possible and get a source of truth and then federate that to as many kind of ends of
the organization as possible. And to me, that's the story of, that's actually the goal of SaaS
going back 20 years, which is to empower every individual in a company. And so whether that's
finance or sales, you want the right data.
You want data you can trust.
And you want that in the operational tool where you do your work, right?
Rather than having to open up five tabs.
And so this idea, to what I've seen over the last few years working on this is that analytics,
by virtue of a lot of other kind of trends and behaviors on the data team,
has become host to the best data in the company, right? The most complete data, the most
trustworthy data. It's the data that, I mean, ultimately you're going to use to report to
Wall Street to some degree, right? And so it probably has the highest level of scrutiny. Exactly, exactly. And so the
ability to operationalize that data, right, to take that data and make it kind of available to every
part of the company, has been super exciting and continues to grow. And so, funny enough, we didn't
start with kind of a marketing bent back in 2018. We actually started with product, like growth,
and just kind of thinking about software.
Yeah, when you have software as your core kind of asset,
the way you take it to the market is just different, right?
And I don't know, it was personal frustration back then about salespeople not knowing what users are doing in the product.
And I think, funny enough, I think Segment had done a great job
of connecting marketers to the engineering side, but sales was left behind.
So our early scenarios were all on the sales side.
And then that has since expanded.
That's so interesting.
Yeah, and that has since expanded to literally, I don't know, you can't even, I don't know if I could summarize it in any kind of set, right?
It's like from support to finance, product to marketing, it's like everyone kind of
wants to depend on this data. Interesting. And data organizations want to get more out of the
asset that they've invested in, right? And so that to me is the exciting story. I'm a tool builder,
right? And so you're trying to make someone else a more amazing version of themselves. And data
teams have a lot to offer. And it was locked away in charts.
And, you know, the idea was like,
let's get this into operational tools.
Very cool.
All right, Triti.
I mean, like what Tejas and Boris summarized
catches a lot.
I'll just say,
maybe add to that in a different way.
It's essentially the trends we see
like over the last several decades,
like ETL had been a way
to collect data
from various sources
like business application,
business systems,
and move it into a single repository
and create a source of truth
that you can rely on, right?
And there's always been,
like if you ask any company,
any individual,
right? How do you make decisions? "We are data-driven." And the way
to be data-driven was always, and to a great degree still is,
to put a BI tool on top of this phenomenal repository of data, and run visualizations and
reports. And that supposedly makes it data-driven. And nothing could be further from the truth.
Just looking at the data, everyone has their own interpretation of what that is.
And what's changing now is people want access to that data from
the tools they already use, whether it be Salesforce, Marketo, whether
it be Pendo for product analytics, and so on.
And so one is that people want access to insights from where they are working,
rather than having to leave them and go to another tool to download those reports.
So that's one.
The second trend that we see is access to information more in real time, rather
than a weekly report or, you know, digest reports and such. As things happen, when
a customer's churn risk changes, people want to take action, like the CSM wants
to reach out to them and say, hey, what's happening? You know, if there's a drop in activity, the AEs want to reach out, you know, the GTM teams want to reach out.
If there's a change in upsell propensity scores, you want to trigger a campaign.
And that's driving some of these patterns around how you become truly data-driven rather than just looking
at visualizations, right? So those are some of the trends. And if you apply those
trends, which business function would not want to act on these things in real time? It's not just
the GTM teams, it's also finance. And then the third, maybe one of the most important ones,
is the elevation of the data warehouse or the data lake to the same level as any other business application.
It's no longer the black box.
Oh, sure.
Yeah.
That you need to put a prism on top of it
to see what it looks like. It's been elevated to a business application as important
as, or in some cases more important than, a CRM, right?
Uh, so that is the other trend.
So how do you make that data accessible in real time across all business functions
to make them truly data-driven, rather than relying on what has been traditionally
business intelligence?
And that's what's driving these trends from what we see with our customers.
Yeah.
That's so interesting.
I mean, data-driven is such a loaded term, right? It's almost become hollow because it's used so much in marketing terminology.
For decades now, right?
For decades.
It probably goes back to the information superhighway.
Right. And I like, I mean, maybe this isn't okay,
but I'm going to take a little bit of a dig at like the big consultancies, right?
Because it's like digital transformation, you know, it's like, man,
the billions of dollars that people have made, like just trying to
connect some pipes to like help companies become more data-driven.
Yeah. But digital transformation is such a catch-all, right? And data-driven is just as much of a catch-all.
That's true, I suppose you're right. But digital, like, the new digital, is even bigger. That
one feels even more all-encompassing, because it means computers, right? It just means,
like, with computers. And I think that, yeah, I think that could be sheets of paper.
So I guess it could be even bigger, I suppose.
No, yeah, I agree.
I think maybe, I mean,
I think at least one big underpinning
of digital transformation
is sort of the move from on-prem to cloud,
you know, which is certainly non-trivial,
especially in the enterprise.
But actually on that note,
what I'd love to do here is
I'd love to get a little bit technical.
So, you know, I remember when, you know, there were companies sending data out of Redshift into
like SaaS applications, right? Like a good while ago, the idea of sort of getting data
out of a warehouse and into some sort of SaaS application isn't new, right?
And I think we all would agree that like...
I mean, data integration has been going on.
Exactly.
Data integration has been a thing people have been doing for decades.
Sure.
Yeah, yeah.
So it's not like reverse ETL is like someone invented a completely novel way of sending
data from point A to point B, right?
Like it's been happening.
But it's painful, right? And so like we're building SaaS around that, which is super exciting, but there are still a lot of companies struggling with the pain of like trying to get the data out
of the warehouse and into SaaS applications. And what I'd love to know is, because I think a lot
of our listeners are, you know, either data engineers or on data teams who have experienced the pain of trying to build that themselves, experienced not having the budget or the bandwidth to build that themselves, or grew up in an age where the SaaS just wasn't available to make that easy for them, right?
And so that's sort of just painful, right?
We're going to deal with it.
And downstream teams are going to be annoyed. But if anyone's built anything like that, generally, I think it would be ad hoc inside
of a company.
So you sort of have a bespoke pipeline, probably one-to-one or one-to-a-few.
But you're building really robust pipelines that are taking tables or data in the warehouse,
and then you're fanning them out to
like a huge number of tools and you're doing this at scale in a cloud SaaS format. Right. And so
I'm genuinely curious, like, like what are the problems that you're facing trying to do that?
Especially, you know, if anyone's done this, you know, sort of ad hoc or bespoke, like in a company,
like help them understand what does it take to do this at scale?
I mean, there's a bunch of things you have to factor in if you're going to do this yourself.
And I think we're all going to probably talk about some similar things here. But
the first thing you got to deal with is errors, right?
Just like things fail way more than you might predict, right?
So the great fallacy of APIs going back, again, 20, 30 years is like,
oh, they just work.
Nope, they don't just work.
So things fail.
And building in recovery is significantly more difficult,
I would say, than simply writing the code to sync data, right? So that's one.
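As a rough illustration of why the recovery logic dwarfs the sync logic, here is a minimal retry-with-backoff sketch. The `send_batch` callable and its failure modes are hypothetical stand-ins, not any vendor's actual API; a real pipeline would also distinguish retryable errors (rate limits, timeouts) from permanent ones and dead-letter the latter.

```python
import random
import time

def sync_with_retry(send_batch, batch, max_attempts=5, base_delay=1.0):
    """Send one batch to a destination API, retrying transient failures.

    `send_batch` is a stand-in for whatever call pushes rows to the
    destination; it is assumed to raise an exception on failure.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return send_batch(batch)
        except Exception:
            if attempt == max_attempts:
                raise  # give up and surface the error for alerting
            # exponential backoff with jitter to avoid hammering the API
            time.sleep(base_delay * (2 ** (attempt - 1) + random.random()))
```

Even this toy version shows the asymmetry being described: the happy path is one line, and everything else is recovery.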
Two is like scale, right? So size. So dealing with 10 rows is totally different than a thousand,
totally different than a million, different than a billion, right? And people need to sync large
amounts of data. Our users, like our companies have like on the order of 500 plus million users,
right? So you have to be able to do this at scale
and with destinations that don't handle scale
particularly well.
And does it always work there?
I mean, it depends, right?
Some are really, really good.
Do you know which product is unbelievably good at scale?
Facebook.
Facebook will happily eat like hundreds of millions
of records in like a snap, right?
But Marketo?
Marketo, other end of the spectrum, other end
of the spectrum. I like to joke about Facebook because it's like, you don't think about it,
but it's like, the reason it's so fast is like they already have all the data. So they're just
going, check. They're just going, yep, we know who you're talking about. But anyway.
Other podcasts, another podcast.
Yeah, yeah, yeah. So scale is, you know, how you sync data incrementally into a system,
how you do it in the right kind of order while minimizing API usage, all these kinds of things is probably the second thing that if you're going to do this yourself, you have to think about.
Third is probably monitoring all this.
Things break.
Your stuff will break. Now, I think things have improved in our market broadly.
You can use orchestration tools that have good, you know, some alerting
for you, but you have to be monitorable, right?
And that's really not a trivial amount of work.
It's the same reason engineers don't tend to build New Relic or Datadog themselves,
right?
That in itself is expensive.
And so that's a huge part of our software as well, right?
Because you want these things to be alertable, monitorable, et cetera.
And then last I'd say is, I think, I don't know if you all have
seen something different, but like most internal versions of this are, you know, not manageable by
anyone other than the person who wrote it. Whereas the whole point, right? You talked about this,
right? You talked about the democratization of data and analytics and people want to be able
to access these things. And if you're going to build this yourself, are you going to build the UI to make it easily mappable, so that people can modify these things without having to call you?
That is probably all the things you would have to build to do this well yourself.
Yep. Love it. Okay. I'm going to slightly modify the question as I pass
it on to Tejas and then Triti. But okay, so here's a slight modification. How do you decide
what you manage and then what you hand off to the user and or where the compromise is there,
So if you think about incremental syncs, are there decisions that you need to make on behalf of the user? Or are there use cases where you make that decision for them?
Those are actually fairly challenging when you think about data at scale. So yeah, I would just
love to hear your perspective on that. Yeah, it's a great question. So one thing that I think has
been really powerful about the tools in the reverse ETL ecosystem is giving the users a lot of
flexibility, but also a lot of guardrails at the same time.
So one thing that we handle out of the box, which has also been touched on, is
diffing.
So I think typically when companies build a script like this in-house, they'll just
kind of build a loop over the data in the warehouse and call an API to go update it
or upsert it into a destination.
And it remains pretty basic.
And then a challenge comes up: the destination API can only accept data at a certain rate,
and you need to only send updated data, but you don't have a clear updated_at timestamp
in, say, your data warehouse or something like that.
So one thing that we've handled out of the box is diffing for our customers, where
Hightouch can actually automatically only send changes
to some of these downstream, to all these downstream destinations
instead of sending all the data every time.
And with diffing, there's a ton of nuances.
So we support multiple mechanisms for diffing.
One that we support is diffing inside of the customer's warehouse,
where data that's being synced over is actually written back to the warehouse
and joined against in the process of syncing to a downstream. Oh, interesting. Okay. But it's not the best approach for
all data warehouses. I mean, for certain databases that our customers connect with,
writing back to it isn't as favorable as a cloud data warehouse, like a Google BigQuery or
Snowflake, where storage is separate from compute. So if you're thinking about a Redshift,
this may not be the most favorable approach. Or even more of a transactional or production database,
like an Elasticsearch, or even a production Postgres, you know, that might
not be something that a customer is okay with. So we also do support other
mechanisms of powering diffing, like writing the data back to a
customer's S3 bucket, for example.
And even depending on the data warehouse you use, we support even more options,
like, for example, leveraging timestamp partition keys in something like a Redshift or Google BigQuery to automatically do more intelligent, faster diffing for stuff like event forwarding use cases.
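In miniature, the diffing idea looks like this: an in-memory comparison standing in for the warehouse-side join described above. The `diff_rows` helper and its dict-shaped rows are illustrative assumptions, not any product's actual implementation.

```python
def diff_rows(current, previous, key="id"):
    """Return only the rows that are new or changed since the last sync.

    `current` is the query result for this sync run; `previous` is the
    snapshot of what was last sent, which in a real reverse ETL system
    would live in a table written back to the warehouse (or an S3 bucket)
    and be compared via a SQL join rather than a Python loop.
    """
    prev_by_key = {row[key]: row for row in previous}
    # a row is emitted if its key is new or any of its fields changed
    return [row for row in current if prev_by_key.get(row[key]) != row]
```

Sending only this diff, rather than the full table on every run, is what keeps rate-limited destination APIs workable at warehouse scale.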
So one thing I would say is like with building reverse ETL platforms, we have a lot of features kind of built out of the box where companies don't have to implement this stuff, but then still allow them to kind of see more and dial in and control how it works if they need to
for their use case. So I think sane defaults with a lot of customizability is a general approach
that we've been taking to building our software and one that companies have really appreciated
versus say other players in the market with like CDPs and whatnot. Yeah, super interesting. Okay,
Triti, you're going to have the last word here, but you can only answer this in two sentences, just because I just have a quick diversion here.
I said I wasn't going to get technical, but of course I'm trying to get more philosophical.
Maybe you want me to try two sentences?
But here's the question though. Does the user care? And the reason I ask that is because
how you build your product-
It depends on who you're talking to, the user.
No, this is going to be a
big rabbit hole, but those are really complicated things that you're
discussing, right? Like diffing across, okay, like, warehouse, right? That's a general term, a general tool.
When we talk about diffing, it's very specific, right? Like, very specific product problems. And genuinely, I'm interested in, like,
do your users like care about that? You know, sort of like the tuning question, right? Like,
yeah, you can get software running, but like tuning, it's like a different skill set.
We talked about the various use cases, like who's, you know, like the traditional ETL,
the team was always centralized, right?
So we talked about this discussion around the jurisdiction, like who's owning these, if I can call them reverse ETL pipelines, right?
Like if it's the GTM team that owns it, like, do they really care about these control and the, you know, the extensibility and such?
Probably not as much, right?
But there are other teams, like maybe the product or data engineering teams
that need a lot more control and flexibility.
So the answer is, depending on what this reverse ETL pipeline serves,
the needs will be different.
The personas that we're using in building these are different,
and they will require, like, you know... In some places, the out-of-the-box
sync and things work just fine.
Like, you know, it's not just for reverse ETL; take the example of Salesforce, Marketo integrations,
where out-of-box integrations just do fine.
But then there are some cases where you need to dedupe.
There are some cases you need to, like, do some lookup of some third-party application,
like, depending on the nature of transformation,
where you need a lot more control, right?
So those are things, and then you need to also create
some reusable components that you can apply
and standardize across multiple pipelines.
And in those cases,
you will care about more control and flexibility.
But I just wanted to add,
I think Boris touched upon a few things
that are very important when you ask this question.
What should the product do and what should the user do?
Yeah.
And for the era we live in, right,
at least how we believe the philosophy should be,
the product has to do more
so the users can get more done.
What does that mean?
That's not just a soundbite.
What does that mean?
One is, when you're looking at whether it be reverse ETL or any form of data movement,
the number of sources the product can connect to, like the breadth of connectivity, right?
Both from the source side, right? So the product has to take care of those things. On the destination side, Salesforce offers a bulk API
to ingest much faster, right?
You can push 10 million rows.
Marketo, not as much. NetSuite, not as much, right?
So the product has to do more to do that buffering and the queuing and sizing.
So the user doesn't have to worry about those things.
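The buffering-and-queuing point can be sketched as a simple chunking helper. The batch sizes here are illustrative; the real per-connector limits (a bulk API accepting thousands of records per call versus a slower endpoint topping out at a few hundred) would be configuration the product owns, not the user.

```python
from itertools import islice

def batched(rows, batch_size):
    """Yield rows in destination-sized chunks.

    The product, not the user, should know that one destination happily
    ingests thousands of records per call while another tops out at a
    few hundred, and queue the data accordingly.
    """
    it = iter(rows)
    while chunk := list(islice(it, batch_size)):
        yield chunk
```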
So that's one very important part, the breadth of connectors,
like on both sides, the source system, source databases, and the destination.
The other point that Boris brought up: like with any pipeline, bad things happen.
Errors happen.
And if the product doesn't provide the ability to see them, recover, and, you
know, the pre-built monitoring tools to troubleshoot, or even not troubleshoot but
auto-correct in some ways,
it puts a lot of burden on the developer, right? And then it requires specialists to come in.
So those are the things the product needs to do more of.
What should the user focus on is more the business logic.
Like what is the outcome that we want to drive, right?
And like, I need to move this set of records for an upsell campaign,
I need to look at this data in this table,
like monitor for upsell signals and whatnot.
And then, you know, take that list out
and move it into a marketing campaign.
They should focus just on the business logic
and how quickly they can configure.
The second part: you know, business processes
change dynamically every week, every month,
so the more they're able to iterate, the better.
So it should not be brittle, right?
The ability to iterate and be agile about it is also something
the product should support.
So I'll put it this way.
So, all the products that we represent here, and the
modern ones that are coming up, they have to have parity in terms of experience with what these end
users are using. What I mean by that is it's more configuration-driven and click-driven
than code-driven, right? So, like Salesforce or Marketo, you can do most of the things through
clicks rather than have to write any code. But it also has to provide extensibility, you know, whether it be
some Python scripting, some third-party scripts that you may want to use, you're
able to pull that in so it doesn't put you in a box.
So that, those are things that drive adoption of solutions like these.
Guys, I have a question that is related with something that was mentioned a little bit
earlier, that nothing is like extremely new, right?
Like it's not like the first time that the market out there had like to move data from
point A to point B and even like push the data back to the downstream applications.
But I would like to ask all three of you about like two specific cases of products.
And I will start with Tejas because he's coming from Segment.
So the two products that I want to ask you about, one is Looker Actions
and the other one is Segment Personas, right?
And the reason that I'm focusing on these two is because these two products are not
that, I mean, they were not created that far back in the past,
like compared to when you started, right? But why don't we
hear about them? Like, in a way, why didn't they succeed in
creating the category, or leading the category, let's say?
So, Tejas, you first. Yeah.
Then I'll ask the rest of the... Cool, yeah, I can kick it off.
So, this is a super timely question,
because I was actually one of the first engineers
working on Segment Personas
with my co-founder and CTO at Hightouch.
So, first, I'll take Looker Actions.
Honestly, Looker Actions had a pretty brilliant idea,
which was, you know, I think one of the first
offerings to the market that started evangelizing this idea of reverse
ETLing, which was, we're analyzing stuff in Looker and there should be a way to take these
insights and put it into the other tools that the rest of the business team look at and
not just have the business teams have to look at a Looker dashboard or a Looker report every
single time.
You should be able to use that information more live.
That is really the concept behind reverse ETL today.
I would say there's a couple of reasons that didn't really pan out.
I mean, one, honestly, I would say it's just like resource allocation.
Like if you take a look at the looker action destinations, you just have like a lot of
limitations.
Like I think the Braze destination, for example, can only handle like results of 200 rows.
They don't really do diffing in their infrastructure.
They don't really have much visibility or observability.
The kind of sync mapping interface that customers expect for like a more modern
reverse ETL platform is just like not there in Looker Actions.
So I think really the reason it didn't take off is because activation, like data
activation is just a separate technical problem and a separate technical space than data analytics.
And I don't think that the team working on Looker Actions really treated it as such and invested in building Looker Actions to the same product perfection and degree of thoughtfulness that kind of best of breed solutions have come out to the market with, like Hightouch and Census, for example.
So that's the reason I think Looker Actions didn't pan out.
There's also some parts about it, which is that tons of people don't use Looker and want
to tap into data in their data warehouse.
But I actually think even tons of our customers do use Looker.
And the real reason it didn't get very far was just product quality and product design
at the end of the day.
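[Editor's note: the "diffing" Tejas mentions above is a core piece of reverse ETL infrastructure: rather than re-pushing every row on every sync, the tool compares the current warehouse query result against a snapshot of the previous run and only sends adds, changes, and deletes downstream. Here's a minimal, hypothetical sketch of that idea; it is not any vendor's actual implementation:]

```python
# Minimal reverse-ETL diff: compare the current warehouse query result
# against the previous sync's snapshot, keyed by primary key, and emit
# only the rows that actually need to be pushed to the destination.

def diff_rows(previous, current, key="id"):
    prev = {row[key]: row for row in previous}
    curr = {row[key]: row for row in current}
    added = [curr[k] for k in curr.keys() - prev.keys()]
    removed = [prev[k] for k in prev.keys() - curr.keys()]
    changed = [curr[k] for k in curr.keys() & prev.keys() if curr[k] != prev[k]]
    return added, changed, removed

prev_run = [{"id": 1, "ltv": 100}, {"id": 2, "ltv": 50}]
this_run = [{"id": 1, "ltv": 120}, {"id": 3, "ltv": 10}]
added, changed, removed = diff_rows(prev_run, this_run)
# added   -> [{'id': 3, 'ltv': 10}]
# changed -> [{'id': 1, 'ltv': 120}]
# removed -> [{'id': 2, 'ltv': 50}]
```

[Without this step, a destination with a small request limit, like the 200-row case mentioned above, gets overwhelmed by unchanged rows on every run.]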
When I think about segment personas, it's actually different.
Segment Personas, for anyone who doesn't know, basically it says, okay, you're tracking all this event data in Segment. It's being forwarded to all
these different downstream tools, but we want to provide marketing teams and growth teams and
teams like this a central place inside of the segment product where all the user data is
aggregated into these profiles that you can then build upon in a WYSIWYG way. So add some
computed traits like number of orders in the last month, or LTV,
and then also build audiences on top of these profiles
and sync them out to different tools.
So really, if you think about it,
Segment Personas was almost building its own source of truth
off segment data within the segment products.
And I think what the market has really realized
is that the source of truth is not going to be in any sort of proprietary vendor or any sort of SaaS
application or follow any sort of spec of what a user should look like in segment or what an event
should look like or what a shopping cart should look like. It's going to be in the data warehouse
where companies are able to get all the data into it via numerous different ETL vendors,
where there's a standard that all
software is kind of integrating on top of, transform it freely using software like DBT,
for example, in the ELT stack. And then once they know what a customer 360 view kind of looks like
in the data warehouse, sync it out to all the different downstream destinations. So honestly,
I would say the reason Segment Personas primarily didn't pan out, I would say is just because it was
built on the wrong source of truth, right?
It was built directly on top of segment as a source of truth with the warehouse as kind of like a side afterthought.
Whereas what I think has really become clear in the last five to seven years is that companies want to use the data warehouse as a source of truth.
That's where all the data will be.
And that's where the best data will be, kind of as Boris mentioned earlier.
And that's really the trend that reverse ETL and data activation is riding on.
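[Editor's note: in the warehouse-as-source-of-truth model Tejas describes, a computed trait like "number of orders in the last month" or LTV is just a SQL transformation over raw order data, maintained with tools like dbt and then synced out by reverse ETL. A hedged sketch using SQLite as a stand-in warehouse; the table and column names are hypothetical:]

```python
import sqlite3

# In-memory stand-in for a data warehouse with a raw orders table.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (user_id TEXT, amount REAL, created_at TEXT);
INSERT INTO orders VALUES
  ('u1', 40.0, '2022-03-01'),
  ('u1', 60.0, '2022-02-15'),
  ('u2', 25.0, '2021-12-01');
""")

# Customer-360-style model: one row per user with computed traits,
# ready to be synced out to downstream tools by a reverse-ETL job.
rows = conn.execute("""
SELECT user_id,
       SUM(amount) AS ltv,
       SUM(CASE WHEN created_at >= '2022-02-23' THEN 1 ELSE 0 END)
           AS orders_last_month
FROM orders
GROUP BY user_id
ORDER BY user_id
""").fetchall()
# rows -> [('u1', 100.0, 1), ('u2', 25.0, 0)]
```

[The point of the warehouse-first approach is that this transformation lives in open SQL on the company's own data, not inside a vendor's proprietary profile store.]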
That's very interesting.
And actually based on your experience at Segment, because this is something that
like I've been thinking like from time to time, do you think that the way that
personas like were implemented based on the, this single source of truth that
was like the Segment itself was also like a result of, let's say, timing, like when segment
actually started as a company.
Entirely.
Yeah.
I say this time and time again.
So I think, you know, the approach that CDP solutions like segment took, you
know, back when I worked there seven, eight years ago when the solutions
started to be designed, was not wrong for the time.
If you looked at data warehouse usage at the time, I mean, companies like Snowflake just
had less than 100 customers when I joined Segment, honestly.
And unless you were in the enterprise, you weren't really heavily using the data warehouse,
BI culture and solutions like Looker were just popping up.
If you went to a company and said, hey, we're building reverse ETL, we're going to allow
you to take data from your data warehouse and feed it all into these SaaS tools to solve problem A you have on marketing or problem B you have in sales.
Technically, that works.
The software would work just as well then as it did today in a lot of sense as a technical
solution.
But when you think of the fit, like the product market fit for companies, they just didn't
have the data in the warehouse in the first place.
They weren't building the kind of models of what it means to be a customer.
How much are they paying us?
Are they a high value or low value user?
Just, you know, the, all the prerequisite steps weren't done yet.
So it just didn't make sense for that to be the way that companies solve data
activation problems all the way back five, six, seven years ago.
So I don't think the way CDPs approach the problem was incorrect at all.
I think it's just a different approach for a different time. And now that companies have made this massive investment in data
warehousing and the modern data stack, everyone's looking for how can I drive more value from it?
How can I use all the data I have and all the data models I've built to drive growth? And
reverse ETL and data activation is really the answer to that that makes sense for businesses
at this time. I could not agree more with that.
Like most of these things end up with good decisions in their context, right?
Even I would say, since you talked about nothing is new under the sun, right?
Long before any of those products, like people were integrating data and it made sense to
do it, you know, from A to B without a warehouse.
Like it would have been an incorrect decision to kind of design with
a warehouse bias, right?
Like we did something in 2018 that was like almost weird for its time, which is like we
put all of our products capabilities inside the warehouse, right?
Which was unheard of for a SaaS product at the time.
So it's like, you can cut the cord on Census.
All our data is actually sitting in your warehouse.
Because I felt like,
you know, there's a secular trend
towards owning your data,
which Tejas kind of mentioned.
I think those are much larger trends
than even just a data stack trend, right?
Yep.
You're from Greece.
Like, Europe has led the way,
but there's a general trend
towards owning your data,
making sure it's not locked away
in a proprietary platform, right?
And the data warehouses have just been
this perfect piece of infrastructure for that.
And then if you think about,
I tend to think about the humans involved a lot,
as well as just the tech, right?
I know it's weird for technologists,
but Segment was a brilliant bridge
between engineering and marketing.
Right, Tejas?
Would you say that?
I would say that's accurate.
Like product engineering in particular is the big differentiation.
Right, right.
And when we started, we were not trying to be a bridge between product engineering and
marketing.
We're trying to be a bridge between the data team writ large.
And we started in sales, but eventually all teams.
But it was really about putting the data team at the center, right?
And, you know, Looker, of course, cared about the data team, obviously, but it cared primarily about this like batch analysis, you know, explore some reports about what happened last quarter. And this idea of taking the data team
and making them a central pillar of the company, that they're operationalizing their work, that
they are driving in the truest sense, Eric, right? Like driving the business, that is a different
relationship. And if you had tried to build that relationship 10 years ago or seven years ago,
the data team was too small, didn't
have enough tools, wouldn't have had the buy-in from the C-suite to own this part of the company.
Well, and the data wasn't actually centralized, really.
Yeah, but all these things build on each other, right? But I think it's not just the data
centralized now, it's that data teams and I think CEOs around the world are realizing,
like, I need to give this team more influence in my company because good things happen when I do
that, right? So you needed a new bridge between them and everybody else, right? So that's kind
of like why we have, you know, kind of, we talk a lot about the word analytics because it's like,
that's kind of the lingua franca of data teams is the word analytics. And it's like,
let's operationalize that, right?
Yeah, I think outside of the core data team as well, it's just that, you know, the data enabled personas in organizations just have a much more powerful tool set than they did
10 years ago.
Like, obviously, you know, marketing operations analysts, marketing analysts, sales analysts,
those roles existed 10 years ago as well.
But if you look at the tools they were using, they were using Google Analytics, Omniture, Excel, tools like that, Salesforce reports.
They didn't have the power of the data warehouse.
They weren't leveraging BI.
They didn't have knowledge of or even access to SQL queries. That's changed. It's like, you know, it's a lot easier for any business user to find someone who sits nearby them in the office that can write SQL or that can use a BI tool than someone who can code. And that wasn't really true 10 years ago, I'd say.
Costas, I was talking to someone the other day: how many people do you think have SQL in their skills, but no other programming languages?
On LinkedIn?
That's a great question.
That's a good question.
I would assume.
You have to listen.
I'm going to tell you, but if you listen to my next published podcast, you will discover it in that. Like as a percentage or like?
No, no, no.
Just number of humans.
Number of humans on LinkedIn who state SQL as a skill, but not what the rest of us here would probably call a programming language.
A number of humans.
This is great.
This is like wits and wagers.
Wits and wagers is a fun game.
It's a great game.
Well, the question was for you, Costas.
I'm trying to answer it.
Yeah, I don't know.
I would assume that.
Just give us a number.
That's way more fun if you try to give a number.
Give a number?
I don't know. Like, I mean, a number.
You're failing the interview. At least give a number.
I'd just say 2% of all LinkedIn users. So if LinkedIn has like 700 million users, like 14 million or so.
Whoa, nice. That's high.
That's high. I like that. I think that's high too. I was going to say like...
I don't know anything though.
Two to four million was my guess.
But that's, I think, Triti, your math...
I was thinking six figures, to be honest.
So, I think I would have guessed the same as you,
but it's on the order of five million.
Wow.
That's great.
Pretty great, right?
Great for us.
Great for us.
I'm not the guy, but I stand by it. I'll just go back to your question. The need for, like, you know, this pattern has existed long before it started getting branded as reverse ETL. The difference is how it has been fulfilled in the past, right? It has been fulfilled with CSV exports and things like that, right?
And who was able to do
that in the past?
It's like these centralized
data teams or people who are very
competent with databases
and such, right?
Not only for the reason
of knowing SQL,
just from a compliance and security standpoint, you didn't have access to these systems of record, right? So what has changed,
like, and you know, Segment, Looker, however you categorize them, any product that doesn't do anything with data other than, like, visualization. But Segment, it was a good example.
But what has changed is the demand for these requests.
It's coming from, you know, we already talked about what set of use cases.
Like if Segment solved for like 5% or 10% of use cases for GTM teams,
there's a whole large number of use cases that go unmet by any tool.
And that's why these products exist, right?
So there's a need for ownership of these processes outside of the data team as well.
And Boris, you can speak to this, like your buying centers will be different
from the traditional data teams, right?
I mean, compared to some of the traditional ETL products. Yeah, yeah. I think your point, both of you
are saying like, there's this democratization occurring, right? Of skill, of skill. And
listen, you talk about buying, right, Triti? I think the journey of SaaS for 20 years now is this empowerment of individuals and teams that are not...
You're talking about data teams. It used to be that all your software was bought by the CIO,
right? Period. And deployed by your CIO and in the office, in a physical office somewhere
from that, right? And people used to call it shadow IT and all these things. But broadly
speaking, it's about having more choice, more autonomy,
and different sets of teams being able to make decisions
about what tools they want to use.
And I, you know,
this is where, you know,
like I've been at this
for technically a decade,
if you factor in my previous company,
which was all about
kind of democratizing access to SaaS.
I think this is the journey
we're still on as an industry
is letting individuals and teams make decisions about software and using it to the best of their ability.
In other words, with the best data from the trusted source, right?
But our job has to be to create the right, you know, let's call it guardrails and availability of that data, not to prevent individual teams, whether that's a sales team or a content marketing team, doesn't
matter, to make choices about what tools they want to use, right? And, you know, the analogy I like
to use about this, now I'm going to really frame myself as a child of the 80s, but like,
video games used to work like this too. So in the eighties, like video games were not purchased by the children who played them.
They were selected and purchased by parents.
And,
and therefore they were marketed to parents,
things that people don't remember this,
but they were marketed to like mom and dad as like safe,
fun games.
And that all changed in the nineties and into the two thousands where we
now,
you know,
have, you know, more violent games, more sports-like games that are more for, let's say, the user. But the reason we
could do that, there are these necessary pieces that had to come into existence, like ESRB
ratings, and app stores, and controls from the game makers so that you couldn't just install
whatever game on your console.
And so those were the building blocks.
And so SaaS is, to me, a similar journey
just for the worker in the IT world.
And so, yeah, Triti, to answer your question,
like, yeah, the buyer's not going to be
this centralized, massive team.
The data team just has to have the right visibility,
observability in our platforms
so that they can let everybody else kind of select and do what they want.
That's kind of how I'm doing it.
And, you know, building on that analogy, you know, with those, like, even though the kid purchases, there's still a presence of some supervision, right?
Exactly.
Exactly.
Well, there's trust.
It's governance, yeah.
Exactly.
It's trust but verify.
So governance will, I think we're at the early days of that, but I think, yeah, that's going to become key to all of our platforms is to make sure there's
reasonable governance. Yeah. And I think something really interesting here is something that Triti
actually brought up earlier, which is that when you said, what does the product have to do versus
what does the user have to do? I think about it like a little differently, like almost an extension
of that, where the product is also the infrastructure that the company is building, right?
So it's not just what it is that your product has to do, but it's what does
the data warehouse have to do?
What is that upstream versus what does the user have to do?
And I think that balance, like striking that balance in the application is like, you know,
the winning formula to enabling business teams to be able to leverage this data.
So as much as possible, if reverse ETL and data activation platforms can,
you know, tap into tools like in the observability space or, you know, leverage models from the
transformation space or do a lot of things outside of their product that taps into the
overall infrastructure that kind of the technical teams, the data teams are putting forth in an
organization, then that makes it a lot easier for business teams to come in and solve these cases in a self-service capacity without actually building more product features in the
reverse ETL tools itself. So I think that's a really interesting trend that we're seeing.
One thing is with the CDP players, everything was kind of in a proprietary ecosystem where
let's say you wanted a data transformation feature. CDP had to build a data transformation
feature in it. Let's say you wanted observability on data ingestion.
CDP had to build observability into its platform.
With reverse ETL or the data warehouse sort of first approach,
these can be solved by the ecosystem of players
that all interop and build on top of the data warehouse
instead of necessarily one vendor.
And a lot of these problems that could be solved by the product
can now be solved by the kind of technical infrastructure
and analytics infrastructure that a company has in place, which I think is just super powerful.
So business users don't have to think about any of that stuff.
I just have a question on that.
You know, you touched upon this and it seems to be something that reverse ETL somehow is
to be tied to the data warehouse, the data lake as a store.
Sure.
I think it's broader than that, that it can go beyond like any
centralized or federated store of data.
Where do you, you know... and so, okay, I just want you to talk some more.
Now, I'm very conflicted.
Uh, Costas, your thoughts maybe. Uh, when we say reverse ETL, like, yeah,
because ETL has been traditionally tied with the data warehouse, it may, uh,
indicate that reverse ETL always has to have the data warehouse.
It can be an MDM, right?
It's like a customer data hub or a data hub as the source.
Yeah.
Yeah.
And I think all of our products support lots of different sources, right?
But I think the goal is,
I think if as an industry,
we end up with a variety of sources
and a variety of destinations
and no central cleaning and deduplication
and kind of unification in the core somewhere,
we're going to make great companies
who make lots of money,
but we will not actually have moved the industry forward. And I think this is, to me, where we need to land in the end,
right? Is that you have, remember what I said at the beginning about the goal is to have data,
the best data, data you can trust, right? And the tools that you want to use. And I think you
should be able to use any tool you want, but the data you can trust is key.
And if you don't have some amount of centralization
somewhere in the company,
then this, I don't know how to make that happen.
Like to me, you get trust through central,
some centralization and some federation, right?
That's just how, that's always,
that's why our product is called Census, by the way.
Like it's because exactly that was the intent.
Boris, completely agree.
But the data you can trust from a business analytics standpoint,
maybe that's the data warehouse, but for, for example, MDM,
or customer data, product data, right?
That may be the system of truth.
It's not the data warehouse.
The same goes for...
I hear you.
I hear you.
I think
every SaaS product
I've ever interacted with,
and I think Tejas is smiling
because he's had the same reaction,
is like
every SaaS product
I've ever interacted with
in some form on their website
says something about
the system of record for X.
Take your pick.
I think Drift once said, we're the system of record for chats or something like that.
And I was like, what?
I don't even know what that means.
I swear.
I think it said something like that once.
And so I think, Triti, I am totally on board with using a source that your company has fully bought into.
Like, this is the truth, right?
Then it's great. Then it's great.
Then it's great.
But in my experience,
the reason people tend to gravitate to the warehouse
and why we made early on a pretty hard decision
to like bias towards these kinds of platforms,
not to the exclusion of others,
but to like as our primary bias
is that they have infinite storage
and infinite join capability, right?
And like that, to Tejas's point,
you can use the ecosystem for that.
You're not tied to a single vendor
making sure that it supports every source, right?
And so I think that if you can get that
out of something else, then great.
Like, you know, we'll support that as a source too, right?
But that to me is the important part
is that you can join all data somewhere that matters.
I agree with that fully.
But I would also add that I think the even more important part is that you have data at rest somewhere in an organization that your business teams simply aren't using.
I think data that you can trust is definitely a huge part of the value of these things.
But the biggest thing is that before solutions like Hightouch or before like data activation,
before reverse ETL,
people just weren't using the data at all, right?
There's so much, you know,
such a wealth of data
and a lot of the companies we work with,
it wasn't being used.
I think that,
but the central premise, right,
Tejas, between how we approach this is like,
you're not connecting Zendesk to Salesforce right now.
For sure.
Neither are we.
And so I think,
and Triti, like,
I think there's tons of data in Zendesk
that can go into Salesforce and it does. And I think that's great, but potentially keeps you
away from coalescing on something that is more trustworthy. I don't think that's wrong for some
use cases. Correct. Agreed. True tangents. Of course. Of course. That goes without saying.
That goes without saying. Yeah. We were actually talking about that.
Like a pipeline that doesn't make sense for someone to build or really for like...
Like at all? Well, no, no, no. I mean,
just like an example, it's like, okay, you have leads in Salesforce and you want to sync those
to Google Ads because there's certain data, right? And it's like, okay, well, no one
wants to manage that pipeline. Great. Like Google and Salesforce built it. So you can just reverse
all the data points in there and then great. Right. So there are like point to point connections
where it's like, this is awesome because like no one has to manage this. These two enterprise
companies like built in integration. And this is awesome.
Like great.
Like connect the tools
and the data teams like
and the actual operational teams
don't have to deal with it.
And like that is very convenient.
And if every app,
if every app on earth
was perfectly connected
to every other app on earth.
Sure.
Right.
We're talking about Salesforce
and Google Ads, right?
Like, yeah, they should.
But those are,
but Salesforce and Google Ads
is great.
What happens with Facebook?
Well, but even, even if we only focus on Salesforce and Google Ads,
that integration has all sorts of limitations.
It can only sync, I think, the last 90 days.
Like, they all have limitations, right?
Yeah, yeah.
Agreed.
To Tejas's point from way earlier.
Like, do you think the staff, senior staff engineers at Salesforce
are working on that problem?
I don't think so.
Right, but it is, like, totally a cost benefit for the data teams working inside a company where it's
like, great, we're just going to offload that, right? Like we can accept the limitations.
I mean, our goal as software vendors and data integration should be able to make it
as easy to do that as you can do it in the Salesforce UI, but perfectly on top of the
data warehouse. I really think that's possible. Absolutely. Totally agree. Yeah. And there's also a matter of like expressivity, to be honest.
Like you can move data from something like Zendesk to Salesforce, right?
Like you can do it with Zapier.
You can do it with Zendesk natively.
You don't even need it.
Yeah, yeah, yeah.
Yeah, exactly.
But the whole point of, like, working with data is how we can take whatever data points we have, which probably store, let's say, an almost infinite amount of implicit information in there, and make it so we can push it and use it somehow. And to do that, you need a processing environment, right?
And these processing environments,
like humanity so far has decided that's going to be like a database system. Like they are built for this reason, right? Yeah. Separation of concerns.
Yep. So unless the things that we have to do are like super trivial, like, okay, someone
signed up. Okay. Let's send this somewhere. Okay, fine.
But anything more than that,
that requires some kind of like business logic
to be built there and be executed on top of the data
in order to derive something,
it needs to happen somewhere.
And it's not the pipeline that will do that, right?
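[Editor's note: Costas's point, that anything beyond trivial forwarding needs business logic executed over the data rather than a pipe, can be made concrete. The audience definition, names, and thresholds below are hypothetical, just to show the kind of derivation a plain event-forwarding pipeline cannot express:]

```python
from datetime import date

# Raw events: a trivial point-to-point pipeline can relay each of these
# one at a time, but it cannot answer a question *about* them.
events = [
    {"user": "u1", "type": "pricing_view", "day": date(2022, 1, 5)},
    {"user": "u1", "type": "pricing_view", "day": date(2022, 1, 9)},
    {"user": "u2", "type": "signup", "day": date(2022, 3, 20)},
]

# Business logic that needs a processing environment: users with 2+
# pricing views but no activity in the last 60 days ("dormant, engaged"),
# derived by aggregating over the full history.
def dormant_engaged(events, today, min_views=2, dormant_days=60):
    by_user = {}
    for e in events:
        s = by_user.setdefault(e["user"], {"views": 0, "last": e["day"]})
        if e["type"] == "pricing_view":
            s["views"] += 1
        s["last"] = max(s["last"], e["day"])
    return [u for u, s in by_user.items()
            if s["views"] >= min_views and (today - s["last"]).days > dormant_days]

dormant_engaged(events, today=date(2022, 3, 23))  # -> ["u1"]
```

[In practice this aggregation would live as SQL in the warehouse; the sketch just shows why "someone signed up, send it somewhere" and "derive something from the data" are different classes of problem.]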
That's a great point.
I mean, I guess once the context is in point to point,
the context is decided for you, right?
Yeah. Well, it's context, but also like, I think we all know history here, right?
Once upon a time, you had to put the logic into the pipe because of literal computing
constraints.
Like, like going back to, you know, we, we actually had limited ability to, to, to move
all the data around.
So luckily we now live, that's a genuine shift technologically, right?
Like now we no longer have to pre-compute,
like compute on the fly or as we move, right?
So that is one thing where we can clearly show
before and after where compute costs went sufficiently down
that we could just store everything
and then compute after.
But you're right, Costas.
Eventually you're going to need to compute in some form.
People might not realize they're computing.
I found that people who use Excel don't realize that they're programmers,
when in reality, Excel is the world's most popular functional language by far.
Sure.
You know, for all the Haskell developers out there, like actually Excel.
Those are, that's a good one.
Yeah.
I saw this, you know, the key point.
I just want to be very clear about this.
And I wish you'd mentioned Workato instead of Zapier.
That would be nice, but I'll get to the point.
It's like the Zendesk
and the Salesforce integration.
But the example
that it brought about,
like let's say,
you know, I go to the website
and register as a user,
as a lead, right?
And there is a, like, the reverse ETL
is not a catch-all for everything.
The lead getting routed in real time,
it's a very different flow that requires
some integration and automation, which may not even touch like any data warehouse, right? It
needs to happen in real time because somebody needs to respond to me in less than five
minutes, right? We handle that with our production database. We actually do that with a replica of
our... Yeah, yeah, yeah. I'm just saying that's where, you know, they're a different nature.
The other part is like, once you've collected all these leads and say you wanted to do a
re-attribution analysis, right?
And that data is in the data warehouse and you say, hey, let's reach out to these leads that
were interested at some point in time, but it never went anywhere.
And you need to move that data into, you know, like a marketing automation tool kind of thing, that's when reverse ETL comes in.
So there's a place for both.
There's a place for both.
And again, the tool of choice will depend on, you know,
what the user is trying to solve for.
But there's a place for both, which is why point-to-point integrations need to happen,
regardless of whether it's the data warehouse
as the most reliable source of customer data
or not.
That's a great point.
We're close to the buzzer here, so
we didn't get to talk about
a number of things that I would
have loved to, but we need to do a Q&A
and wrap it up here.
We'll just do a couple of quick questions here.
The first one, which
is super interesting,
is there was discussion around,
I'll give a little context here,
or I'll read some context into the question.
There was discussion around the change in technology, right?
So products like personas were built
before the warehouse had sort of come of age,
as it were, right?
So the question is,
the technical ability
of roles has changed, right?
So a marketer, you know, 10 years ago
was far less technical
or most of them were far less technical than today, right?
And even salespeople, right?
And sort of the appetite for like
different interesting types of data
that help them do their job.
How does that influence the way you're building your product?
Not only has the tech changed, but the users, marketers are very data-centric.
Salespeople are becoming more data-centric.
How's that influencing the way that you're building your products?
Yeah, it's a tremendous responsibility.
No, I mean that.
We get to see and foster basically, you know, an upgrade in skill. And to me, you know, I think a lot about how you don't learn computer science just by learning, you know, like how to write a git commit, right? There's theory related to that. And I've always framed Census to our users as not a data pipeline, but more of a data deployment
tool, right? Where I'm trying to teach you certain aspects of software engineering without calling it
that. And so to your point, marketers, people on data teams are all becoming dramatically more
savvy. DBT has led the way in terms of teaching people how to check in their SQL models,
like that's, people think that that's like, no big deal, but it's actually huge, right? And we're at
the infancy of that. We're at the point, you know, of those 5 million LinkedIn SQL people, we're probably
at a teeny tiny fraction who know about version control, right? So I think of it as
it's a super exciting, and it's like, it's kind of a responsibility. I feel like we're teachers,
as well as like engaging
with them in these ways. So there's a lot of, you know, kind of integration. Like we
long ago integrated with, you know, the Airflows and Prefects of the world, right? So that
we can help marketers or analysts or data teams who want to plug into their,
you know, kind of modern infrastructure, right? So yeah, I think it totally informs how we think about our whole
philosophy of everything.
I was going to say Triti, and then Tejas, and then we'll do one more question to end it out.
Yeah, I'll just add to it. Like Boris said, it's a tremendous responsibility. From a Workato standpoint, it has always been about recognizing the fact that people are changing. Their skills and their needs are not static, right? They're changing, and they want to do more. But at the same time, it's about taking the technology barriers, the skills gap, the friction to learn, out of the way and making them successful faster, right? That's what we focus on. And then the second part is to not put them in a box. If they want to do more, the platform should give them the ability to do more, right? So it's a balance. It's a very hard balance to strike. But it's an important one: the role of empowering more people to do things, and also providing the right controls so they do it responsibly.
Yeah, totally agree with everything that's been said. I think there's a balancing act between two trains of thought in our product organization, and we're pushing on both axes, and they balance each other out. One of them is: how do we empower more people in an organization to actually perform reverse ETL? A decade ago, engineers were the only ones building the scripts to move data from the data warehouse, or people brought in a MuleSoft or something super technical to move data from the data warehouse into all these different systems. Now we're allowing data analysts to do it. Next, we're going to want marketing ops to do it. Next, we might even want some marketers on the team, depending on their technical level, to be able to do so. And on that train of thought, we ship new features like audiences that allow marketers to come in and build segments on top of the data warehouse and sync those out to different tools, basically performing reverse ETL without necessarily knowing all the ins and outs of SQL.
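As a rough illustration of that idea, a segment builder can be thought of as something that compiles simple field/operator/value rules into SQL behind the scenes. This is a hypothetical sketch, not Hightouch's actual Audiences implementation:

```python
# Sketch: a marketer-friendly audience definition compiled to SQL.
# All names here are hypothetical, for illustration only.

def compile_audience(table, conditions):
    """Turn simple (field, op, value) rules into a SQL query string."""
    ops = {"eq": "=", "gt": ">", "lt": "<"}
    where = " AND ".join(
        f"{field} {ops[op]} {value!r}" for field, op, value in conditions
    )
    return f"SELECT * FROM {table} WHERE {where}"

# A marketer picks these rules in a UI; no SQL knowledge required.
sql = compile_audience(
    "users",
    [("plan", "eq", "pro"), ("logins_30d", "gt", 5)],
)
```

The point is the separation of concerns: the marketer works with rules, and the tool owns the SQL that runs against the warehouse.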
On the other hand, the other train of thought that we're pushing, which balances out the first one of empowering business users, is really this philosophy of taking all the principles and all the tribal knowledge that software engineers have, and the processes that they have: version control, observability, visibility, staging environments, pushing to staging before production. Our goal at Hightouch is to think: if the best software engineering team ever was to build a script, or a platform, for moving data from the data warehouse into something like Salesforce, what would they build? They'd have all of those things as part of their twelve principles of deployment or whatever it is. And how do we make all those aspects of a really strong data pipeline available to less technical users, whether it's a data analyst, marketing ops, or a marketer, without them having to know all about it?
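One way to picture those software-engineering guardrails applied to a sync is a plan/apply step, like Terraform's: compute what would change before anything is written to the destination. This is a hypothetical sketch, not any vendor's actual implementation:

```python
# Sketch: "staging before production" applied to a reverse-ETL sync.
# Hypothetical example; field names and shapes are illustrative.

def plan_sync(warehouse_rows, destination_rows, key="email"):
    """Compute a dry-run plan instead of writing straight to the destination."""
    src = {r[key]: r for r in warehouse_rows}
    dst = {r[key]: r for r in destination_rows}
    return {
        "create": [r for k, r in src.items() if k not in dst],
        "update": [r for k, r in src.items() if k in dst and r != dst[k]],
        "delete": [r for k, r in dst.items() if k not in src],
    }

plan = plan_sync(
    [{"email": "a@x.com", "tier": "pro"}, {"email": "b@x.com", "tier": "free"}],
    [{"email": "a@x.com", "tier": "free"}],
)
# An operator can review this plan before anything touches Salesforce.
```

The less technical user never has to know the diffing logic exists; they just see a preview of what the sync will do.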
And I think if you look at our product, application features like GitSync, the ability to just use the Hightouch product as usual and have everything you're doing, all the configuration, bidirectionally synced with a GitHub repo, are a really powerful step in that direction. You don't have to understand it all to start, but now you can start seeing the commits you're making. And if you need to make a bulk change, you can do that in code as well, but you can also just use the application as is.
So, all right, last question, guys. And hopefully I can make you promise that we will do this again in the future, because we have more stuff to chat about and we need more time. But I'd like to hear from all of you: one thing that you anticipate coming in this category that makes you really, really excited.
So let's start with Boris.
What's coming that I'm really excited about.
I mean, there's so much.
So I'm assuming we'll talk more about what's happening in our ecosystem rather than just in our products. But I think the warehouses keep getting better, right? And that just enables more possibility. So, you know, if you think back to when we started the company, we liked the warehouse because it had effectively near-infinite storage.
And we could use it as both a source and a destination.
We could actually write our information into the warehouse.
That way we could, you know, kind of do diffs.
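That diffing trick, using the warehouse to hold the last-synced snapshot and only shipping the rows that changed, can be sketched roughly like this (hypothetical names, not Census's actual code):

```python
# Sketch: diff-based incremental sync. The "last synced" snapshot lives
# in the warehouse alongside the source data, so the delta is cheap to
# compute there. Hypothetical shapes, for illustration only.

def incremental_rows(current, last_synced, key="id"):
    """Return only rows that are new or changed since the last sync."""
    prev = {row[key]: row for row in last_synced}
    return [row for row in current if prev.get(row[key]) != row]

current = [{"id": 1, "score": 90}, {"id": 2, "score": 40}, {"id": 3, "score": 70}]
last_synced = [{"id": 1, "score": 90}, {"id": 2, "score": 35}]

delta = incremental_rows(current, last_synced)  # id 2 changed, id 3 is new
```

Only the delta goes to the downstream tool, which is what keeps syncs fast and API quotas intact.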
Like, I think incremental syncs are a table-stakes thing. That shouldn't be some fancy feature. But we can do so much more given the capabilities of the warehouse, right? And again, when you have separated workloads and infinite storage, there's just so much you can do in terms of creating more observability, more kinds of transforms. There are new SQL functions that still come out, right? That are kind of really fun for people. I hope to teach people about certain approximation functions that are actually kind of neat, but that's a story for another day.
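For a flavor of those approximation functions, warehouses ship things like approximate distinct counts (for example, `APPROX_COUNT_DISTINCT`-style functions). One classic approach is a KMV (k-minimum-values) sketch; this toy version is illustrative only, not any warehouse's actual implementation:

```python
# Toy KMV sketch: hash each item to a point in [0, 1) and keep the k
# smallest hashes. The k-th smallest value estimates how densely the
# distinct items cover the unit interval, which gives their count.
import hashlib

def approx_distinct(items, k=64):
    """Estimate the number of distinct items using a KMV sketch."""
    mins = sorted(
        {int(hashlib.md5(str(x).encode()).hexdigest(), 16) / 2**128 for x in items}
    )[:k]
    if len(mins) < k:
        return len(mins)  # fewer than k distinct items: count is exact
    return int((k - 1) / mins[-1])

est = approx_distinct(f"user-{i}" for i in range(1000))  # roughly 1000
```

The appeal is that the sketch uses constant memory no matter how many rows you feed it, trading a few percent of accuracy for that.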
And of course, they're all getting more real-time,
more centralized, more merged across the lake,
the warehouse, the real-time systems.
And I don't think they're ever going to perfectly intersect, right?
But the beauty is that most of business, save for a very small set of things, can really be handled by what I would call our version of real-world real time, which is not computer real time: it's seconds, not microseconds. And I think the warehouses are really getting there. That, I think, will unlock so many scenarios. You even gave that example, Costas, right? You said there are some things where it's like, oh, they signed up, right? But other things where you need to do some computation. You should not have to make that trade-off. Anything that happens, you want to be able to compute on it and operationalize it, and you should be able to do that. And that's why I think we're just in a fun era of warehousing, and, let's call it, data storage and computation just getting better every year.
And so I think it's just a fun time to be in our space.
Yeah.
And more accessible actually.
Yeah.
Accessibility.
I think is a great way to think about it.
Yep.
Yep.
That's great.
Triti, your turn.
And I'd love to hear also like from your perspective, because you are coming
from the more enterprise space.
So what do you see there?
Yeah, ETL became a thing not because people romanticized ETL as such a cool technology. What drove the rise of ETL, or now the trend toward ELT, is a rabid appetite for consuming data: the data-driven decisions, the business intelligence side of things. So the exciting thing with the reverse ETL trend, what will propel it to what ETL has been for the last 30, 40 years, is the trend that we see in enterprises going from big data, which drove ETL, to big ops.
Everything that we are talking about, you know, reverse ETL, is just a way to move data from one place to another. But at the end of it, the GTM teams are trying to convert faster, launch campaigns, more effective campaigns, right? Product teams are trying to drive product-led growth and, you know, drive better experiences. Customer experience teams are doing the same things using data, right? So big ops is the next big thing, and reverse ETL will play a big role in that. I'll leave it at that. And that's a trend that we see in enterprises.
Yeah, that's very interesting. That's a very interesting term, big ops. So that sounds great. So, Tejas, your turn.
Cool. Yeah.
I mean, honestly, on the technical front, I think Boris and I are thinking alike here: I'm really excited for the data warehouses to just get better and better. I think streaming data warehouses are something we've always been excited about. There are players like Materialize that are building the ability to give you a view that's defined in SQL, and as the data comes in, it's incrementally processed, so that a system like Hightouch, for example, could just subscribe to that and automatically forward what cohort or what audience a user is in, based on the SQL formula, to all these downstream tools. And while that's not innovation in Hightouch itself, it unlocks massive potential for Hightouch to be used for use cases like on-site personalization in real time, which is harder to use Hightouch for today, as Triti kind of mentioned.
But on the reverse ETL product front, what I'm really excited about is more the design aspect of things, actually. I think a big bottleneck to making reverse ETL and data activation accessible is the experience when someone first lands in the product. I can't wait till they have a problem, like, oh, I wish I had this data point in this tool, or I wish I could grab users that meet this criteria, and they can walk into a data activation tool that they have never used before in the organization, and it quickly gets connected to the resources that exist around the company, pulls in metadata from all those different systems, and helps guide that user through actually solving their business-level use case. And I think a lot of the innovation in the space, outside of the technical front, will actually just be on the design of the products: making them really accessible and separating technical concerns from business concerns, so that business people who identify a problem can just solve that problem. That's where we spend a lot of our headspace, honestly. And I think it's a function of both marketing, because reverse ETL is not the most accessible term to everyone, as well as product and partnerships. And that's what I'm most excited about.
Yeah, yeah, 100%. And I think, I mean, we don't
have time today, but there are a couple of things that we didn't manage to touch on and discuss today. And I think one of the most interesting is the users: who is mainly affected by reverse ETL? What are the personas? What do you see there? And Boris, you mentioned these 5 million people who know SQL but are not technical, right? What's the journey to enable these people to do more with less technology?
They don't just hire us for moving bits, right? They're hiring us as a piece of software, but they're trying to increase their impact, do more with their data.
That's exactly it. Yeah.
You know, a couple of things were just mentioned that triggered some thoughts on this.
He mentioned one thing that was very important: it's not just that you bring in a tool, it's how the tool fits into your overall data strategy, how you're thinking about data in the company. And the second part: you just mentioned the ability to publish and subscribe to events, right? That's a trend we see in enterprises, this event-driven architecture, right? It's not just connecting your cloud data warehouse to the business applications, but the ability to stream events to a bus and then consume them across various subscribers.
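The publish/subscribe pattern being described can be sketched with a toy in-memory bus; real enterprise systems would use something like Kafka or a managed event bus, and all names here are hypothetical:

```python
# Sketch: event-driven architecture. Producers publish to a topic on a
# bus; any number of subscribers consume each event independently.
from collections import defaultdict

class EventBus:
    def __init__(self):
        self.subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self.subscribers[topic].append(handler)

    def publish(self, topic, event):
        for handler in self.subscribers[topic]:
            handler(event)

bus = EventBus()
seen = []
# Two downstream systems subscribe to the same signup event.
bus.subscribe("user.signed_up", lambda e: seen.append(("crm", e["email"])))
bus.subscribe("user.signed_up", lambda e: seen.append(("email_tool", e["email"])))
bus.publish("user.signed_up", {"email": "a@x.com"})
```

The key property is decoupling: the producer doesn't know or care how many systems consume the event, which is what makes the architecture easy to extend.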
On top of that architecture, that's a big trend as well. There's no doubt. And again, the maturity level in enterprises varies, but that's definitely a trend.
Yep. Yep. Makes
a lot of sense.
Someone tried to say something and I think I interrupted.
I was going to say, I think it's time... we're way over time, Eric.
We're way over time, but I think, you know, I'll try to summarize at least one of my primary takeaways.
What's so interesting to me is that reverse ETL is almost a misnomer in that it's just sort of moving the data.
It's moving the data, right?
It's a pipeline and it sort of describes a flow of data.
And what we're talking about here is far, far deeper than that, you know, and impacts
the organization as a whole.
And I think reflects changes in the industry as represented by both technology and then,
you know, the changing skill sets of people.
And so this has been a true treat to hear about how everyone's thinking about that.
So thank you again.
Thank you for going long.
And let's do this again
and dig into users.
And then, of course,
we didn't get to synthetic events,
my favorite topic
when it comes to reverse ETL.
So we'll do it again
in another couple months.
Thank you, everyone.
Thank you.
Thank you.
Thanks for having us.
Yeah, it's a lot of fun.
Thank you, guys.
We get to talk to some super smart people,
Costas, which is honestly
like maybe one of the highlights of my week,
just being able to ask good questions, or at least what I think are good questions,
or at least my own curiosities to these brilliant people building stuff.
I think what was so interesting was we really didn't... If you went back and listened to that
conversation and you didn't have the context for it, you might not necessarily think it was solely centered on reverse ETL.
And we actually didn't talk about the actual technical flow of data from a row in a warehouse table to a field in some sort of downstream tool. And I mentioned this at the end: reverse ETL is a strange term in that way, right?
Because the way that they're thinking about this problem is so much more comprehensive
than, you know, just sort of a basic pipeline that's moving data from A to B.
So yeah, it made me even, it made me think even more about, you know, your point that
the name for this maybe is not a great name.
What'd you take away?
Yeah, it's not a great name.
My main takeaway is that we really need to spend more time with these folks, discussing not just reverse ETL but the whole, let's say, transformation that data infrastructure is going through right now. For example, you saw that one of the most exciting things they talked about was the latest developments in data warehousing, right? And what does this mean? Or what does it mean to have so many people out there who know, or say they know, SQL but are not technical, right? We still have so many people who are technically doing functional programming through Excel sheets, but are not using all these amazing technologies we're talking about. So the potential out there is obviously huge, and we are still very early. And what I'd like to add to what you said about speaking with very smart people: what I find extremely fascinating is that they're not just smart people, they are also highly motivated people. That's what makes things even more interesting, because we have people who are trying to change the way we work with data, and it's very early. So I don't know, I find it, how to say it, an amazing opportunity to take a glimpse into the future when we can get all these people together and chat with them. So hopefully we'll be able to do it more often in the future.
I agree.
All right.
Well, thanks for joining us.
Lots of great recordings coming up.
So make sure to subscribe and we will catch you on the next Data Stack Show.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack, the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com. Thank you.