The Data Stack Show - 253: Why Traditional Data Pipelines Are Broken (And How to Fix Them) with Ruben Burdin of Stacksync
Episode Date: July 16, 2025
This week on The Data Stack Show, Eric welcomes back Ruben Burdin, Founder and CEO of Stacksync, as they dismantle the myths surrounding zero-copy ETL and traditional data integration methods. Ruben reveals the complex challenges of two-way syncing between enterprise systems like Salesforce, HubSpot, and NetSuite, highlighting how existing tools often create more problems than solutions. He also introduces Stacksync's innovative approach, which uses real-time SQL-based synchronization to simplify data integration, reduce maintenance overhead, and enable more efficient operational workflows. The conversation exposes the limitations of current data transfer techniques and offers a glimpse into a more declarative, flexible approach to managing enterprise data across multiple systems. You won’t want to miss it.
Highlights from this week’s conversation include:
The Pain of Two-Way Sync and Early Integration Challenges (2:01)
Zero Copy ETL: Hype vs. Reality (3:50)
Data Definitions and System Complexity (7:39)
Limitations of Out-of-the-Box Integrations (9:35)
The CSV File: The Original Two-Way Sync (11:18)
Stacksync’s Approach and Capabilities (12:21)
Zero Copy ETL: Technical and Business Barriers (14:22)
Data Sharing, Clean Rooms, and Marketing Myths (18:40)
The Reliable Loop: ETL, Transform, Reverse ETL (27:08)
Business Logic Fragmentation and Maintenance (33:43)
Simplifying Architecture with Real-Time Two-Way Sync (35:14)
Operational Use Case: HubSpot, Salesforce, and Snowflake (39:10)
Filtering, Triggers, and Real-Time Workflows (45:38)
Complex Use Case: Salesforce to NetSuite with Data Discrepancies (48:56)
Declarative Logic and Debugging with SQL (54:54)
Connecting with Ruben and Parting Thoughts (57:58)
The Data Stack Show is a weekly podcast powered by RudderStack, customer data infrastructure that enables you to deliver real-time customer event data everywhere it’s needed to power smarter decisions and better customer experiences.
Each week, we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human
challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new
data technologies and how data teams are run at top companies.
Before we dig into today's episode,
we want to give a huge thanks
to our presenting sponsor, RudderStack.
They give us the equipment and time
to do this show week in, week out,
and provide you the valuable content.
RudderStack provides customer data infrastructure
and is used by the world's most innovative companies
to collect, transform, and deliver their event data.
joined us at Data Council. We did a little bit of a lightning round
at Data Council, Ruben.
So we'll take our time to dive deep.
Thanks for joining us again
for your second slot on the show.
Yeah, thanks so much for hosting.
Great, well give us just a little bit of background
for those who didn't hear the Data Council show.
Give us a little bit of background on yourself
and then just the one or two sentence overview of Stacksync.
Yeah, perfect.
So, my name is Ruben.
I'm co-founder and CEO at Stacksync.
So I am based here in San Francisco, California, building Stacksync with our team.
I'm originally from France, so a bit of my background, you know, I did study computer
science, double degree in computer science and one degree in business as well, back in
Switzerland.
And then I worked as well, you know, in Germany and in Singapore.
And actually, this is also where I got really in touch with the world of
two-way syncing, because I was working, you know, in a company
as a consultant and was in charge of putting everything in place, you know,
from accounting software to ERP to CRM, and I wanted, you know, all of these tools
to work in two-way sync.
So what I did in the CRM would be reflected in the ERP and vice versa.
And there were no products on the market. You know, I searched, I tried to build some alternatives myself, you know,
with, you know, Workato, etc. And none of them really worked; it was really
complex. And I just couldn't leave the company because everybody was afraid to
take this work over. And this is where I actually realized, you know,
this is a big whale problem.
Everybody's complaining it should exist.
And so, and there I committed, you know, I started an entrepreneurial journey and now
here we are, we did YC.
So Stacksync basically we were running for a year and a half and we did YC, Y Combinator
in the Winter 24 batch.
And since then, you know, we moved to San Francisco and really got this
explosive growth that we have at the moment.
Awesome.
Well, one thing that I'm excited to talk about is where two-way sync fits into
the stack because there are a lot of companies who, you know, sort of use a
traditional sort of in, transform, out type loop.
So I'm excited to dig into that
and just learn more about two-way sync in general,
dive deeper than we went last time.
How about you Ruben, what do you wanna talk about?
Absolutely, so I'm super excited to talk about
two-way sync, where it fits in the stack,
but as well, you know, I'm very surprised
how marketing is actually reshaping the perception
of people on, you know,
zero-copy ETL, this kind of trend, you know, and how, as it actually exists right now,
you know, it's mostly marketing and little tech, right? And it's crazy how much, you
know, tech people actually go into this fantasy of vendors, you know, selling
it. And so, yeah, extremely happy actually also to
decode that a little bit further. And yeah.
Great. Well, let's dig in.
Let's do it.
Ruben, I love having second time guests because I can say, if you haven't heard the first show,
go listen to it and you'll get context and listen to them in a row.
So we can kind of dig right into some of the spicier topics.
The first one of which we'll cover is that zero copy isn't a real thing.
And so very excited to dig into that.
But first, I want to do two things. Give our listeners a high-level overview of Stacksync. We'll get way more into that later.
You mentioned when we were recording the intro
that you were a consultant,
you were trying to get all these tools to talk together,
you know, CRM to talk to the ERP,
and it was a really brutal problem.
You tried all these tools.
What was the worst integration problem
that you faced during that period?
Where you just,
you know, you just said, this is so
gnarly and bad that, you know, did you ever think about giving up? I mean, what was the nastiest problem?
Absolutely. I mean, you know, building two-way sync with workflow automation tools or code
was really brutal.
First of all, you have to think about the whole architecture, then you start building.
So after maybe like a day or two, you get the workflow.
And then you realize, okay, now when I edit a record on one side, it goes to the other side, and vice versa.
And then you say, okay, well, cool, this is great.
So now, okay, so this is for one record, which I just created. What about for update?
Oh, and now you have to figure out an entire update logic,
right, and you realize, you know,
I'm gonna take the entire record.
So whenever, you know, I have a record of it in Salesforce,
I'm gonna update into my HubSpot and vice versa.
And then you say, well, okay, so I just,
because Salesforce only tells you
that the record has been updated, but not which field.
So now you say, well, intuitively,
I'm gonna take the entire record.
I'm going to push it into HubSpot.
But then you overwrite the email.
So now, for sure, you have a marketing person
in the company creating a sequence
that says, every time the email is updated,
enroll this contact into a welcome email sequence.
So customers now start receiving welcome emails
every time something is updated in the CRM. And so you have to know exactly what
changed. Now you have to run an entire system which detects which field was
updated in the same record, and now you start storing data. And then you have
deletes as well, right? And once you have done, say, all the CRUD operations,
you're like, well, that's great, but now this is for one record. Let's backfill the data.
And now you have to backfill all the historical data
of the same system.
And now it becomes extremely complex.
And now every time that your system misses one event
for any reason, one custom field added,
someone changed the name of a field or something like this,
it breaks and you lose the data.
There is no monitoring.
And so the maintenance over, like, the first three weeks was so huge. It was taking a full-time integrator position for a single contact sync pipeline.
And so,
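The field-level change detection described above can be sketched in a few lines of Python. This is a minimal illustration, not how any particular vendor implements it: the in-memory `snapshots` store, the record ID, and the field names are all hypothetical, and deleted fields and deleted records are not handled.

```python
from typing import Any

def changed_fields(previous: dict[str, Any], current: dict[str, Any]) -> dict[str, Any]:
    """Return only the fields whose values differ from the stored snapshot."""
    return {
        name: value
        for name, value in current.items()
        if name not in previous or previous[name] != value
    }

# Hypothetical snapshot store keyed by record ID (in practice, a database table).
snapshots: dict[str, dict[str, Any]] = {}

def build_patch(record_id: str, record: dict[str, Any]) -> dict[str, Any]:
    """Diff the incoming record against its last snapshot, then save the new snapshot."""
    patch = changed_fields(snapshots.get(record_id, {}), record)
    snapshots[record_id] = dict(record)
    return patch

# The first event has no snapshot, so the whole record is sent.
first = build_patch("003A", {"email": "ann@example.com", "first_name": "Ann"})
# The second event only changes the first name, so the email is left out of the patch.
second = build_patch("003A", {"email": "ann@example.com", "first_name": "Anna"})
```

Sending only `second` to the other system means the email field is never rewritten, so a rule that fires on email updates would not be triggered by an unrelated edit.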
Well, I'm laughing.
It's a painful laugh because this is over 10 years ago.
But I have to confess that I was a victim to the promise from the sales team
that I can't remember if it was Salesforce or HubSpot, but it doesn't matter
which one it was, but they're like, oh yeah, we just integrate with, you know,
let's just say it was HubSpot.
They're like, we have a direct integration with Salesforce.
All your data just goes back and forth.
And you just configure a few things, right?
And I was like, oh, well, that's great.
And I could not have been more wrong.
And actually the thing that, and I'm sure things have gotten better since then,
because it's a long time ago, but it was impossible to troubleshoot,
especially with large batches, right?
I mean, you upload a bunch of leads, you do all of that.
But the other thing actually that I'm interested in is that was one contact sync flow.
But if we take Salesforce and HubSpot as an example, any data engineer or analyst
who has worked on data from either or both systems knows that a contact record
and what that means is not the same in these different systems.
And the database design is different and they can actually have pretty dramatically different meanings
even across teams.
And so what you described sounds really painful even with the assumption that it's a shared definition
across the two systems and across the two teams, which is actually more rare than it is common.
And I think probably what you had 10 years ago is probably even worse now,
because the systems are more complex instead of getting simpler because technology evolved.
Because it's even worse. And I'm telling you, now we're just talking about how to sync systems,
databases, CRMs,
between each other, et cetera.
But now you're mentioning the problem
of the definition of a sync.
So it means you need to have filtering, right?
But filtering from A to B is not the same as filtering
from B to A, even with the same definition, right?
So this goes very crazy.
And also if you want to really go even deeper,
no, contacts are not standalone.
Contacts belong to a company.
So now you have to associate.
Now you have to associate different association models.
Also, you have ordering, right?
Because you have to first create a company,
and then the contact,
because the contact belongs to the company.
But in other systems,
you might have different association models,
I mean, where you have to create a record,
then your company, and then associate them, you
know, so it's really different.
And so you have all of these complexities to actually maintain.
And this is what makes two-way sync very hard.
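Two of the points above, creation order and direction-dependent filtering, can be sketched as follows. This is only an illustration under assumed names: the object types, the `lifecycle_stage` field, and the `crm`/`support` labels are hypothetical, and real systems have richer association models than a simple dependency map.

```python
from graphlib import TopologicalSorter
from typing import Any, Callable

# Hypothetical object model: each object type maps to the types it depends on.
DEPENDS_ON: dict[str, set[str]] = {
    "company": set(),
    "contact": {"company"},  # a contact belongs to a company
    "ticket": {"contact"},   # a ticket belongs to a contact
}

def sync_order(depends_on: dict[str, set[str]]) -> list[str]:
    """Order object types so that parents are created before their children."""
    return list(TopologicalSorter(depends_on).static_order())

Contact = dict[str, Any]

# Filters are defined per direction: A to B is not the same rule as B to A.
FILTERS: dict[tuple[str, str], Callable[[Contact], bool]] = {
    ("crm", "support"): lambda c: c.get("lifecycle_stage") == "customer",
    ("support", "crm"): lambda c: True,
}

def should_sync(source: str, target: str, contact: Contact) -> bool:
    """Apply the filter registered for this specific sync direction."""
    return FILTERS[(source, target)](contact)

order = sync_order(DEPENDS_ON)
```

With this dependency map, `order` puts companies before contacts and contacts before tickets, while a cold lead would be held back from the support system but a support contact would still flow into the CRM.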
So in the very beginning, it's very easy to go into the marketing and say, yeah, you know,
it's very easy, you know, we integrate, etc.
And for example, let's say there is this very big pain
on the market at the moment, right?
Which is HubSpot, Zendesk, for example, right?
Integration of HubSpot and Zendesk.
You know, the team at HubSpot is gonna tell you,
yeah, of course we integrate with Zendesk.
You can use Zendesk for your customer support,
HubSpot as your CRM, and this is gonna work fine
because we have two-way sync.
So first of all, only contacts and companies
and tickets are associated.
But actually the tickets, it's not the tickets
as you imagine them in HubSpot.
So the Zendesk tickets are gonna have to be synced
to the Service Hub of HubSpot.
So you have to subscribe to the Zendesk of HubSpot
to get the sync.
Which is a separate product.
But because you use Zendesk, you don't need the Service Hub of HubSpot.
You're actually just buying the same product twice to actually use it.
This is not the tickets you want to sync.
So eventually, it's just not syncing.
And you don't want to sync all of your marketing contacts, right? Because, you know,
contacts, you know,
have a different definition in HubSpot and in Zendesk.
Every person you send a marketing email to, you know,
even a cold lead from HubSpot,
you don't want it to sync to your Zendesk service system,
which is made for your customers or the people with very high intent.
And so what about this transformation
which happened in between?
Custom objects are not supported,
associations are not supported, it's completely crazy.
And so these integrations are very costly
and they still sell.
Still sell.
Yeah, it's wild, yeah.
That's why the original two-way sync is called a CSV file.
Exactly, exactly, exactly.
Exactly.
CSV file and lots of VLOOKUPs and stuff.
You mentioned in the intro that it was hard for you to leave the company
just because no one knew how all this worked, right? And CSV files can be that way, where it's like, there's one person who knows how to get it just right.
Yeah, it's crazy. I mean, and this is how Stacksync
came to be, right? So Stacksync basically, you know, really deep dives into the
nature and the reality of two-way syncing between enterprise systems. And where we
are, you know, really a leader is when it works at scale, right?
So Stacksync basically really gets this two-way sync at scale.
And it requires a whole complex engineering,
which is almost database-level, you know, conflict management,
you know, technology which
has been developed over, like, tens of years.
And so, yeah, so just more about Stacksync, you know,
Stacksync is today's leader in real time and
two-way syncing between enterprise systems and databases. So, Stacksync supports CRMs like
Salesforce and HubSpot, Zoho, et cetera, but also ERPs like NetSuite, SAP, Acumatica, and all of
these tools basically, they can be synchronized in two-way sync with databases, such as Postgres, Snowflake, BigQuery, MongoDB, MySQL, OracleDB, you name it.
And so what Stacksync really enables you to do is to actually bypass all of these iPaaS
tools, all of this complex in-house code, custom code logic, and just have a two-way
sync as you would think of it in a human manner.
Just simplify your architecture diagram to a very baby level, right?
Just like, I have, you know, one CRM, one database.
Whenever you modify something in the CRM, it goes into your database.
And when you modify that same table, not another table, the same table, it's actually
going to write back into your CRM or ERP.
And that's what Stacksync really offers in real time
with millions of records per minute,
technically at big times.
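To make the idea of conflict management between a CRM record and its database row concrete, here is a minimal sketch of one simple policy, field-level last-writer-wins. This is an assumption chosen for illustration, not a description of Stacksync's engine; the `FieldVersion` type and the timestamps are hypothetical.

```python
from dataclasses import dataclass
from typing import Any

@dataclass(frozen=True)
class FieldVersion:
    value: Any
    updated_at: float  # e.g. Unix timestamp of the last edit to this field

Record = dict[str, FieldVersion]

def merge(crm: Record, db: Record) -> Record:
    """Field-level last-writer-wins merge; on a timestamp tie the CRM value is kept."""
    merged = dict(crm)
    for name, version in db.items():
        if name not in merged or version.updated_at > merged[name].updated_at:
            merged[name] = version
    return merged

crm_row = {"email": FieldVersion("old@example.com", 1.0), "phone": FieldVersion("111", 5.0)}
db_row = {"email": FieldVersion("new@example.com", 2.0), "phone": FieldVersion("222", 3.0)}
merged = merge(crm_row, db_row)
```

Here the email comes from the database side (edited later) and the phone from the CRM side, and the merged record would then be written to both systems. Real conflict handling also has to deal with clock skew, deletes, and retries, which this sketch ignores.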
I love it.
Okay, I have a ton of questions about that,
but I promised the listeners we would do the spicy,
we would get to a spicy take early,
which is zero copy isn't a thing.
So I want to come back around to Stacksync specifics, but this really piqued my interest when we
were chatting before the show.
I think you used the phrase zero copy isn't real.
I think that was the phrase.
Okay.
So give us the spicy take because it has been a major topic of discussion.
Product launches, feature launches, a lot of ink has been spilled.
And I'm sure all of our listeners have heard of it, but for those that haven't,
what is the promise of zero copy?
Give us a baseline of what does zero copy mean.
Yeah. So zero copy, or maybe in its full term,
zero-copy ETL, or even zero ETL sometimes.
It's basically the fact that right now, basically,
you have to use Fivetran, Stitch, Airbyte, et cetera,
to transfer your data from an external source
to the warehouse.
And so with this, there are a lot of recent tech developments which actually
tell you, okay, maybe we can agree on a common data format.
And actually, every system would pump on the same storage. So
there is no copy between systems; actually, you have one source of
truth of data. And you know, the CRM would actually pump on this
data. The warehouse will pump on this data. Actually,
there is no transfer, just a single place, a single storage, right?
And to this, there are technical and business challenges
that will take at least 10 or 15 years before they're solved,
if they ever will be, right?
Because it's really a business problem, actually,
at its very root.
And every tech problem is a business problem in the end.
And so-
Well, dig into that a little bit more.
What is the business problem and why will it potentially not get solved?
Yeah.
So ETL in general is basically like, we have
different data sources and we want to bring everything into the same place,
so we can actually have, you know, data available for insights
and reporting and do all sorts of operational things. So it's a business problem. We need to do some real
stuff and make some real money with this. And now what happens with this
zero ETL is that, okay, well, shipping data from A to B is actually very long,
costly, and hard to get very accurate at scale.
Yes, yep.
So then what happens is that Stacksync,
I mean Stacksync or any vendor actually,
will just transfer data, but it's gonna take long.
So people say, well, let's agree on a common data format
and a common place of storage.
So we don't have to copy.
Everybody can just come and grab what they want,
but there is no transfer, you know,
there is no transport, it's just a common place.
Okay, so that's a very good idea.
But then this has a very big business issue.
The reason this almost cannot happen is because
this common data format
has to be sort of efficient for everybody.
A data warehouse, which is built to run queries at scale, works in a very different way
than a CRM, which is built to retrieve records with different indexes, you know,
to make it very fast for users, who work in it on a daily basis.
Right.
So this means that the storage of these two different systems, you know,
Salesforce and Snowflake, has to be very different
in nature.
So just performance-wise and business-wise, it's a problem.
But also strategy-wise, right?
I mean, like, what do you think?
Do you think Salesforce is going to open up their backend
and all of those are business secrets to everybody, right?
It's like, you know, this is not, you know, the schema,
the schema will never be exposed because also like
in the database schema, a company has
much more than the data which is exposed to the customer.
They have also a lot of metadata, a lot of organization, relations, optimizations, and
you know like the storage just cannot be the same because some part of it needs to be masked.
So now you have to have very deep row level and column level access rights, right?
So this causes another problem of security.
How do you make sure this never leaks?
And so you have all of these business problems
which mean you always have to make a copy,
simply because the data which you operate on
cannot be endangered.
And also, say we have a common place of storage.
And I'm throwing another question to the industry.
So Salesforce and Snowflake would share the same storage? Where is the storage? Who pays for this?
Who accepts to have latency? Do we all lock into the same vendor, into the same Amazon
S3 bucket or Google blob storage? Do we have to lock into that storage? Because now if we move,
we have to move everybody, right?
So it's even more locked in.
So we have so many problems that happen.
And so all these issues, right, are something which makes,
you know, zero ETL or zero copy really something which is,
you know, almost fantasy as of today.
Yep, yep.
It was interesting.
We had a guest on the show from, Yep, yep.
So, you know, you can of course see that there would be benefit in sharing, you know, having some crossover data there that benefits both parties.
So they had really similar things to say about clean rooms.
Cause they thought, oh, well, we'll just use clean rooms to facilitate this.
But when they really started to dig into it, they found similar things to what
you are saying where it is that you can do some things with it, but it is not,
you know, it doesn't quite live up to everything that the marketing says it is.
As far as this full seamless functionality where you can just dump data
into a clean room and all this magic happens, it's actually, you know, there
is actually a lot of work in order to figure out how to
make it work well with both parties.
And so they ended up actually building a, you know, sort of a different architecture,
but fascinating, fascinating.
It's the power of marketing.
Yeah, the power of marketing.
And this is, and this gets even more critical, right?
Because like, you know, if you really deep dive into how it really works, okay, so you
go to the Salesforce to Snowflake data sharing,
right, it's a zero copy, zero caffeine, zero everything.
It's like, there is zero, zero nothing, zero calories.
Right, it's very Buddhist.
It's pure. Blank.
Yes, pure.
Pure data, right?
Just pure data as a red piece.
And then you go into this and
you say, well, in the documentation, you have a five-minute replication lag. So I mean,
if I really read zero copy and I really understand it as a human would, with five minutes,
you know, there is already a problem if it's the same place. And sorry, five minutes
and replication lag? So if you have the word replication in the documentation of something
which is zero copy, that's concerning, right?
That's concerning.
And so eventually what is all of this,
like my take on what is data cloud,
data sharing with Snowflake or HubSpot,
Snowflake data share,
this is just Salesforce's Snowflake account.
Yes, that's what I'm doing.
They manage the entire ETL pipeline for you, and they give you a Snowflake account
for which you can actually just grab the credentials
and just query it.
That's it. You can't write to it, you can't transfer it,
you can't do anything, you can't add custom fields,
you can just query it.
It's just a dump of data, which is locked into your Snowflake.
And so what that means for companies, you say,
well, it's great, instead of going to Fivetran,
I can actually buy all of these tools.
But it's a very big problem now,
because when you have plenty of pipelines with Fivetran,
et cetera, you have bulk discounts.
But when you actually have to buy this small item from Salesforce, this
small sync from HubSpot, this small sync from Zendesk, you have to actually
contract with 10 or 15 different data sources, I mean, vendors, to actually
get data into your data warehouse instead of just having one ETL, which
is maintained by your data engineering team.
So now you have a distribution of ownership of these pipelines to people who have
nothing to do with pipelines because they are just working in Zendesk or just working
in Salesforce.
And so that's a very important strategy and cost problem as well, which also make zero
ETL.
So zero ETL is complex from a technical standpoint, if not saying almost impossible.
It's complex from a strategy standpoint,
and it's complex from cost standpoint.
So actually, all components which drive a business
in reality are just not present
into this zero-copy landscape.
So it's almost not existing, basically.
Yeah, it is a really unfortunate term because I can see a narrow use case.
When I say narrow, what I mean is there's a business team working in some tool
and usually have an operations person.
And there is a use case for being able to write a query in that tool and pull in some data set or something.
And actually I remember we had someone from Braze on the show.
Braze is a marketing tool.
You can send customer communications and create customer journeys through Braze.
And they launched a tool that allowed people to write SQL queries and you could sort of pull data in. And he thought, power users will love this,
but like adoption was way more than he thought
and like a ton of people use it, right?
And so there is this interesting use case there
where it's like, okay, you need some specific thing.
Your data team has probably materialized a couple of views
that have some things and some valuable data fields,
just pull them into your tool.
That's actually a totally valid use case, but calling it zero copy ETL is really misleading
because that's not actually the value that it provides or even really a good description
of what's actually happening.
You're just querying data from a very high context individual system.
So it is unfortunate.
Absolutely. I see, though, a value in zero ETL and a legit point,
I mean, which is not legit, you know, really per se,
but at least it can be legit from a business perspective. Let's say,
for example, you have a system,
like, say, an ERP with a very complex data structure, a complex format,
you know, which is quite hard to actually
expose over APIs, or the vendor just doesn't do it because of strategy, like SAP. So it's very hard
to get data out of SAP for no real apparent reason, just because they don't want you to
get out of the ecosystem. And so maybe for vendors like this, selling this zero, I mean, data share.
It's really data share.
See, data share can be valuable
because they can expose data
which you cannot get access via APIs independently, right?
You need to get access to your own data in some way
and it's not possible to make it accessible to APIs.
So that maybe because of scale,
because of complexity, because of data types,
this kind of things could make sense,
but it's because of technical or business limitation
that the business has,
and this is where data share makes sense.
But data sharing for HubSpot or for Salesforce
doesn't really make sense because the APIs
still enable some sort of real time sync.
And so that's why there is no real need for this.
So they would have to really introduce
hard limitations on the API,
so that data share would be the only way,
but that is a critical monopoly,
which would be a very big scandal.
Right, right.
We're gonna take a quick break from the episode
to talk about our sponsor, RudderStack. Now I could say a bunch of nice things
is clean and then to stream it everywhere it needs to go.
Yeah, Eric. As you know, customer data can get messy.
And if you've ever seen a tag manager, you know how messy it can get.
So RudderStack has really been one of my team's secret weapons.
We can collect and standardize data from anywhere, web, mobile, even server side,
Now, rumor has it that you have implemented the longest running production instance of
RudderStack at six years and going.
Yes, I can confirm that.
And one of the reasons we picked RudderStack was that it does not store the data and we
can live stream data to our downstream tools.
One of the things about the implementation that has been so common over all the years
and with so many RudderStack customers is that it wasn't a wholesale replacement of your stack.
It fit right into your existing tool set.
Yeah, and even with technical tools, Eric,
things like Kafka or PubSub, but you don't have to have all that complicated
customer data infrastructure.
Well, if you need to stream clean customer data to
your entire stack,
So, let's talk about two-way sync. And let me frame this a little bit because there's a very common loop that has been reliable for a really long time.
And actually sort of pre-existed the modern data stack, right?
Because people did it with, you know, whatever they would write pipelines in Python or whatever, you know, if you were completely hand rolling it.
But let's just, we'll frame it in the terms of the modern data stack.
So I have Salesforce.
I'm on a data team, right?
The go-to-market team uses Salesforce for all the business stuff, right?
It's where they track leads and campaigns and opportunities and everything. I need to combine that data with other data, some model, I need to, whatever, right?
Enrich it.
So I use Fivetran, I pull the data into some data store, Snowflake, Databricks, et cetera.
I model the data.
I'm using dbt or some transformation layer
to enrich the data point, do whatever transformations I need to run,
and then I reverse ETL it back into Salesforce.
And then of course that gets pulled in by Fivetran again, and that's the loop.
And that's been a very reliable loop for a long time.
Again, there are sort of tools to do that now, but again, it's been going on for years. What is the problem with that loop?
Yeah, so that's a very interesting question.
Because then you say, well, you can build, you know,
if there is no problem, you can build two-way sync manually, right?
And as we were saying in the beginning of the podcast,
you know, there is this, you know, it was an absolute mess to maintain.
So the problem with two-way sync is that, you know,
two-way sync is an extremely hard problem to actually get right
because of the limitations of current tools, right?
So for example, I was mentioning,
okay, so if you modify your record
in Salesforce, it would tell you that the record changed,
but doesn't tell you which field changed.
So now what do you send to your HubSpot or NetSuite?
You overwrite the entire record.
So every time that you have a, you know,
if you overwrite the email, even with the same value, you know,
even though it might look the same, you might have a rule
which, every time you update this field, you know, sends a welcome email.
So now every time that, you know,
you update something in the CRM, even a completely unrelated
field, like first name or last name,
you would actually send a welcoming mail
because you told the rights and shy away from it
and you lose data.
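The overwrite problem described here can be sketched in a few lines. This assumes the sync layer keeps the last-seen copy of each record so it can compare versions; the function names `changed_fields` and `should_send_welcome_email` are hypothetical and are not Salesforce or HubSpot APIs. With a field-level diff, a rule tied to the email field fires only when the email value really changes:

```python
from typing import Any, Dict, Set


def changed_fields(old: Dict[str, Any], new: Dict[str, Any]) -> Set[str]:
    """Return the names of fields whose values differ between two record versions."""
    keys = old.keys() | new.keys()
    return {k for k in keys if old.get(k) != new.get(k)}


def should_send_welcome_email(old: Dict[str, Any], new: Dict[str, Any]) -> bool:
    """Hypothetical rule: fire only when the email value actually changed."""
    return "email" in changed_fields(old, new)


# Only the first name was edited; the email is rewritten with the same value.
before = {"id": "003A", "first_name": "Ana", "email": "ana@example.com"}
after = {"id": "003A", "first_name": "Anna", "email": "ana@example.com"}

print(changed_fields(before, after))             # {'first_name'}
print(should_send_welcome_email(before, after))  # False
```

Without the diff, a whole-record overwrite looks like an update to every field, which is how the unrelated first-name edit ends up triggering the email rule.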
And so, even if it's one way.
In the loop of like Salesforce ETL,
transformation reverse ETL, Salesforce ETL,
transformation reverse ETL,
I mean, that's not really two way sync,
but you kind of handle all of the nasty stuff in the transformations, right?
That's where most of the work happens.
Ultimately, those things get crazy.
Is that the big challenge with that and why the value proposition of two-way sync is attractive, or is that loop not suited well for specific
use cases, for example?
Because I think that people use the loop for everything, right?
It's the go-to for any, this is probably a dramatic oversimplification, right? And it's like analytics, ops,
operational data, whatever.
It's just throw it into the loop
and we'll figure it out in the transformation layer, right?
Which gets complicated and expensive,
I know is one issue, but walk us through the other issues.
Absolutely.
So, I mean, like now we're talking about one pipeline, right?
Maybe Salesforce contacts to Snowflake,
you transform with dbt,
you put this into another table, which is a staging table,
and then you have like an ETL which is scheduled,
hopefully coordinated, but you know,
orchestration is still a piece of the problem.
And actually, which would turn it back.
Huge actually, yeah.
And send it, you know, query from Snowflake
and send it back to Salesforce.
So here is a reverse ETL vendor.
They have a data storage in between
which actually compares what you send as a query.
So they're running the diff, yep.
It's running the diff, and then the difference
gets shipped back to Salesforce.
So this is all the fixer plan.
But so this means that you have one ETL vendor,
one reverse ETL vendor,
you have at least two tables in your warehouse.
I mean, that is the, yeah, I mean, for a baby startup company, 200 maybe.
Yeah, at least two tables, and then your dbt engine, right?
So that's cool.
And now you only talk about contacts because companies is another one.
And if you need to add contacts and companies and associate them, you need to also make sure
the companies are fed back first, you know, before the contacts, because the contacts belong
to companies. And so, if you want to associate a contact with a company, you need to
have the Salesforce IDs in Snowflake. So you can send, you know,
create the record in Snowflake, right?
I mean, Salesforce.
So it means that, you know,
you need to first do the first loop of companies,
get the IDs back for the companies you created into Snowflake,
and only then you can start with the contacts.
But you have this kind of managed fields
which needs some feedback
because you first have to create a company.
So first create a company from Snowflake to Salesforce,
get the ID back into Snowflake,
use that ID to create a contact
and get the contact back
because then you need to create opportunities
or something like this.
And so all of this orchestration now
gets you tables and tables, et cetera.
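The ordering constraint described here (create the company, get its ID back, then create the contact) can be sketched as follows. `FakeCrm` is a hypothetical in-memory stand-in for a CRM that assigns IDs on create; it is not a real Salesforce client, and the field names are assumptions for illustration only:

```python
from typing import Dict, List


class FakeCrm:
    """Hypothetical in-memory stand-in for a CRM that assigns IDs on create."""

    def __init__(self) -> None:
        self._next_id = 1
        self.records: Dict[str, Dict[str, str]] = {}

    def create(self, object_type: str, payload: Dict[str, str]) -> str:
        record_id = f"{object_type}-{self._next_id}"
        self._next_id += 1
        self.records[record_id] = dict(payload)
        return record_id


def sync_companies_then_contacts(
    crm: FakeCrm,
    companies: List[Dict[str, str]],
    contacts: List[Dict[str, str]],
) -> List[Dict[str, str]]:
    # Pass 1: create companies and capture the IDs the CRM feeds back.
    company_ids = {c["name"]: crm.create("company", c) for c in companies}

    # Pass 2: a contact can only be created once its parent company ID is known.
    created = []
    for contact in contacts:
        parent_id = company_ids[contact["company"]]
        contact_id = crm.create("contact", {**contact, "company_id": parent_id})
        created.append({"contact_id": contact_id, "company_id": parent_id})
    return created


crm = FakeCrm()
result = sync_companies_then_contacts(
    crm,
    companies=[{"name": "Acme"}],
    contacts=[{"email": "ana@acme.test", "company": "Acme"}],
)
print(result)  # [{'contact_id': 'contact-2', 'company_id': 'company-1'}]
```

In a scheduled ETL plus reverse ETL setup, each pass is a separate job with its own staging table, which is where the extra tables and orchestration come from.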
So the promise of having a simpler architecture
is wrong, and what this means concretely is a burden,
because complex tech is not a problem.
But heavy maintenance, that's a challenge.
And this is why the loop is not working
and this is not even real time.
So this is only for analytics use cases
where you need to ship some sort of aggregated metrics And this is not even real time. So this is only for analytics use cases where, you know,
you need to ship some sort of aggregated metrics once a day,
you know, some stuff like this.
So it's a big deal and one more vendor and all of this orchestration you have to do
just for shipping some metrics back.
You know, so don't tell me this is easy, right?
This is pretty complex.
And now, the other thing, sorry to interrupt, but the other thing now that you're talking through that is really interesting about it is that I said earlier,
we'll just fix everything in the transformation layer.
But in reality, actually, for anyone who's built these systems inside of companies,
what actually happens and is a very pernicious problem is that the logic lives in different places, right? So you may have logic in Salesforce that runs on load, super common, right?
I load a bunch of data, I run a bunch of Apex to do some operation to apply some
business logic, right?
You may have logic that runs on the ETL pipeline on ingest into Snowflake.
And then you also may have logic that runs in the individual reverse ETL jobs
that are coming out of it, right?
And so the thing about that is you don't really have a single source of truth.
I mean, maybe you have one giant table that sort of represents it,
and then you materialize things on top of that.
But it is actually very difficult to ensure that all of the logic does actually live in one place, because to update the
model and check all the dependencies and all that sort of stuff, it's like, well, we don't really have time to do that.
It's like, okay, well, we'll just do it in the reverse ETL pipeline, or we'll
just write like something in, you know, in Salesforce, right?
And so then your business logic starts to get spread all over the place.
Absolutely.
And so basically by grouping, you know, this ETL and reverse ETL
vendor and putting that in real time, you actually decrease the
number of tables from hundreds to just one, you know, because it's the same table that you read and write data from. You
also simplify all of these, you know, complex transformation layers, which now all sit within
your dbt or Coalesce transformations. You know, Coalesce is another
dbt sort of, you know, competitor and very popular as well with some no-code features.
And so what I really see is that
the architecture gets completely simplified by this.
Instead of having loops, you actually end up just having
a bidirectional arrow.
And a bi-directional arrow is not the same as two arrows
in opposite directions, right?
Because it's two tools that don't talk to each other.
And also what people don't think about is that
if you have a conflict, a data conflict,
which happens, same data at the same time, et cetera,
if you are badly orchestrated,
you're just gonna swap values
and actually your data will just not look the same anymore.
It's gonna revert values, you're gonna swap values.
It's gonna be really complex to maintain.
And now the CMO walks up to you and says, why?
We texted this customer on a different segment,
we lost the deal.
And you say, well, because actually there was a technical...
We don't care about the technical issues.
Like you have to make this pipeline work.
So if your position is at stake,
you would not use this kind of tool.
And that's where robust tooling like Stacksync comes in,
which really unlocks scale.
Because scale also has a very different impact, right?
It's going to take ages.
It's going to take ages, you know, just to ship, you know, to make the pipeline run.
And if it takes four hours to run from HubSpot to Snowflake and four hours from Snowflake
to HubSpot, it just means that, you know, it takes eight hours just to run a pipeline.
And when you grow too much, it becomes impossible,
because it takes more than 24 hours.
So that's where really the challenge is.
Also, so actually you have the problem of loops, data types, you have the problem
of authentication, managing two different vendors with potentially two different
people in the data team, who are responsible for each tool.
You know, it's really an exposure of leadership to bad decisions. And so, you know, people are not really responsible
for choosing bad tooling. And we see this, you know,
today, the industry is filled with bad tools, like, honestly,
a lot of tools are crap. But leadership is
actually responsible for having bad business results.
And this is just digging a hole by having the wrong tooling.
So IT investment is actually an investment in your own leadership.
So as a data leader or even as a CEO or CFO, investing in the right tooling is actually like ensuring your business performance will
be driven by the right data at the right pace.
Right?
So actually like your company will not be underwater with simple normal growth.
So let's walk through a use case because I agree that,
I mean, the loop has been a reliable architecture for analytical use cases and will continue to be.
Right? I mean, it's actually wonderful in many ways for that.
You know, and then reverse ETL sort of adds like this sort of slight operational benefit to it for like analytical type stuff, right?
Where you need to get some data point into some tool or whatever.
So I mean it's not going to go anywhere, but I think we should talk about the
operational use cases and how you do two-way sync.
And so what I'd love to do is let's pick two examples.
And my use case is, I'll just make up a use case, you can tell me how close I am to what your customers experience.
I have this use case where we're doing lead intake on some website or app,
and those leads are coming into HubSpot where they get marketed to,
they're sending emails, all that sort of stuff. Right? And then, and so the marketing team or the demand generation team is using HubSpot to
do all of that.
And then some subset of that, you know, of those need to make it into Salesforce, or
let's say all of those need to make it into Salesforce, but the sales team or whatever,
you know, or the support team or whoever really only is
going to focus on some subset of those based on some characteristics, you know,
whatever those specific fields are, et cetera.
Okay.
So I could theoretically run the loop, but the problem is let's say I'm a pretty
big company,
And so those things, those actually, and they have a 10 minute SLA, okay? Is that, am I thinking correctly about the challenge
that I would have as a data team of like, okay,
well, how do I actually make that work?
Is that?
Yes, this is actually a correct challenge.
And I would even say like, even clearly, right?
It's like, okay, so now you make a marketing campaign,
you know, on HubSpot and this person immediately logs in,
you know, responds to the email.
It's an absolute, you know, sweet spot.
This person signs up
and actually now goes into Salesforce.
If your pipeline didn't run yet, it will be inserted in Salesforce
by the sign-up as well as by the HubSpot to Salesforce, you know, pipeline.
So basically now you're going to have a duplicate.
So you have to make sure that your pipelines are also robust
to upserts and not only updates, right?
And how do you do upserts?
Well, you need to query the data first
and write data second, right?
So actually you need to first,
so you need to first basically get the data
from HubSpot to Salesforce in an upsert,
which, I mean, queries the data
so that we know if the record is actually present
in Salesforce.
And then if it's present, update, or else insert, right?
So there's two API calls.
And so at scale, this is extremely hard
and doing this on batch, batches, it's also complicated.
It's very complex.
So now, let's say,
most pipelines do query records one by one, which is very
problematic because if you have batches of 10,000 or 100,000 contacts, which is the size of a
marketing campaign, you're actually going to be
basically over-consuming your API limit for the entire week on Salesforce.
And HubSpot is also limited,
right? Per second. So it's not going to work.
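To make the API-consumption argument concrete, here is a back-of-the-envelope sketch. It assumes the simplest possible cost model: one lookup call plus one write call per record when upserting one by one, versus one bulk lookup plus one bulk write per batch. The batch size of 200 is an arbitrary assumption for illustration, not a documented limit of any vendor:

```python
import math


def calls_one_by_one(n_records: int) -> int:
    """One lookup plus one write (update or insert) per record."""
    return 2 * n_records


def calls_batched(n_records: int, batch_size: int) -> int:
    """One bulk lookup plus one bulk write per batch."""
    return 2 * math.ceil(n_records / batch_size)


campaign_size = 100_000  # the campaign size mentioned in the conversation

print(calls_one_by_one(campaign_size))    # 200000
print(calls_batched(campaign_size, 200))  # 1000
```

Under these assumptions a single campaign costs 200,000 calls record by record, which is the kind of volume that exhausts a quota, against 1,000 calls when batched.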
So this is challenging in terms of setup,
it's a complex setup, and it's also complex maintenance.
Well, now with two-way sync, basically, Stacksync tells you, okay,
you create a two-way sync between Salesforce and Snowflake.
You create a two-way sync between HubSpot and Snowflake.
So now you have a place in set.
So now you just have two bidirectional arrows.
So now it's not even a triangle.
In Snowflake, you have all the contacts, companies, et cetera, all the tables
of Salesforce and also the tables of HubSpot.
Whenever, you know, because Stacksync is real time, right?
Is this very important because
the previous type was not real time, right?
So it's even-
Oh yeah, it would run on scheduled jobs, yep.
Exactly, so because StackSync is real time,
let's say a new contact is created on HubSpot.
Yep.
As soon as it's created, StackSync will ship it, right?
It's sub second or maybe one second, two seconds latency.
It's gonna go directly into your Snowflake.
Stacksync, we also have a feature called triggers,
which actually enables you to say,
when you observe a certain data event being transferred,
also trigger this workflow or this database query.
So you say, when a contact is created or updated,
write into Snowflake, from HubSpot to Snowflake, and also run this query,
which tells you, because
Stacksync also runs a diff, so it tells you,
if email was updated, you know,
and ID is a field that changed,
so it means it's a new record,
write this record into the Salesforce table
in my Snowflake.
So now you made one simple SQL query
because the entire Salesforce data
is also real time in sync, it's fresh.
So now you can do an upsert, right?
And be pretty confident that this will not lead
to duplicates.
So with a simple upsert operation,
you actually know exactly,
and you can actually upsert on many fields,
which you cannot do on Salesforce
because on Salesforce you might not even be able to query
based on a given field.
In Snowflake, you can do...
Can you filter as well?
So, I totally understand that example where you offload...
Because that's actually a very...
That's a pretty efficient...
That's a super simple query, right?
I mean, that's about as simple as that can run.
Just search based on email or ID or whatever.
Yeah, yeah.
It's super easy.
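The kind of query being discussed can be sketched against an in-memory SQLite database standing in for the warehouse. The table and column names (`hubspot_contacts`, `salesforce_contacts`, `email`, `first_name`) are assumptions made for illustration. SQLite is used only so the example runs anywhere; in Snowflake the same idea would more typically be written as a single `MERGE` statement. The upsert is matched on email: existing rows are updated, missing rows are inserted, so no duplicate appears:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hubspot_contacts (email TEXT, first_name TEXT)")
conn.execute("CREATE TABLE salesforce_contacts (email TEXT, first_name TEXT)")
conn.executemany(
    "INSERT INTO hubspot_contacts VALUES (?, ?)",
    [("ana@example.com", "Anna"), ("bo@example.com", "Bo")],
)
# Ana already exists on the Salesforce side, with an older first name.
conn.execute("INSERT INTO salesforce_contacts VALUES ('ana@example.com', 'Ana')")

# Step 1: update rows that already exist on the Salesforce side (matched by email).
conn.execute(
    """
    UPDATE salesforce_contacts
    SET first_name = (
        SELECT h.first_name FROM hubspot_contacts h
        WHERE h.email = salesforce_contacts.email
    )
    WHERE email IN (SELECT email FROM hubspot_contacts)
    """
)

# Step 2: insert only the rows that are still missing.
conn.execute(
    """
    INSERT INTO salesforce_contacts (email, first_name)
    SELECT h.email, h.first_name
    FROM hubspot_contacts h
    WHERE NOT EXISTS (
        SELECT 1 FROM salesforce_contacts s WHERE s.email = h.email
    )
    """
)

rows = conn.execute(
    "SELECT email, first_name FROM salesforce_contacts ORDER BY email"
).fetchall()
print(rows)  # [('ana@example.com', 'Anna'), ('bo@example.com', 'Bo')]
```

Because the match runs as a set operation inside the database, it needs no per-record lookup call against the CRM API.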
But could you also filter, right?
So let's say I want to modify the filter going back to the use case.
I want to modify the filter so that I can say, even, you know, even if this
exists in Salesforce, I don't want to send it
because it doesn't meet some sort of qualification, right?
Or, you know, I want to send it with a flag,
whatever it is, whatever that sort of filtering is, right?
So that I can kind of determine
like when different types of things get sent.
Absolutely.
You can also make this filter,
and you say, for example, say,
when a contact is created and the segment is X, Y, and Z,
send it to Salesforce, or maybe it's a very large company and maybe send it to the Salesforce of
this child company which serves the large companies' network. And then also, if it's
a smaller company, send it to the Salesforce of this other subsidiary, which serves SMEs.
Oh, like literally a separate Salesforce instance.
Exactly, because now you can synchronize.
Oh, right.
So even if you had single marketing intake, you could send it to...
Oh, interesting.
Okay, yeah, I was thinking about it in a far too linear way.
Exactly.
And this is just a simple SQL query.
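The filtering and routing idea can be sketched as a small decision function. Everything named here is hypothetical: the `qualified` and `employee_count` fields, the 1,000-employee threshold, and the target table names are assumptions for illustration, not Stacksync configuration. In the conversation this logic lives in a SQL query; plain Python is used here only to show the decision itself:

```python
from typing import Any, Dict, Optional

# Assumed threshold separating the two hypothetical Salesforce instances.
ENTERPRISE_MIN_EMPLOYEES = 1000


def route_contact(contact: Dict[str, Any]) -> Optional[str]:
    """Return the target table for a new contact, or None to hold it back."""
    if not contact.get("qualified", False):
        return None  # filtered out: never sent to any Salesforce instance
    if contact.get("employee_count", 0) >= ENTERPRISE_MIN_EMPLOYEES:
        return "salesforce_enterprise.contacts"
    return "salesforce_sme.contacts"


print(route_contact({"qualified": True, "employee_count": 5000}))
# salesforce_enterprise.contacts
print(route_contact({"qualified": True, "employee_count": 40}))
# salesforce_sme.contacts
print(route_contact({"qualified": False, "employee_count": 5000}))
# None
```

One marketing intake can therefore feed several downstream systems, with the routing rule kept in one visible place.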
So now what we did,
there is no table transformation and everything.
There is just like a query,
which is triggered at the right moment
to maintain this real-time feeling across your system
and a simple SQL query,
or even a dbt transformation which can run, right?
You can also do batch once an hour, once a day,
as a current use case.
But now this is really real time.
And now when you insert your record,
so now I'm gonna say,
a record has been created in HubSpot.
It has been synced to Snowflake.
In Stacksync, you created a trigger which says,
okay, when a contact having these properties
is created and email is the field that changed,
then also put it into Salesforce.
So into the Salesforce table in Snowflake.
So now we run an upsert,
which prevents the emergence of duplicates
in Snowflake.
And with Two Way Sync, I just want to just send this,
I mean, because of Two Way Sync,
this data is gonna be inserted into Salesforce.
Yep.
And so now you just created a contact in HubSpot
and you have it in Salesforce and it passed
into your Snowflake.
So it's also available to all of your other systems
and analytics, you know, and dashboards
to actually be observed.
So you have real-time analytics,
operational purposes, because now, you know,
you can reach release,
because this takes maybe two to three seconds
because of query time and all this,
maybe it takes two seconds or three seconds at maximum.
So three seconds later,
you created a contact in HubSpot and you have it in Salesforce and you can trigger like a welcome sequence or whatever.
That's really operational, and this entire pipeline has been built with
one SQL query, one trigger,
and two two-way syncs.
And you're handling all the API calls.
Everything, and even batching.
And so in StackSync, it even works in a very clever manner
is that if you go into a Snowflake
and you modify, let's say, one million records at a time,
because Salesforce has a different rate limiting
and HubSpot too, Stacksync will just, you know,
batch all these records as fast as possible
within your allowed API
rate limits, which you can also configure, and send this data.
So it might take a bit more time, but for Salesforce, we go up to 1 million records
per minute, half a million records per minute on HubSpot, so it can be very fast.
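The batching behavior described here can be approximated with a short sketch. It takes the throughput figures quoted in the conversation (1 million records per minute for Salesforce, half a million for HubSpot) as given ceilings and simply splits a large change set into batches and estimates the drain time. The helper names and the batch size are hypothetical; this is not how Stacksync is implemented:

```python
import math
from typing import Iterator, Sequence


def batches(records: Sequence[int], batch_size: int) -> Iterator[Sequence[int]]:
    """Yield consecutive slices of at most batch_size records."""
    for start in range(0, len(records), batch_size):
        yield records[start:start + batch_size]


def minutes_to_drain(n_records: int, records_per_minute: int) -> int:
    """Whole minutes needed to send n_records at a fixed per-minute ceiling."""
    return math.ceil(n_records / records_per_minute)


modified = 1_000_000  # records changed at once in the warehouse

print(minutes_to_drain(modified, 1_000_000))  # 1
print(minutes_to_drain(modified, 500_000))    # 2
print(len(list(batches(range(10), 4))))       # 3
```

The same million-row change therefore takes about twice as long to reach the slower target, which is the "bit more time" mentioned above.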
And so all of this architecture, and therefore the maintenance, is simplified with a single
vendor. So one deal, which has just triggers, SQL queries, and
dbt transformations.
Yep. Nothing else, you know.
Okay. So let's, I said we'll do easy mode and then let's do something
harder. And maybe this isn't, maybe this isn't harder, but I'll try to,
this is the example that came to mind.
Okay, let's move from Salesforce to NetSuite ERP.
Any ERP, but whatever, let's just say NetSuite.
Things get more complicated when you think about operational use cases where you have a billing department
who's using the ERP to send invoices,
manage payments and receivables.
But in Salesforce, let's say that the salesperson or the customer support person is working with an individual contact.
And the complication, it can get crazy,
but the complication is a lot of times
you have to make multiple hops to go from the contact
in Salesforce to a unique invoice ID,
or purchase order number, right?
Because you have the contact,
and then they have a company in Salesforce,
and it's not always a one-to-one match
of what they're called in the different systems, right?
You know, that's just a simple example, right? Between two systems where there are some discrepancies,
there are some complications there. But there are keys that you can use, right?
But if you have to make a hop for that operational use case
where there are different keys and then a physical asset,
like an invoice, related to a company,
how would you handle a situation like that?
Because there we get into some really interesting
data modeling challenges and some data discrepancies.
Yes, absolutely. So basically, if you have several hops, for example, say you have to query first,
you have an email in Salesforce, but also in NetSuite, so you have to query the contact by email,
which might not even be possible. Then you have to get...
Right, that might not be possible. So then you have to do company name and then...
Exactly. So then you have to get the contact, then you have to get the company, then you have
to get the opportunities, and you have to get like
the list of all the invoices for the opportunities, and then get the ID of
that invoice to actually get the payment. So all of these are very complex
workflows. So if you are in the legacy world, you have to build maybe like a
workflow with 15 steps and merges and...
It's brutal.
Yeah, it's very bad.
And so what happens is that now if you have your entire data
real time fresh into your Snowflake,
you can craft your SQL query to actually get very powerful.
So actually like you have in your workflow,
you would have one Snowflake query.
Then you have a check to see if it returns a result.
Because maybe, you know, you are at the very millisecond
point in time where data wasn't available or something.
So make a check, right?
And then once you have aggregated all the data,
so with a single query, you're not overloading
your NetSuite with many API calls, right?
You are just querying Snowflake, which is much more, you know,
loadable, so to speak.
Then you have your insight, you take your data
and you insert that into the right Salesforce or HubSpot or NetSuite
table, and that creates an update in your NetSuite.
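The multi-hop lookup described in this answer (contact to company to opportunities to invoices) collapses into one join once all four objects sit in the same database. The sketch below uses an in-memory SQLite database as a stand-in for the warehouse; the table and column names are assumptions for illustration and do not reflect the real Salesforce or NetSuite schemas. It also includes the "check if it returns a result" step mentioned above:

```python
import sqlite3
from typing import List, Tuple

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE contacts (id TEXT, email TEXT, company_id TEXT);
    CREATE TABLE companies (id TEXT, name TEXT);
    CREATE TABLE opportunities (id TEXT, company_id TEXT);
    CREATE TABLE invoices (id TEXT, opportunity_id TEXT, status TEXT);

    INSERT INTO contacts VALUES ('c1', 'ana@example.com', 'co1');
    INSERT INTO companies VALUES ('co1', 'Acme');
    INSERT INTO opportunities VALUES ('o1', 'co1');
    INSERT INTO invoices VALUES ('inv1', 'o1', 'past_due');
    INSERT INTO invoices VALUES ('inv2', 'o1', 'paid');
    """
)


def invoices_for_email(email: str) -> List[Tuple[str, str]]:
    """Resolve contact -> company -> opportunities -> invoices in one query."""
    return conn.execute(
        """
        SELECT i.id, i.status
        FROM contacts c
        JOIN companies co ON co.id = c.company_id
        JOIN opportunities o ON o.company_id = co.id
        JOIN invoices i ON i.opportunity_id = o.id
        WHERE c.email = ?
        ORDER BY i.id
        """,
        (email,),
    ).fetchall()


result = invoices_for_email("ana@example.com")
if result:  # the check: the data may not have landed yet
    print(result)  # [('inv1', 'past_due'), ('inv2', 'paid')]

print(invoices_for_email("nobody@example.com"))  # []
```

Each `JOIN` replaces what would otherwise be a separate API round trip in a step-by-step workflow.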
So I feel like two-way sync really makes it so that
your tables in your Snowflake or even in your Postgres
are really a real-time, you know,
read and write interface to your enterprise system.
It's an equivalent to using an API.
Basically, you're just using an API via SQL.
That's the only thing you're doing.
That's a very easy way to understand it.
So, actually, every time you do an insert,
you're actually making an API call,
but this API call can be one million rows.
This read can be filtered with any kind of business
and custom logic that you have.
It's really extremely, it's an API which is as flexible
as SQL and as well documented as SQL.
And you can see your data directly.
Yeah, yeah. No, I think that is the, yeah,
Yeah, Brooks is telling us we're at the buzzer here, but what a great,
I mean, I think that's the paradigm shift and it took me a minute to get there.
Maybe because I'm a little slow, which is why I fell for the marketing hype on Salesforce and HubSpot Sync.
run the loop, which isn't really great for operational use cases, especially ones that are time sensitive, right?
And then your logic has to grow as part of a gigantic Frankenstein model that gets crazier
and crazier over time, right?
And so it's really interesting because I would almost describe Stacksync as inverting the
problem, right?
Where it's like, you literally don't think about APIs,
you just write a SQL query for the use case that you want to solve.
It's super fast because it's a really low, it's not a heavy query.
And then you don't even worry about the APIs, it just happens, right? And you're adding these queries over time,
and modifying the logic is isolated.
So it's very easy for anyone to reason about what's going on with Salesforce
contact to invoice, past due, whatever use case.
It's a very visible logic which is easy to build and to migrate and to debug, right? And especially it's a very declarative way to operate, right?
APIs are very event-driven, do this, do that.
And then SQL is very declarative, right?
It's like, you know, just of everything,
every data you can see, you know,
from a top level perspective,
just like pump everything you have,
you need and get it back, you know?
And this is very declarative, which really enables you
to build much more robust pipelines.
So that's where Stacksync really puts this declarative, SQL way to operate
on top of traditional event-driven, dirty APIs.
Just from a simplification perspective, just try to connect to the NetSuite API.
In a week, we're going to be still there and we're gonna say, okay, well, maybe it's useful to have something that manages it for me.
That my friend is a great sales pitch.
Yeah, just not talking about the integration piece, right?
Just like the authentication, right?
Yeah, that's brutal.
Documentation.
I know a few survivors back in the 90s, which actually understood how the API worked.
So this is a bit like the statement.
Yeah, yeah.
Okay, one more question here.
Can you give our listeners a sneak peek?
What feature are you working on that you're most excited about?
So right now, basically, at Stacksync,
we're also launching workflow automation
tooling which actually plugs into the syncs.
So for example, say you have a two-way sync, some data events are transferred in real time,
you can say, when you see these transfers, when you see a new
contact, tell me.
And this "tell me" can be anything: a sequence of
dbt transformations, a workflow automation, a Slack
notification to your sales team, anything. And this is really
something where, say, a contact changes status, boom, notification to the
relevant sales rep. And all of this kind of enrichment. I was holding a
webinar last week about how can you actually say every
time there is a new contact created, go to LinkedIn, you know,
get real time live data about the entire LinkedIn profile,
make a summary and fill up all of these, you know, database fields,
which are actually like the CRM fields.
And now every time I would just type an email into my CRM,
I would see all of these fields populating immediately, instantly.
And so this is really with live LinkedIn data.
And all these enrichment use cases are exactly what we're building.
So really this upgrade into enterprise scale mode of your operations.
This is what Stacksync actually does.
And Stacksync now has also become the leader in NetSuite two-way sync.
So if you have any struggle in your team, which has NetSuite,
Shopify, Zendesk, HubSpot, Salesforce involved,
you know, happy to chat and actually help you architect your best use case
in your precise business scenario.
Cool.
Ruben, this has been awesome.
I really appreciate the time.
Love that we dug in. Love that we
demystified zero whatever it is. Zero something. I guess zero something is an oxymoron, but this
has been great. Congrats on all the success and we hope you have much more in the future.
Thank you so much, guys, thank you so much for hosting.
The Data Stack Show is brought to you by RudderStack.
Learn more at rudderstack.com.