The Data Stack Show - 242: The Data Convergence: How Operational and Analytical Data Are Merging with Ruben Burdin of Stacksync

Starting point is 00:00:00 For the next two weeks, as a thank you for listening to the Data Stack Show, RudderStack is giving away some awesome prizes. The grand prize is a LEGO Star Wars Razor Crest 1,023-piece set. They're also giving away Yeti mugs, Anker power banks, and everyone who enters will get a RudderStack swag pack. To sign up, visit rudderstack.com slash TDSS-giveaway. Hi, I'm Eric Dodds. And I'm John Wessel. Welcome to the Data Stack Show.

Starting point is 00:00:36 The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work. Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Before we dig into today's episode, we want to give a huge thanks

Starting point is 00:00:59 to our presenting sponsor, RudderStack. They give us the equipment and time to do this show week in, week out, and provide you the valuable content. RudderStack provides customer data infrastructure and is used by the world's most innovative companies to collect, transform, and deliver their event data wherever it's needed, all in real time.

Starting point is 00:01:19 You can learn more at rudderstack.com. Welcome back to the Data Stack Show. We are recording live on site in Oakland, California at the Data Council Conference. Give us, you know, a 30-second background on yourself. So my name is Ruben. I grew up in France, and now I've moved to San Francisco. I'm co-founder and CEO at Stacksync. I've worked in several startups and medium-sized companies, in everything from ETL, reverse ETL, and the data stack, as well as on AI, you know,

Starting point is 00:01:57 before it became something big. Before that, I did a master's in computer science in Switzerland, as well as a bachelor's in business. This is where I come with a double perspective on the business and data worlds. That's awesome. So one of the topics we're excited to talk about is this convergence, I would call it,

Starting point is 00:02:16 of analytical and operational data and what that looks like. So what's something that you're excited to chat about? Yes, exactly. So what really excites me at the moment is to see how what was traditionally analytical data and what was traditionally operational data are really starting to merge and meet the same level of SLA.

Starting point is 00:02:37 So, to see that a data warehouse can become operational and a database can become analytical. This is something I see coming in many small to large companies. So it's really a trend in the industry, and I really think it is here to stay. Yeah, so this is really something which excites me. All right, let's dig in.

Starting point is 00:02:56 Let's do it. All right, Ruben, so excited to dig into operational versus analytical data, how and why those two things are converging. But before we do that, let's talk a little more about your background. So you mentioned you studied both business and computer science, and that gives you a unique perspective in your job today. I'm sure there are some stories there, maybe just kind of throughout your career so far,

Starting point is 00:03:22 what would you say, like how has that impacted and maybe given you a kind of superpower to get you where you are today? Yeah, absolutely. So when I was a kid, I really started to hack around, doing so many HTML pages and whatnot. And so what happened is when I started university, I started with a bachelor's in business administration.

Starting point is 00:03:40 And my university had just launched what was called a data science fundamentals program. Nobody had heard about data science and all that. I was explaining to everybody what data science was when they asked me, like, what are you doing, Ruben? And from there, you know, I really understood, you know, this is where I can have a mean back photography.

Starting point is 00:03:56 I really think I can identify patterns that nobody is seeing around, and this then just turned out to be the largest company we know today. And this was probably like seven years ago. So what happened is that I then went to Singapore to work in a startup, and also to do an exchange semester, and there I was fully technical. And then I came back to Switzerland, where I grew up, and then I did a master's in computer science.

Starting point is 00:04:20 So I think the ability to come to computer science and data science with a business perspective, so you're not thinking fully as an engineer with all the complexities and details, enables you to simplify some concepts: you don't know the details, so you just end up abstracting them away and finding a better solution. And being able to think as an engineer in a business ecosystem really helps you to frame things and make your way to them much quicker, even without having the full business knowledge. And this is really something which enables you to go much faster. And so that's what I value a lot from my education.

Starting point is 00:04:57 And then after doing my master's in computer science, I really rolled out in several companies. I built out data capabilities from zero to hero for several companies. Also in AI, before all this LLM thing was a thing. And yeah, this is where I come from. Awesome. I mean, it makes so much sense too. I think the topic we'll dig into, operational

Starting point is 00:05:20 versus analytical data, really matches those two things we're talking about, right? Like your operational mind, kind of business. How does that meet analytics? And we tie those two things together, which is what you're doing at Stacksync. So super excited to dig into that, but let's just kind of back up

Starting point is 00:05:36 and even just define operational, then analytical, like in the traditional sense. Before we talk about merging those, how would you just define operational data? Yeah, so for operational data, imagine yourself as a full-stack developer, right? So you're building an app with notifications and a UI and a Postgres backend, for example, right? The most typical setup.

Starting point is 00:06:03 So operational data is really the data which is based on events, right? On clicks, you know, on actions that the user takes and which need an answer right now. So these are actions which are not batched. They are in real time. So there are two characteristics. One is they're real time.

Starting point is 00:06:22 And second, they are not batched. They are single events where it is a single JSON, not even a list. It's a single JSON. I sign up, it writes to Postgres, and then I get access to the app upon sign-up. Instantly. Exactly. There is no delay. So it's the data which you even need a response to. And then you have the contrary, right? The opposite side of the spectrum is analytical data. So maybe the way I define analytical data is data which makes sense at an aggregated level.

Starting point is 00:07:02 So it's not a single event, it's at an aggregate level. For example, let's say your revenue metrics, they only make sense once you can compute several deals together, right? A single deal makes no sense for a revenue metric. So you would need to actually aggregate. So aggregate means stepping back enough

Starting point is 00:07:18 to have several data points, because if you dig too much into a time range, for example in a dashboard, you would only have one data point, when you go very precise. So analytical data is data which makes sense at an aggregate level, and is oftentimes data from yesterday or from even earlier.
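
To make the distinction above concrete, here is a minimal sketch, assuming a hypothetical `deals` table in an in-memory SQLite database (the table, column, and function names are illustrative and not from the episode). The operational path writes one event as one row and returns immediately; the analytical path only means something once several rows are aggregated.

```python
import sqlite3

# Hypothetical schema for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE deals ("
    "id INTEGER PRIMARY KEY, customer TEXT, amount REAL, closed_on TEXT)"
)


def record_deal(customer, amount, closed_on):
    """Operational write: a single event, committed right away."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT INTO deals (customer, amount, closed_on) VALUES (?, ?, ?)",
            (customer, amount, closed_on),
        )
    return cur.lastrowid


def revenue_by_day():
    """Analytical read: a metric that only makes sense across many rows."""
    rows = conn.execute(
        "SELECT closed_on, SUM(amount) FROM deals "
        "GROUP BY closed_on ORDER BY closed_on"
    ).fetchall()
    return dict(rows)


first_id = record_deal("acme", 100.0, "2025-01-01")
record_deal("globex", 250.0, "2025-01-01")
record_deal("initech", 75.0, "2025-01-02")
daily = revenue_by_day()
print(daily)
```

A single call to `record_deal` is useful on its own (the user gets an answer), while `revenue_by_day` over a single row would say very little, which is the "stepping back" point made above.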

Starting point is 00:07:39 And also, if it breaks, there is no criticality on fixing it. I mean, there can be business consequences, but at least a customer does not lose access to the app. So it's batched and it's non-real-time. These are really the elements which make analytical data analytical. Before we talk about the convergence of those two things, how is that possible today? What are the tech advancements that have made that possible?

Starting point is 00:08:11 Well, I think there are plenty. There is the emergence of ETL. In the history of data engineering and data in the world, in the beginning, we didn't have enough, anyway, just storage. So we said, hey, we need some storage to store some data. So we invented stuff like BigQuery, Snowflake, you know, like databases in general. I mean, just databases. Then we said, oh, no, it's bigger data. So now we did data warehouses.

Starting point is 00:08:37 So a different thing. And now, having a data warehouse is useful, but we have no data inside of it. So we started to create data pipelines, right? And this is where actors like Fivetran and Airbyte come in, among many. And then you just ship data to the data warehouse. Yep. So now you have the data warehouse with some data inside. Now how do we make it even useful?

Starting point is 00:08:59 Because it's useful, but, you know, just sitting there, it just costs money, right? So now we're going to do dashboarding, right? So now Looker Studio, you know, Tableau, Sigma start to emerge and become very, very advanced. Now we have a data warehouse with some data inside, and also some dashboards, some metrics which make sense from a business perspective. And now what?

Starting point is 00:09:22 Now what happens? At that moment, you actually need to take action on the data, and this is the emergence of the reverse ETL kind of trend, right? So it's taking data from a database or data warehouse, I mean more a data warehouse, and shipping it back to the external systems where your business teams take action and live daily, right?

Starting point is 00:09:44 And so this is where reverse ETL comes in. And then you have, OK, now we have this loop, right? But this loop now needs to be faster, right? The data grows bigger. You know, we have much more product data. We have much more logs. And so now we need it bigger and faster, right? So bigger means, OK, different technologies to handle things.

Starting point is 00:10:04 Now you have to batch. You have to take care of API rate limits, quotas and all this, but you also need to make it faster, right? And the way you build a real-time infrastructure is completely different from how you build a batch infrastructure, right? You can think of real time as smaller and more frequent batches, but if you think like this and you try to sync to your Snowflake like this, it's going to destroy your costs. And so this is why streaming is not really equal

Starting point is 00:10:32 to many small batches, even though in technical terms it actually is, right? In the end everything is a byte. So yeah, this is where you're now getting into real time, and where two-way sync comes in, right? So now you don't have just one-way sync plus the other way. And we realize that one-way sync plus the other way does not make two-way sync, because you have conflicts which happen, because real time means you don't have time to

Starting point is 00:10:57 consolidate data before conflicts happen. Yeah, and this is why two-way sync no longer equals one way plus the other way, especially in this real-time context. And so the beauty of this is that, in this whole transition and evolution of, like, storage, one way, the other way, two ways, we didn't talk about the fact that the data itself, the underlying asset, by nature became more real-time, right? Because now you sync it faster.
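
The claim that one-way sync plus the other way does not add up to two-way sync can be shown with a small sketch. Everything here is an assumption for illustration: records are plain dictionaries with a hypothetical `updated_at` counter, and the conflict policy is last-writer-wins, which is one common choice and not something the episode attributes to Stacksync.

```python
def one_way(source, target):
    """Naive one-way sync: every record in source overwrites target."""
    for key, record in source.items():
        target[key] = dict(record)


def two_way_lww(a, b):
    """Two-way sync with an explicit conflict rule (last writer wins)."""
    for key in set(a) | set(b):
        candidates = [r for r in (a.get(key), b.get(key)) if r is not None]
        winner = max(candidates, key=lambda r: r["updated_at"])
        a[key] = dict(winner)
        b[key] = dict(winner)


def fresh_state():
    # The same contact was edited on both sides before any sync ran.
    crm = {"c1": {"phone": "111", "updated_at": 10}}
    db = {"c1": {"phone": "222", "updated_at": 12}}  # the newer edit
    return crm, db


# One way, then the other way: the newer edit made in the database is lost.
naive_crm, naive_db = fresh_state()
one_way(naive_crm, naive_db)
one_way(naive_db, naive_crm)

# Two-way sync with conflict resolution keeps the newer edit on both sides.
crm, db = fresh_state()
two_way_lww(crm, db)
print(naive_db["c1"]["phone"], db["c1"]["phone"])
```

Record-level last-writer-wins is deliberately simplistic; a real system would also have to handle field-level conflicts, deletes, and clock skew.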

Starting point is 00:11:25 So it changes faster between the two sides, which are an external system, like the Salesforce CRM, and a database like Postgres or a data warehouse like Snowflake. So now you have these two systems, which go more and more into real time, and, in what I think is an ultimate step, into being fully consistent. So now you are also delivering more real-time dashboards, with the ones we talked about, Looker, Sigma, and all this. And so the entire quality of the data ecosystem increases on every side.

Starting point is 00:11:59 And so this is where we start seeing a pattern where analytical data becomes fresher and fresher. And this is where we start joining the attributes of operational data, which is events, right? So single events, more towards streaming, as well as real time, with those of analytical data, right? You can ship back data at scale which still shares that root of being an aggregate and making sense at an aggregate level, because you have more data available. You can actually ship real-time aggregates.

Starting point is 00:12:35 And this is where the two worlds start converging. So that's the convergence. What do you think, Eric? Well, okay, I have a question. Because what you're describing is really interesting, and this is essentially what Stacksync does. You're describing, at a very fundamental level, communication between two databases, right? Which was typically done through a series of pipelines and batch jobs and, you know, all that sort of stuff. And so when I think about this, if someone said, okay, think about an example of analytical data being used operationally, and you bring real time into it,

Starting point is 00:13:13 I instantly think of something like Flink, which is very different than two databases talking, right? You have an actual event stream, you're running calculations over that, and then you're basically using what would typically be considered aggregate analytical data, but you're doing something like maintaining a state

Starting point is 00:13:29 that you can then use to like enforce something within an application, right, that needs to happen in real time or whatever, which is, you know, that's a streaming architecture, that's pretty common, but we're talking about a similar type of use case, but between two database systems, which is really interesting.

Starting point is 00:13:47 Right. I really like to overlay real-world physical examples here. I have a background in logistics and distribution. Absolutely. And what I think is really interesting about that is the generation that we're moving past was called data warehousing, right? So a warehouse is somewhere you store things, physically store things.

Starting point is 00:14:08 And then I was thinking, well, how did we evolve this? Like how did Amazon and their distribution evolve? It's called a fulfillment center, right? And fulfillment centers are highly operational. Warehouses are more like you bring something, you store it, it might sit for a while, it's low transaction,

Starting point is 00:14:24 like there's not much movement. If you're in a fulfillment center, then things are constantly going in and out. You're doing value-add work, like white labeling, for example. And it's just a very different feel than a warehouse. And I think we're seeing this as data warehouses become more operational. They're closer to these fulfillment centers: increased complexity, higher standards for low latency and speed, and lots going in and out, versus just going in and being stored. Give us an example from a customer. You talked about Postgres and Salesforce, which I think is probably really familiar

Starting point is 00:15:00 to all of our listeners, right? Yes. Give us a real customer example from someone you've worked with at Stacksync where there's that dynamic of real-time two-way sync. How does that work? What's the use case? And then how would you actually build it out? Yeah, absolutely. So actually, there are several very typical use cases.

Starting point is 00:15:17 And for example, one very typical use case is how you build internal tooling on top of your CRM system. So for example, let's say you have a portal where you activate user access rights to a SaaS product. Or, for example, let's say you would check if an invoice has been paid by a customer, enforcing some business rules which are very hard to frame with Salesforce workflows

Starting point is 00:15:45 because it's a very limited tool. Yeah. And you want your business team to manipulate this data in a way which is very customized to your business, right? So, how do you do it? Now you're going to have to build a UI, almost like an app. And instead of querying the database,

Starting point is 00:16:05 which gives you some consistency guarantees, you would otherwise need to send API calls to the API all the time, right? Which involves, you know, pagination. I'm thinking about building this in Salesforce and... You know what I mean? I feel like a sick feeling in my stomach. Oh really?

Starting point is 00:16:24 It's a Lightning app, right? It's a simple Lightning app. It's a little Apex code. That's it. It's done. The trick is done. We just don't want to be that guy who builds it. We just want it to be someone else.

Starting point is 00:16:39 And this is your alternative. It's like, okay, now you get Stacksync, which powers real-time two-way sync between your Salesforce and your database. So you would synchronize all of your relevant data from your Salesforce to your database. You have the exact same schema, so the exact same objects,

Starting point is 00:16:57 the exact same everything, right? Data types, everything. And so, instead of writing to the API of Salesforce, which, let's imagine now you have a UI and a team of like 100 employees who could log in and all start doing the same transactions, is going to make hundreds of calls per second, right? And this is not sustainable, right? So this is why you need a database, which can handle much more, and where you can batch this. So

Starting point is 00:17:24 Stacksync would just come in. You have all your data in your database, and you would build your internal portal exactly as you would build a normal app on the database, right? So it's very familiar for your engineers to build. Your engineers and your go-to-market team know exactly how to behave, because the database is a very comfortable zone for them. It's what they use daily, and what they love. They know it very well. No need to read documentation. So now they would just read and write from a database. And so every time there is an access right that needs to be changed, someone

Starting point is 00:17:53 calls you and says, hey, Ruben, please change my access rights for this product. I've got an invoice and I didn't get it. So I would go into my internal portal, and I would be able to change anything. And this would change in my database, right? So very easy to build. And Stacksync would just synchronize this directly to your Salesforce. And so you didn't make a single API call to Salesforce, but you actually modified your data

Starting point is 00:18:15 from the internal portal to Salesforce, because you use a database as an integral tool. And you can use a data warehouse for this as well, for different use cases, right? So this is extensible to some reverse ETL use cases where you have both read and write operations. But imagine the evolution of software. In the beginning, when you built software, like in the very early

Starting point is 00:18:40 age, the first days of software, a client of a software would tell you, hey, please give me access to my underlying data. You would give them access to the underlying database. But the problem is that the customer would mess up the data, which was completely unmanageable. So we invented the concept of APIs. It's a structured firewall which enables only certain types of operations, which can be validated before they access the database. And Stacksync, by doing the two-way sync, replicates this database. I mean, it uses an API to replicate the database. So we give you back access to the database of the underlying system, and this also breaks vendor lock-in. So instead of,

Starting point is 00:19:22 like before you were mentioning the Lightning app, right, John? So now, instead of building the Lightning app, you know, which is crazy, and which you are forced to do because the data is in Salesforce, so you have to use whatever can access the Salesforce database. And then you're more locked in. Yeah, and now you're even more locked in. You know, and so what happens is that now, basically,

Starting point is 00:19:41 because you have the same data, but you have it outside of your Salesforce ecosystem or your NetSuite ecosystem or your server ecosystem, you can build exactly like you would build a Lightning app, you know, a customized thing, but with your stack, which you're familiar with, with the database you're familiar with. And it's not locked in, because you're the owner of your own data, which is within your own premises, right? And so in the end you end up with the exact same product, which is

Starting point is 00:20:11 a custom app on top of your data, but you are just outside of this lock-in ecosystem. And two-way sync just removes the API layer, which was very useful for many years but now creates complexity as data scales. I have a question about the schema. It sounds like it works like magic, but I just need to verify that, right? I have a Postgres database that backs my app, and I have Salesforce. And my guess is that listeners who have had to build custom integrations to do some of the type of stuff that we're talking about,

Starting point is 00:20:52 or have been exposed to the nightmare of, we have this Frankenstein Salesforce application and we need to get data into it or whatever. The Postgres instance that's backing my app has a data hierarchy that doesn't necessarily represent the same information, the same information hierarchy that's in Salesforce, right? So one good example would be,

Starting point is 00:21:15 I have the concept of users in my app, but in Salesforce I would also have the concept of an account, because the sales team rolls that up and looks at it, like would roll multiple users up, right? But my app doesn't really have that concept necessarily in the same way that it's reflected in Salesforce, right? And so there's a difference in the way

Starting point is 00:21:36 that the data is actually structured. And so you kind of mentioned, you know, it's like, okay, you have the same schema, but I'm sure a couple of listeners were like, wait, how does that actually work? Because the schemas are fundamentally different in terms of the way that the information is represented. And either there's a subset in one or a subset in the other

Starting point is 00:21:53 and the actual relationship between the data is different. Yeah, so Eric, I think here you are absolutely right. You are mentioning that because the apps are different, your in-house grown app and the CRM you want to integrate with have different data models, right? So they organize accounts and contacts differently

Starting point is 00:22:12 than you would use workspaces and users in your app. And also the IDs that you use to associate them. In Salesforce, you're going to use Salesforce IDs to associate your contacts and accounts, but in your app, you would use your user ID, workspace ID, and whatnot. So these data models are fundamentally different, right? So how does that actually work?

Starting point is 00:22:32 So think about already having all of your Salesforce data able to be synced in read and write fashion from your database, with all the complexity already abstracted away: the API, the authentication, rate limiting, formatting, you know, and also casting data, right? Of course, a data type you write in a database, you know, like a timestamp or a float,

Starting point is 00:22:52 you know, might not be the same as in Salesforce. So you lose precision on decimals, and timestamps, you know, have different time zones. So all of this complexity, you have to take care of. And so let's already assume that, you know, now you just have your Salesforce data in your database.
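
The casting concerns mentioned above (decimal precision and time zones) can be sketched with the Python standard library alone. The helper names `to_utc` and `to_scale` are hypothetical, and the two-decimal scale is an assumption; the actual scale and time zone handling depend on how each field is configured in the systems being synced.

```python
from datetime import datetime, timedelta, timezone
from decimal import ROUND_HALF_UP, Decimal


def to_utc(ts):
    """Normalize an aware timestamp to UTC; refuse naive ones."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamp: the time zone must be explicit")
    return ts.astimezone(timezone.utc)


def to_scale(value, places=2):
    """Round a numeric value to a fixed number of decimal places."""
    quantum = Decimal(10) ** -places  # Decimal("0.01") when places == 2
    return Decimal(str(value)).quantize(quantum, rounding=ROUND_HALF_UP)


# A timestamp written in a UTC+2 zone and the same instant in UTC.
utc_plus_2 = timezone(timedelta(hours=2))
local = datetime(2025, 6, 1, 9, 30, tzinfo=utc_plus_2)
normalized = to_utc(local)

amount = to_scale("0.125")
print(normalized.isoformat(), amount)
```

Rejecting naive timestamps instead of guessing a zone is a design choice in this sketch: a silent guess is exactly the kind of mismatch that shows up later as two systems disagreeing about when something happened.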

Starting point is 00:23:08 So you would have, let's say, different tables. Your users table would not be the one to sync with your Salesforce. You would have a Salesforce users table and an app users table. And the reason why is that not only is the schema different, but also the semantic content is different.

Starting point is 00:23:25 A user in the app is not necessarily the same as a contact in the CRM, which might just be a prospect, you know? Otherwise it means, you know, every time someone texts you or emails you, it's updating your CRM, and now you create an account for this person and they receive a welcome email, you know, like, yeah, it's very strange. You know, with like a magic link to log in, right? It's kind of aggressive. So what you would do actually,

Starting point is 00:23:49 you would synchronize the same way, but from your app you would just write this new user into the users table of your app, and this new user would also end up in your contact table in Salesforce. So you would make one transaction with two tables that you write,

Starting point is 00:24:05 and this would synchronize to both places. So you can manage this difference. There is still a small difference, right? But it's also because the semantic content of both is not the same. And also, when someone signs up, it's likely someone that already texted

Starting point is 00:24:21 your customer success team or sales team, et cetera. So the contact already exists in your CRM. So you don't want to only insert, you want to upsert in reality, and also update some fields. For example, let's say you want to maybe autofill your customer profile in your app from the information you have in the CRM, and vice versa, update the CRM data. So this consolidation is something you can build much more easily in Postgres with SQL queries, right? Than you would with a SQL query

Starting point is 00:24:55 and an API call, which are not atomic by definition, right? So one can fail and the other one not. And this is where the CMO runs to you and says, hey, the metrics are not correct, and I didn't get my bonus. Can you please check your cash flow? Oh, actually, we had a bug and like 50 customers

Starting point is 00:25:12 actually didn't backfill until you share it. We got throttled. Yeah, Salesforce throttled us. Yeah, yeah, Salesforce throttled us, and like, sorry for your Bahamas holidays. You know, like, I will run back, you know, copy paste manually. You know, that's a list. Thank you so much.
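
The upsert-plus-atomicity point made just before this exchange can be sketched as a single database transaction touching two tables. This is a minimal sketch under stated assumptions: it uses SQLite (version 3.24 or later for `ON CONFLICT ... DO UPDATE`) instead of Postgres so it runs anywhere, and the `app_users` and `crm_contacts` tables and the `sign_up` function are hypothetical names, not Stacksync's schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE app_users (email TEXT PRIMARY KEY, name TEXT NOT NULL);
    CREATE TABLE crm_contacts (
        email TEXT PRIMARY KEY,
        name TEXT,
        is_customer INTEGER NOT NULL DEFAULT 0
    );
    """
)


def sign_up(email, name):
    """Upsert the CRM contact and insert the app user in one transaction."""
    with conn:  # commits if both statements succeed, rolls back otherwise
        conn.execute(
            "INSERT INTO crm_contacts (email, name, is_customer) "
            "VALUES (?, ?, 1) "
            "ON CONFLICT(email) DO UPDATE SET "
            "name = excluded.name, is_customer = 1",
            (email, name),
        )
        conn.execute(
            "INSERT INTO app_users (email, name) VALUES (?, ?)",
            (email, name),
        )


# The contact already exists in the CRM table as a prospect.
with conn:
    conn.execute(
        "INSERT INTO crm_contacts (email, name) VALUES (?, ?)",
        ("ada@example.com", "A. Lovelace"),
    )

sign_up("ada@example.com", "Ada Lovelace")  # upsert: updates the prospect

# A duplicate sign-up fails on app_users, so the CRM update is rolled back too.
try:
    sign_up("ada@example.com", "Someone Else")
except sqlite3.IntegrityError:
    pass

contact = conn.execute(
    "SELECT name, is_customer FROM crm_contacts WHERE email = ?",
    ("ada@example.com",),
).fetchone()
print(contact)
```

With a SQL write followed by a separate API call there is no shared transaction to roll back, which is the failure mode described in the conversation: one side succeeds, the other is throttled or fails, and the two systems drift apart.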

Starting point is 00:25:27 Now, this cannot work in a reliable setup. And this is why, you know, these mission-critical use cases cannot just rely on having API calls which are not atomic with the database queries and, you know, which have different realities. And also, imagine the token is revoked, right? You know, the person who authorized Salesforce,

Starting point is 00:25:46 you know, got fired. So now the user is canceled. So now what do you do? You know, your app just loses access to your Salesforce systems. What do you do with the leads? You need to have a track cache, you know? Which saves all of these records

Starting point is 00:25:58 and becomes a huge mess. I usually send to engineering to make it really robust. This is hundreds of thousands of dollars of consulting or engineering time. And so having it in a database is much better, because now you can just work in a database. If your CRM is out of sync, you know, it's disconnected, Stacksync will just tell you, hey, we can't connect to your Salesforce anymore, please re-authorize. And once you re-authorize, we're going to catch up on everything which happened in between. From your app's perspective, the reality of Salesforce, which is what is supposed to be in your database,

Starting point is 00:26:30 is still up to date with the app, right? It's still atomic and sub-millisecond, you know, in sync. And so this is exactly what two-way sync enables. And if we see it like this, it transforms both the operational world and also the ETL plus reverse ETL world, because we understand that at real-time pace, you know, the way you handle conflicts between apps is actually very different. So but

Starting point is 00:26:57 about all these errors and all these problems. So this is why ETL plus reverse ETL does not equal two-way sync. But two-way sync, you know, because it has both directions plus a conflict resolution mechanism and a consistency mechanism, this is two-way sync as one would intuitively understand it. Totally, it makes total sense. We're gonna take a quick break from the episode to talk about our sponsor, RudderStack.

Starting point is 00:27:20 Now, I could say a bunch of nice things as if I found a fancy new tool, but John has been implementing Rutter Stack for over half a decade. Now, I could say a bunch of nice things And if you've ever seen a tag manager, you know how messy it can get. So RutterStack has really been one of my team's secret weapons. We can collect and standardize data from anywhere, web, mobile, even server side,

Starting point is 00:27:52 and then send it to our downstream tools. Now, rumor has it that you have implemented the longest running production instance of RudderStack at six years and going. Yes, I can confirm that. And one of the reasons we picked RudderStack... of your stack... to learn more. Right, so marketing or whatever. 100%. What's interesting is that I'm essentially maintaining a database in the middle here. And so I can sort of, almost, this concept of like, create real-time materialized views

Starting point is 00:29:11 that apply accurately to each system but based off of like a single database where all of the data lives and I can match the schemas. Yes, that's totally correct. And for example, I see a lot of companies which use Salesforce for sales, HubSpot for marketing, and NetSuite for accounting,

Starting point is 00:29:27 but the customer record is in all of them. And also, there's their app database to power their products, whatever, the logistics system or whatever. So imagine, before, we discussed, okay, you have a SaaS app, right? So we have a users table from the SaaS app and a users table from,

Starting point is 00:29:44 a contacts table in Salesforce. Now you add NetSuite, HubSpot, and all your other systems, and you have the exact same logic. So the SQL query just gets a little bit longer, but you didn't have to learn about the APIs of each system. You just make the same generic abstraction over all the tables that you are syncing,

Starting point is 00:30:03 and this is exactly what actually happens, and what we're gonna make happen. So I had a customer last week who told me, like, I'm very grateful for you guys to actually read the documentation which I don't want to read. Right? So he is really, you know, I'm very thankful for this. You know, I'm like, yeah, thank you so much, you know.

Starting point is 00:30:22 Actually, the guy never opened the HubSpot documentation or system documentation, actually, in this case it was both. And he was just able to get a production app, really, within two weeks, without even touching the documentation or making a single API call.
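
[Editor's sketch] The idea that "the SQL query just gets a little bit longer" once every system is synced into one database can be illustrated roughly as follows. This is a minimal sketch, not how StackSync itself lays out data: the table names, column names, and sample rows are all hypothetical, and SQLite stands in for whatever database the tables are synced into.

```python
import sqlite3

# Hypothetical schema: one app table plus one synced table per external system.
# None of these names come from the episode; they are for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE app_users (email TEXT PRIMARY KEY, plan TEXT);
CREATE TABLE salesforce_contacts (email TEXT PRIMARY KEY, owner TEXT);
CREATE TABLE hubspot_contacts (email TEXT PRIMARY KEY, lifecycle_stage TEXT);
CREATE TABLE netsuite_customers (email TEXT PRIMARY KEY, balance REAL);
INSERT INTO app_users VALUES ('ada@example.com', 'pro'), ('bob@example.com', 'free');
INSERT INTO salesforce_contacts VALUES ('ada@example.com', 'Eve');
INSERT INTO hubspot_contacts VALUES ('ada@example.com', 'customer'), ('bob@example.com', 'lead');
INSERT INTO netsuite_customers VALUES ('ada@example.com', 120.0);
""")


def customer_view(connection):
    """Return one row per app user, enriched from each synced system.

    Adding another system means adding one more LEFT JOIN, not learning
    another API.
    """
    return connection.execute("""
        SELECT u.email, u.plan, s.owner, h.lifecycle_stage, n.balance
        FROM app_users AS u
        LEFT JOIN salesforce_contacts AS s ON s.email = u.email
        LEFT JOIN hubspot_contacts AS h ON h.email = u.email
        LEFT JOIN netsuite_customers AS n ON n.email = u.email
        ORDER BY u.email
    """).fetchall()


for row in customer_view(conn):
    print(row)
```

Matching records on email is only one possible choice of key; a real setup would match on whatever identifier the systems actually share.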

Starting point is 00:30:40 Actually this guy has no idea how the rotation of tokens works for most of the systems, which is actually different for each, because he didn't even have to do it. That's the magic and this is the speed you want to give to your teams. This is how your teams can go faster by having the right tooling. So final question, why did it take so long to get here? Why do you think we went the ETL route, the reverse ETL route?

Starting point is 00:31:08 Like, if you sat down, I think, with somebody with no context, this is probably the most intuitive solution, but that was not how things have evolved, at least in the analytics space, with reverse ETL vendors separate from ETL vendors, separate from... So why do you think the progression happened as it has? Yeah, exactly.

Starting point is 00:31:29 So I think this is a great question, right? It's like, no... Why not have some... Ruben is asking the same question. Yeah, I want to. It was great. But actually, it's essentially... I mean, my suggestion to this is that,

Starting point is 00:31:40 first of all, the market evolved in a certain manner, right? And so, we can't really, you know, forecast, and actually it follows kind of what the market says. So StackSync is a natural evolution of the ETL, reverse ETL and two-way sync industry, right? It's like the data industry just evolves like this. And so, I can ask the same question and say,

Starting point is 00:31:59 hey, why do we use IBM DB2 and Oracle databases when we could actually just use Postgres, right? And maybe something else, even like Snowflake for everything, why not? It's just because the technology was just not there yet at the moment. And then, I mean, if we think about like, maybe seven years ago, streaming use cases

Starting point is 00:32:22 were only for these elite companies which were able to achieve this. You know, like even Flink, you know, or something like this, you know, was not even existing, right? And now some people build white label on Flink, you know, white label on like any kind of Kafka use cases. So this is just the underlying technology that became different, right?

Starting point is 00:32:41 So Snowflake or even BigQuery, right? You could only have maybe, I remember when I was working on Snowflake three years ago, right, and seven years ago, six years ago, actually. So the concurrency limit was only seven concurrent queries, right? How do you want to do some real time

Starting point is 00:32:54 when you have only seven queries concurrent, right? So maybe three are taken by the business team to refresh all of these dashboards, and you only have three for all of your pipelines. So how do you want to do real time on this? It's just not possible. But now, you know, it's like much more. And Snowflake had no CDC,

Starting point is 00:33:09 but now they have Snowflake streams. So you can capture changes much faster, and before, I don't even know how you would have done it. So this is how the underlying technology just evolved and made this use case possible. And you know, it's not that the people before

Starting point is 00:33:23 didn't do the job, it's just because they didn't have the tooling, so they didn't. But I think, to be very honest, right, two-way sync is a much more complex technology than ETL or reverse ETL would be, and this is why, you know, the market is just so juicy already for these technologies, which are simpler. Why would you engage in more complexity if the market is already juicy, right? No, it's just... Compared to that, like just writing a Python script

Starting point is 00:33:51 is way easier. Right? I mean, for an individual use case, like I need to get data from here. Yeah, limited use case. Into Salesforce, right? Right. You're just gonna write, like,

Starting point is 00:33:59 just write some Python and, you know, whatever, right? You're not gonna actually introduce a database system into that equation, because it does add a lot of complexity. But as a managed service, it's like, well, yeah, I mean, this is like, makes way more sense. 100%. This is exactly correct. ... also what you're doing in Postgres, because when you think about scale, like that's pretty interesting from like a tech standpoint.

Starting point is 00:34:27 Come back on and let's get really nerdy and dig into like how you're actually... Yeah, absolutely. How to scale like Postgres to hundreds of millions of records and real-time sub-second sync, you know, on this. It's really impressive. So yeah, so just for like, let's talk for everybody, you know, like, so StackSync powers real-time and two-way sync between any CRM or ERP and databases. So StackSync supports Salesforce, HubSpot, which we mentioned, NetSuite as well, but we also support any other CRM

Starting point is 00:34:53 and any kind of database, right? So from Oracle DB to MySQL, Postgres, MongoDB, Snowflake, BigQuery, you know, all the, I already mentioned MongoDB, but you can have any database, and this generic pattern is something you can apply to any system. And so yeah, more than happy to get in touch if you guys have any questions.
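
[Editor's sketch] The "generic pattern" of two-way sync described in this conversation (both directions plus a conflict resolution mechanism) can be sketched as below. This is a toy illustration under loud assumptions: the `Record` type, the `two_way_sync` function, and the "newest write wins, ties go to the database" rule are all hypothetical choices for the example. The episode does not say which conflict rule StackSync uses, and deletions, retries, and real-time delivery are not handled here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    value: str
    updated_at: int  # logical timestamp; a larger number means a newer write


def two_way_sync(db: dict, crm: dict) -> None:
    """Reconcile two stores in place so that both end up with the same records.

    Assumed conflict rule (not from the episode): the newest write wins,
    and a tie is resolved in favor of the database copy.
    """
    for key in set(db) | set(crm):
        in_db, in_crm = db.get(key), crm.get(key)
        if in_db is None:
            db[key] = in_crm      # only the CRM has it: copy CRM -> database
        elif in_crm is None:
            crm[key] = in_db      # only the database has it: copy database -> CRM
        elif in_db.updated_at >= in_crm.updated_at:
            crm[key] = in_db      # database copy is newer (or tied): push to CRM
        else:
            db[key] = in_crm      # CRM copy is newer: pull into database


db = {"1": Record("old", 1), "2": Record("db-only", 5)}
crm = {"1": Record("new", 2), "3": Record("crm-only", 4)}
two_way_sync(db, crm)
print(db == crm)
```

Running ETL in one direction and reverse ETL in the other without such a rule would simply let whichever job ran last overwrite the other side, which is the gap the conversation points at.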

Starting point is 00:35:11 And so like, if we have this discussion about how to scale Postgres, this is another chapter to actually take the technology much further. Yeah, yeah, yeah. Awesome. Before we wrap up, if folks want to learn more about StackSync or connect with you, Ruben, where can they find the company, where can they find you?

Starting point is 00:35:28 Absolutely. So people can find us on stacksync.com or just email me at Ruben at stacksync.com and I'm super happy. I read all of my emails. So you will get an answer from me personally. Awesome. And that's R-U-B-E-N at stacksync.com. Exactly. Awesome. Cool. Ruben, thanks so much for joining us. This is episode two live from Data Council, a wrap, but we will have another episode dropping,

Starting point is 00:35:54 so tune back in and we will see you on the next one. The Data Stack Show is brought to you by RudderStack, the warehouse native customer data platform. RudderStack is purpose-built to help data teams turn customer data into competitive advantage. Learn more at rudderstack.com.
