The Data Stack Show - 169: Data Models: From Warehouse to Business Impact with Tasso Argyros of ActionIQ
Episode Date: December 20, 2023

Highlights from this week's conversation include:
- The Evolution of Databases and Data Systems (2:33)
- Abstracting Data for Business Users (4:31)
- Building a Database for Google-like Search (7:58)
- The Big Data Explosion (11:10)
- Selling Myspace as First Customer (13:14)
- Starting ActionIQ (16:57)
- The customer-centric organization (22:46)
- Transitioning to customer data focus (23:53)
- Understanding business users' needs (28:30)
- Supporting Arbitrary Queries and Data Models (34:42)
- Unique Technical Perspective of Clickstream Data (37:01)
- The value per terabyte of data (46:45)
- Building a product for multiple personas (50:45)
- Composability and Benefits (58:05)
- Evolution of Storage and Compute (1:00:09)
- Composability and Treasure Data (1:02:10)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
We are here on the Data Stack Show with Tasso
Argyros. Welcome to the show, Tasso. We're so excited to have you.
Great to be here. I've been looking forward to it. Thank you for having me.
All right. Well, give us an abbreviated background. So you're the CEO of Action IQ,
but you've done lots of database stuff. So just give us a brief background. Yeah. So I've been a database guy my whole life,
more or less, my whole professional life. I grew up in Greece where I studied engineering. And then
I came to the US. I started a PhD at Stanford to study databases and distributed systems.
And about a couple of years into that, I dropped out and just started one of the first
shared-nothing, massively scalable database companies at the time.
That was back in the 2000s.
It was called Aster Data.
And Aster was one of the first companies
that could deploy very large databases
on commodity hardware, right?
So much lower cost to store and analyze
big amounts of data.
That was, you know, pre-Hadoop,
it was around the same time that MapReduce
and that stuff was coming out.
I sold that to Teradata,
which is one of the, you know,
a big data warehouse company at the time.
It was the largest enterprise data warehouse company, and I spent a few years there.
Definitely a great school in databases and the business of databases.
And then, you know, I wanted to do something slightly different.
So I left and I started ActionIQ.
We're a customer data platform.
So there's definitely a bunch of database technology involved, but at the end of the day, we have a UI and our goal is
to empower the business users along with the data engineers. So databases were a technical product
and ActionIQ serves a dual purpose as I like to think about it.
You know, which kind of brings us to today, you know, CDP is a big, exciting market and I'm sure we'll talk about it in the show.
Yeah, a hundred percent.
And by the way, it's also like one of the things that like really excites me in like
the conversation that we are going to have today is this connection between the data systems, at scale especially,
and the business use case.
And you chose, I think, kind of an extreme use case here, because you have a problem that,
from my experience at least, when we are dealing with customer data at scale, it can be hard
for the data platform that you are using and for how you interact with it.
But at the same time,
you have one of the most like demanding in a way
customers out there,
which is marketing people, right?
Who have to use this
and they have to use it like in a way
that it's very provable
that brings value to the company.
So I'd love to get more into connecting the dots there,
how data systems and the evolution of them led to today
to support this kind of use cases,
and also how you solve this very hard product problem.
It's one thing to build a database with a terminal
like SQL.
It's another thing
to build something that
someone needs to slice
and dice data for marketing campaigns, right?
So that's something that
I'm really excited about.
What's on your mind?
What would you love to get deeper into today?
Yeah.
So I think, of course, what you say is spot on, right?
So I think with the CDP, you know, we had our work cut out for us because, first of
all, for the business user, you need to abstract things enough so that they can do stuff without
understanding all the underlying data.
They shouldn't need to know SQL, and they shouldn't need to know what every table and column means, to do their work.
So you need to abstract things enough for the business user to do the work, but not
so much that they can really do that much anymore, right?
Because you've abstracted things to the point of elimination.
And the other thing that I think is interesting is that
it's not just the business users, right?
So we have the business user persona,
but we also have the data engineer and analyst persona.
So database, you know, you have the database users
or engineers or analysts that are using it, right?
Everybody knows, you know, at least SQL, right?
And people understand data structures and what the data means.
And in our world, some of our users do,
but some of our users don't, right?
So you also have like this multitude of users.
So it was definitely an interesting problem
which is kind of what I was looking for.
But beyond that, I think it's interesting to think
how the CDP and the database world
have been kind of intertwined, right?
And, you know, some of the latest trends in the CDP world, like composability, are enabled
and were created because of how the cloud databases, right, have evolved in the past
few years.
So I think database architecture evolution and CDP evolution kind of go hand in hand, even though they're separate spaces.
So I think it'd be very interesting to talk about that. And, you know, what is a CDP, right? Which is, you know, hours of debate that can take place on that. What is a composable CDP? All this stuff is fascinating to discuss.
Yep. That's super interesting. I can't wait to get into the details here.
Yeah. Let's dig in.
Let's do it.
Okay. So let's start where your story begins at Stanford and then kind of go from there because
you sort of wound your way through databases and then sort of ended up at the
business user.
Can you just trace that path for us a little bit?
Yeah.
So I landed at Stanford and it was really such a fascinating time for me.
I got into the PhD program in computer science, which is obviously a very highly esteemed program; so many great people have come out of it.
And before Stanford, I had done some research in data mining, data analytics,
and my intention was to go study databases.
But what happened was I ended up meeting this professor, David Cheriton,
who is this really brilliant Canadian professor and researcher.
And David was the first check into Google.
So he gave them the first seed money.
I think he ended up owning 1% of Google or something like that,
which is pretty good.
I don't know if he still has it or not.
Pretty good is probably an understatement, but we'll leave it at that. That was a good ROI for a seed investment.
And together with a couple of other folks like Rajeev Motwani, who unfortunately passed away, and a couple of others.
And if you recall at the time, Google had implemented search using commodity boxes, right?
There was AltaVista before they were using these big mainframes, very expensive, right?
And Google, they would take these pizza boxes and deploy the search.
So David came to me and he was like, hey, you're a database guy.
Could you build a database the way Google built its search?
That was kind of the initial problem statement, I would put it.
And then I met up with a couple of other students
that were looking at the same problem from different perspectives.
There was some peer-to-peer database research at the time that was relevant.
And so we started Aster Data out of Stanford.
So my advisor put in some money.
You know, we had angel investors.
There was no formal seed investment back then, right?
So you had to find individuals to do that.
And we did end up developing a database where, essentially, storage and compute were together in commodity boxes.
We would buy Dell or HP servers and many of them, right?
Hundreds of them.
Our first customer was MySpace,
which at the time was, you know,
as big as Facebook, right?
It was the Facebook of the day.
Wow.
And we would deploy massive scale
initially for MySpace's customer data.
And what's interesting about our approach...
Can you just talk just very briefly about
what was MySpace doing before
and then what were they sort of migrating onto Aster?
What were specific workloads?
Because obviously they didn't do everything at once,
or I would guess that they didn't.
Yeah, so MySpace,
so essentially MySpace was a Microsoft shop, so they were using SQL Server at the
time.
What SQL Server couldn't do was all the clickstream data, right?
So you could do the profile data in SQL Server, and they could do operational
analytics on that, but it was the event data that was massive in scale, right?
Because the MySpace users were all over the place, obviously, right?
Yeah, yeah.
That they couldn't do.
So they used Aster initially specifically for the event, behavioral information.
And then the profile data, the most static information about the customers, was in SQL Server.
And over time, they expanded the usage, right?
They did more and more.
Like I remember,
MySpace had one of the first revenue sharing agreements for music.
So all the royalties would be computed through Aster Data, because how much of a song you listened to
had to do with how much money you would pay to the labels.
So it had to be computed.
You know, very stressful, by the way,
to be running, you know,
queries for, you know,
what was like huge amounts of money
at the time out of our systems.
And so, you know,
and what's interesting,
I think about that architecture
is that before Aster,
just like with Google search,
if you had the large scale database,
you had to buy a mainframe.
You would buy a multi-CPU server from IBM.
You would buy a disk array from HP.
And you would have to spend $10 million just on the hardware, just to get you started, just to build a 10-terabyte data warehouse.
And so the whole idea with Aster was, okay, we bring storage and processing together.
You partition the data, you partition the workload.
Today, that's obvious.
But at the time, that was a very new approach, right?
Believe it or not.
And we were one of the very first vendors, teams to do it as a product.
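The "partition the data, partition the workload" idea can be sketched in a few lines. This is a hedged illustration of the general shared-nothing pattern, not Aster's actual code; the node count, routing function, and row shape are invented for the example:

```python
# Illustrative shared-nothing partitioning sketch (not Aster's actual code).
# Rows are routed to commodity nodes by hashing a partition key, so each
# node stores and scans only its own slice of the data.
import hashlib

NUM_NODES = 4  # hypothetical cluster size

def node_for(key: str) -> int:
    """Deterministically route a row to a node by hashing its key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Partition some clickstream-style rows across the nodes.
rows = [{"user_id": f"u{i}", "event": "click"} for i in range(10)]
partitions = {n: [] for n in range(NUM_NODES)}
for row in rows:
    partitions[node_for(row["user_id"])].append(row)

# A query like SELECT COUNT(*) runs on every node in parallel;
# a coordinator just sums the per-node partial counts.
total = sum(len(part) for part in partitions.values())
```

Each node answers queries over its own slice and a coordinator merges the partial results, which is what lets racks of commodity pizza boxes add up to one big database.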
And so, you know, that kind of led us to when big data exploded. In 2010, there was this whole explosion about big data. Big data was everywhere. It was on the cover of The Economist. And then very quickly after that, the more legacy database vendors
like Teradata showed a big interest, right, and acquired us.
And, you know, subsequently, there was a couple more iterations of database architecture.
There was Hadoop, right, and the whole NoSQL movement, which didn't go very far.
And that kind of came back into SQL, but in the cloud, which kind of resulted in what we know today as, you know, the Snowflake Databricks type of architecture, which ironically
separated processing and data again, right?
So we went from data and processing being separate in the mainframe disk array world
to coming together in the MPP database world and Hadoop, and then it got separated again
in the cloud just because the network interconnects, right?
Became so efficient that you could actually afford to do it, when you couldn't do it before. But that was kind of the quick story on Aster.
Okay, and I have a ton of questions there, and then of course I want to ask about ActionIQ. But just one quick one, if you will indulge me: can you talk about selling MySpace as your first customer?
Because, you know, again, that was in tech world some time ago, actually not that long ago.
Right. But it would kind of be like a database startup saying, you know, Facebook is our first customer, which is a really big sale.
And so can you just give us that story for the entrepreneurs in the audience?
Because that's just, I need to know that.
Yeah, no, it was really, first of all, it was a huge deal.
I think the deal itself was 10x the money we had raised at the time, just to give you
a sense, right?
So it was like, I think it was almost like 70s or something crazy.
Wow.
And the way it happened was that Ron Conway was one of my seed investors and he connected
me to Adam Bain, who ended up being the CRO for Twitter later on.
And Adam Bain at the time was running Fox Interactive.
Fox had just bought MySpace.
And so I was very fortunate to be dealing with a very entrepreneurial, I mean, for those of you
that know Adam, he's a super smart, super entrepreneurial guy. And he saw in us,
and he knew he had to scale MySpace, right? He was very growth-minded, right? He was very
ambitious, and he knew that in order for MySpace to scale, MySpace's data infrastructure had to scale, right?
So you had a very technically aware, ambitious business owner.
So Adam was the person we
interacted with through Ron Conway.
And at the time,
you know, the reality is Adam and MySpace didn't have that many options, right?
So their options were SQL Server on the one side or Teradata on the other side, right?
And then SQL Server couldn't scale.
And Teradata, which, for the amount of data we're talking about, would probably have cost, if I had to guess, close to $100 million.
Yeah.
That's the amount of money we're talking about, right?
Yeah.
So you had to do something.
And we were right in between, right, in terms of cost and what we could handle.
And, you know, we were, again, you didn't have many options, which takes us back to, I think, every time you close a big deal, the reason that massive deal happens is because the customer absolutely needs what you're selling.
It's like vital for them.
And there's no alternative that comes close.
Yeah.
And that transaction met both criteria, right?
It was critical that MySpace could scale their data operations, and there was no alternative at the time. And it paid off for them, right? I mean, we did, you know, a lot of what we promised we would do, and I'm not sure what else they could have done at the time. You know, again, later on, 10 years later, there were a lot more options.
Sure.
At the time, yeah, there weren't. And so that's how this whole thing came around. But to be very honest with you, I was out of school, I was 24 years old. And if I had the experience I have today, I would never ask for so much money. Like, that was crazy. It was complete inexperience that made me ask for how much I asked for at the time, to be completely honest.
Yeah. Well, like a true technical founder, you sound simultaneously like someone who
has a deep grasp of the combination and separation of storage compute
and how to make an enterprise sale, which is kind of a very...
You have to, right? You have to learn this stuff. Yeah. And, you know, the Aster pricing was very rational, right? We ran the math and we were like, all right, that's a reasonable price. But if you just looked at the price, you would get scared. But yeah, the math was correct in the end.
Yeah. Yeah. Well, yeah. I love hearing the phrase,
the math is correct. Okay. So tell us about Action IQ. And then we'll go back a little bit
because I want to talk about databases and different flavors
because you've got an interesting journey.
But what is Action IQ and why did you decide to start it
after working in databases?
Yeah, so I think by now it's probably obvious that Aster,
you know, we did a lot.
Aster was a generic database, right?
So we had a lot of use cases.
We did like, from the MySpace use case, we ended up working with healthcare companies,
financial companies, a lot of big banks globally, telcos.
But we ended up getting used for a lot of customer data because at the time, it was
the event data that couldn't be processed by the traditional databases.
So almost by accident,
a lot of our use cases were around customer data.
And subsequently, even at Teradata,
one of my observations was I was fascinated
how the vast amounts of customer data
that would live in the IT systems,
Aster, Teradata, whatever your data warehouses were, right?
Massive amounts of customer data.
And then when you would look what happens with the business,
which is where the value of data is supposed to be created, right?
Because at the end of the day,
IT doesn't store customer data for IT's purpose,
it stores it to power business use cases, right?
Or product use cases.
If you look at the business systems,
they could store maybe 1%, 0.1%, 0.01% of that customer data at best.
And the reason was there was this bifurcation, right?
So you could either buy a product for engineers
that scales like Aster,
or you would buy like an email tool for the business that
has almost no data infrastructure behind it.
And there's this huge gap in between.
So I started thinking a lot about that gap, because in my experience at Aster and at Teradata,
oftentimes we would put a lot of customer data in those databases and we would succeed in doing what
we said we would do.
But unless the business got direct access to it, the value wouldn't be there.
It was almost like, you know, I used to joke that, you know, the operation was successful
but the patient died, right?
Because, you know, the data got into that place.
But the value was not there because, you know, the people were supposed to create the value.
They weren't technical.
They didn't know SQL.
There had to be people in between.
They were very slow.
The systems were not connected, et cetera, et cetera.
So I got fascinated by this problem of how do you bridge,
not just the systems, but the two worlds?
Because we're talking about different cultures, right?
The data engineering culture where I was part of is one culture.
And then let's say the marketing culture has a completely different culture that
value different things and understand different things, different language.
And so the intersection of these two worlds was fascinating to me.
And I decided to start the company to solve this problem.
And when I was starting the company, I wasn't sure exactly what it would look
like, but I knew it would have to scale with data as much as Aster Data. And it would have to have a UI
that you wouldn't have to be a data engineer to use. That was kind of my two criteria when I
started the company. And that's how ActionIQ was created.
Fascinating. Okay. And just give us the pitch on ActionIQ. What does it do? Obviously, it's a UI on some sort of database, but why do people buy it?
So on the data side, we connect to your data warehouse, data lake, or multiple, right? So we can do data federation. We used to bring the data over,
but now we push the queries down with a composable model,
which we can discuss.
On the application side, we connect multiple applications.
So theoretically, every business system you have
that's touching the customer should be connected to ActionIQ.
So email, CRM, customer success...
Email, CRM, web personalization, call center, direct mail, clienteling stuff for retail, right? Decisioning systems, next-best-action systems, you know, product, right? Because product is customer-facing, right?
So there's really a very long tail.
I mean, there's probably,
an enterprise probably has like 100 to 200
of these systems, right?
At least.
Yep.
And those are sort of integrations
that you support.
Correct.
There's integration we support,
which can be push or pull.
And then on the interface itself,
part of the interface for the business user,
part is for data engineers. But for the business users, what we want them to do is to get access to an abstracted version of the data, and to be able to say: who do I want to target, why, and what do I want to do with these people, through what channel, right? And to be able to deploy a new experience, right? The marketing person may call it a campaign, right?
But you can go beyond marketing with this.
Deploy something new, run a new experiment,
run a new test with customers
without having to write SQL,
without having to know what's this column,
this table, this data warehouse,
none of this stuff, right?
So we offer a self-service that you didn't have before, and we offer agility, so you can do things in a day that would take you a month before, when it comes to creating this new experience. And orchestration, right? Which simply means email doesn't do its own thing and web doesn't do its own thing; now you have kind of something to coordinate what this one customer sees through any channel they may happen to interact with.
Yeah.
I kind of think about that as, like, you know, marketing is kind of like a DAG in many ways, right? It's just that the nodes would be like different tools that are sort of emitting something out of an API, but whatever. That's my nerdy take.
Yeah.
And what's interesting also, because we say marketing, and I say marketing a lot as well, right? But marketing in many places has become kind of the ambassador of the customer. But if you think about it, much of what we're talking about is not marketing. For instance, we have a lot of B2B customers and technology customers. You know, Atlassian is a good example, right? And a lot of how you interact with your users there, it almost looks like customer success, right? Which is not really marketing, but it is the user interaction. But most companies are not organized around the customer or the user; they're organized around functions and revenue. They're more functionally organized. So marketing ends up taking the lead in many
places and saying, how do we align around this one customer, right? But it really becomes a
very cross-functional thing because most functions, right, if you think about product,
marketing, customer success, support, everybody's touching the
customer.
And in theory, everybody should be one, or if it's not one, it should be tightly coordinated
or orchestrated in some way.
Yeah, makes total sense.
Okay.
So I want to dig into databases a little bit here. When you built Aster, you built what you called a multi-purpose, or let's say workload-agnostic, piece of infrastructure, which ended up being used heavily for clickstream data just because the infrastructure around more traditional SQL-based databases wasn't optimized for that. That's a pretty different problem to solve than,
at least from my perception, than building a database that's geared towards essentially driving customer experiences.
So when you think about workload agnostic, you need to think about
sort of optimizations on a more general level.
Thinking about particular data types, handling a large variety of data types,
you know, lots of edge cases.
When you built Action IQ,
what was that transition like?
Because now you're really focusing
in on the customer data.
And so you can be much more opinionated.
Can you talk about how you approach that?
100%, yeah, yeah, yeah. So, you know, this is really something that most databases struggle with. Having been on the database side, right, you struggle with this problem all the time, because you have to support a long tail of use cases. And the tail is really long, right? And you get into
all this esoteric functionality that a small part of your use cases need,
and you have to support them, and everybody has to be an equal citizen almost.
And that becomes very difficult.
With ActionIQ, the day we started, myself and my co-founder both came from database backgrounds, and it was almost a relief, the feeling I would describe as relief, that we could say: screw that 90%.
How can we further optimize this, right?
How can we do it very quickly?
It was truly a relief.
I mean, and I can give you a few very basic examples, right?
We support arbitrary data models,
but at the end of the day,
you have some customer identifier
and you have some event timestamps.
Even simple things like that allows you to make decisions that can optimize performance,
can optimize storage in ways that you could never do in a database because it would break
a lot of things.
To give you another example, we do a lot of segmentation-like operations in the UI. Yep.
Segmentation, turns out it's a left outer join, right, in SQL at the end of the day.
Yep.
So guess what? We have a really fast left outer join optimizer, right?
Yeah.
And this is like one of a hundred different types of joins a database has to support,
right? But in our case, we just happened to know that it would get 10x the usage of anything else,
and we could optimize it, we could optimize it day one.
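To make the "segmentation is a left outer join" point concrete, here is a small hedged sketch, not ActionIQ's engine: an audience like "customers with no purchase in the last 30 days" is the profile table left-outer-joined to events, keeping only the rows with no match (an anti-join). All table and field names are invented for the example:

```python
# Illustrative sketch: a "lapsed customers" segment expressed as a
# left outer join (anti-join). Table and field names are invented.
from datetime import date, timedelta

profiles = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Edsger"},
]
purchases = [
    {"customer_id": 1, "ts": date(2023, 12, 1)},
    {"customer_id": 3, "ts": date(2023, 10, 2)},
]

def lapsed_segment(profiles, purchases, as_of, days=30):
    """Customers with no purchase event in the last `days` days.

    Relationally: profiles LEFT OUTER JOIN recent purchases, keeping
    only rows where the join found no match -- roughly
    `... WHERE recent.customer_id IS NULL` in SQL.
    """
    cutoff = as_of - timedelta(days=days)
    recent = {p["customer_id"] for p in purchases if p["ts"] >= cutoff}
    return [c for c in profiles if c["customer_id"] not in recent]

segment = lapsed_segment(profiles, purchases, as_of=date(2023, 12, 15))
# Ada bought within the window, so only Grace and Edsger are in the segment.
```

An engine that knows almost every UI action compiles down to this one join shape can afford to optimize that path aggressively, which is the point being made above.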
So, you know, it was all these things we knew we could always do, right?
I mean, I knew I could do all these things in the past,
but we wouldn't do it because you have to support all these things.
And it's really a blessing and a curse, right?
I mean, the reason why, you know, a Databricks, a Snowflake, have these huge valuations is because they can support all these use cases.
Sure, sure.
But then the technical complexity that gets created is kind of enormous.
So I would say we were careful to not put too many constraints on the ActionIQ data queries and the whole way we structured the system.
But even the very basic constraints we were able to do just because we knew we were dealing
with customer data that had some very basic properties allowed us to do things that, you
know, we would have never done in a database.
So that's kind of the database side, right?
But for me personally, that was the easy part because I was, you know, that was kind of
my expertise.
The difficult part was understanding how the business users wanted to use the system.
And that was a learning curve, right? And what we did there was that our first customer was
this e-commerce company in New York, Gilt Groupe, that used to be an Aster customer as well.
Oh, yeah. I remember Gilt Groupe.
Yeah, and they had a great, very technical team, very sophisticated team at the time,
right? Very high-flying. I mean, the whole space kind of fizzled away, but there wasn't anything they could do about it. But for the first year of our life as ActionIQ, we got
to sit next to Gilt Groupe's marketing team.
So our actual physical seats,
desks were right next to our customers
for a whole year.
Oh, wow.
And so we could see,
I could see them from my desk using our product
and we would talk to them.
We would have lunch with them.
They would tell us what they're doing,
how they're doing it.
A lot of these people came
from the financial services sector.
So they brought a lot of very mature best practices for CRM.
And then, you know, our first hire, or one of our first hires, was a UX designer, because we knew that was not, I mean, you know, again, we're database people, right? We're infrastructure engineers. So our idea of UX was rows and columns, right?
Yeah.
And SQL and C++, Scala, right? We used Scala to build the product.
I wrote code for the first year, right?
I hadn't written code in a while.
It was fun for me to get back to it for a little bit.
But we knew when it comes to UX, that was another thing, right?
So we made a UX hire super early.
We ended up hiring a great person.
We sat next to our business users very early.
We forced ourselves to get close to where I felt we were the weakest.
Because ActionIQ is all about bringing data infrastructure and business application together in one company.
And again, this is not about the technology,
it's about the culture, it's about the mentality, right?
That's why I did the company, right?
It was fascinating to me to do something like that,
but you had to force yourself to get uncomfortable
early on to bring these two things together.
Yeah, I love it.
Now, I mean, what a great story. I mean, I think that's actually just sage advice in general about, you know, having a desk next to your first customer.
I was worried that you were going to say our first customer at Action IQ was MySpace.
Yeah.
I'm glad you didn't say that.
Yeah, no, I didn't. We actually wanted, I mean, you know, we needed a local customer to do that, right? So I tried really hard to find a customer that was local to us in New York City.
okay so
I know Kostas has a ton of questions but
one more question for me
maybe two
can we talk about the data model? So you said that
you worked really hard not to put constraints on the queries that Action IQ is able to execute
from a segmentation standpoint. Okay, I get that in theory. The world kind of runs off of the Salesforce model, which is like lead, contact, account, you know, that basic data model, right? And at the end of the day, there's sort of an end user, whatever their relationship with the hierarchy of other entities that your business is interested in.
Right, right, right.
You're sending a message or an advertisement to a particular user or group of users. But the data model that they have does not actually afford flexibility to represent, let's say, a business model or a data model like Gilt Groupe, right? Really hard to represent that in sort of the rigid Salesforce data model. How do you approach that from a database standpoint?
So you want flexibility, but you also need to have some
sort of underlying data model that allows the UI to create a sense of logic and predictability for
an end user. How do you reconcile those? Yeah. So first, maybe for some context,
let me talk a little bit about who our users and customers are. So we started in, you know,
Gilt Groupe was essentially a retail company, right?
E-commerce, but retailer.
So we started in retail, but since then we've expanded
and, you know, we do all kinds of different B2Cs, right?
So we do, you know, a lot of media, right?
Like, you know, folks like News Corp, Washington Post, Sony.
We do a lot of financial services and we do a lot of B2B, right?
We do a lot of combination of B2B and B2C.
Folks like Dell, right?
HP and others, all big enterprise, right?
Big enterprise, B2C to B2B.
When I started Action IQ, I had, you know, again, all my experience was in the data space, right?
And my observation was that setting up the data model and the ETL pipelines sucked.
That's where things took a lot of time, right?
And would get complicated.
So my first criteria in Action IQ was that I wanted to be able to reuse the exact same model that existed in the data warehouse.
Which, by the way, at the time, now with composability, it's obvious.
At the time, I think we were like five, 10 years ahead of time, right?
When we said that.
Oh, no question.
Like marketers weren't thinking about the data warehouse 10 years ago.
Right.
And many still aren't today, actually, but that's probably another topic.
Yeah.
And even vendors, right?
I mean, if you look at the big vendors, right?
Every vendor has its own data model and they expect someone to take the data from wherever
it is, maybe the data warehouse, and load it, ETL it into their own data model.
But when we started ActionIQ, you know, like eight years ago now, we said we have to be able to reuse
the same data model. We can maybe augment it or we can put metadata on the data model,
but we want to use exactly the same data model that lives in the data warehouse. Because again,
my goal was not to build a new data mart with customer data, my goal was to take all the data that exists on the IT side
and make it accessible by the business side.
So that forced certain things, right?
So the approach we took early on was to say, we're going to support whatever data model
is there, and we're going to allow the users to tag the data model to tell us what is an
identity, what is an event, what's a timestamp, and
also what are the join graphs, right, in that data model.
But we would leave the data model as it is in the data warehouse.
Essentially, it's more like caching the data on ActionIQ more than loading the data or
transforming the data.
I would keep a cache of the data with ActionIQ.
Interesting, yeah.
And then you could tag the data model on top of it,
but we had to be able to support it.
That led us actually to implement our backend database
as an in-memory database
because we had to support arbitrary queries
and arbitrary models with interactive times, right?
Which is pretty hard to do, generally speaking.
But it was essentially, it was a lift, right?
It like lifted and moved it versus transformed it.
Wow.
And so since then, right, we have expanded that.
And I could talk how that ties to the UI and everything else if you're interested.
But now, for example, the B2 b2b use cases right would tend to have
more complex identity as you were saying when we say we supported one identity now we can support
a hierarchy of identities right so you have to have a user that that's part of an account that's
part of it you know like big client whatever that may be right so we expanded support the concept of
more hierarchical identities and a whole bunch
of other stuff. But a fundamental principle, that one requirement, right, we set for ourselves
forced us to be very open about what kind of data models we support.
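The tagging approach described here can be sketched roughly like this; all table, column, and function names are hypothetical illustrations for the concept, not ActionIQ's actual interfaces:

```python
from collections import deque

# Illustrative sketch: the warehouse schema stays as-is, and a thin "tags"
# layer records what each piece means -- which columns are identities, which
# table holds events, where the timestamp lives, and which joins are legal.

warehouse_model = {
    "users":    ["user_id", "account_id", "email"],
    "accounts": ["account_id", "company"],
    "events":   ["user_id", "event_type", "ts"],
}

tags = {
    "identities": [("users", "user_id"), ("accounts", "account_id")],
    "event_table": "events",
    "timestamp": ("events", "ts"),
    # join graph entries: (left_table, left_col, right_table, right_col);
    # users -> accounts also encodes the hierarchical identity mentioned above
    "join_graph": [
        ("events", "user_id", "users", "user_id"),
        ("users", "account_id", "accounts", "account_id"),
    ],
}

def join_path(tags, start, goal):
    """BFS over the tagged join graph: how does one table reach another?"""
    adj = {}
    for lt, lc, rt, rc in tags["join_graph"]:
        adj.setdefault(lt, []).append((rt, lc, rc))
        adj.setdefault(rt, []).append((lt, rc, lc))
    q, seen = deque([(start, [])]), {start}
    while q:
        table, path = q.popleft()
        if table == goal:
            return path
        for nxt, lc, rc in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, path + [(table, lc, nxt, rc)]))
    return None

# An "events to accounts" query must hop through users:
print(join_path(tags, "events", "accounts"))
# -> [('events', 'user_id', 'users', 'user_id'), ('users', 'account_id', 'accounts', 'account_id')]
```

The point of the sketch is that nothing about the warehouse tables is rewritten; the semantics live entirely in the annotation layer.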
Fascinating. I mean, I have a hundred more questions. The
support for arbitrary queries, and having a caching layer, makes the whole thing make a lot more sense.
Because traditional SaaS vendors force you to basically
input your data into pre-existing queries that they run already.
Okay, I will stop.
Costas, please jump in.
I'm going to hand the mic to you.
Yeah, thank you, Eric.
So Tasso, you mentioned that you started seeing this use case of clickstream data,
like user event data, since your time in Aster Data, right?
I want to ask you, first of all, about what's unique about this data from a technical perspective,
right?
What makes it so challenging, or maybe not that challenging, we'll see.
To accommodate, let's say, the processing of this data at scale
in traditional databases, right?
And how these things have changed also, like, through time,
because since Aster Data, today, like, there's been a lot of progress.
But tell us a little bit about that, because my feeling is that it's also pretty unique in some ways, the type of data that you have to work with. That's right.
So it challenges the systems themselves. So tell us a little bit more about that.
Yeah. So from an Aster perspective, first of all, I'll come to your question very quickly,
but just to state maybe the obvious, the reason why Clickstream data we work with was a business
reason primarily, which is the dollar per terabyte value of the data was low. So if you're a bank and
you have data about your customer's accounts and their balances, that data is worth a ton of money.
The volume is very low versus the value.
Clickstream, you don't even know if it's valuable at all until you analyze it.
Right.
And maybe it's not. So you can almost draw a two-by-two and say: high volume,
you know, large scale, with low dollar per terabyte.
That's the data we dealt with, right?
You had a hundred terabytes of low value data that would come to us
because we could support the cost structure, right?
To make it economical.
But to answer your question,
I think what people don't realize
is that clickstream data is time series data.
And that's where a lot of the complexity comes, right?
So a lot of what we had to do with Aster,
people were interested,
not just in searching the data, but saying, okay,
what is the sequence of events that leads to something good or bad, right?
So, you know, we did a lot of stuff like, okay, I remember, right?
We had a big grocery chain as a customer that was trying to figure out: what are the gateway
products? What's the kind
of product that, if a customer buys it, then where they used to buy only, you know, groceries from me, now they're
buying all their meat and fish or whatever, right? So what are the paths in that
clickstream data that lead to positive or negative outcomes? And that's time series data. Now, SQL is a really bad language for time series data because SQL essentially is a way to model
set theory.
It's very good with set intersections, unions.
That's what SQL is at the end of the day.
But time series is not that, right?
So we ended up building a lot of custom functions, back in the
Aster days, right, it was still SQL, that would allow our users to do time
series queries on top of clickstream data.
So the way we would organize the data, store the data, partition the data,
and we expanded SQL to support time series queries, that was a lot of the
innovation we did, in addition to the basic architecture,
right, that was shared-nothing. But that stuff can get very complicated,
because unless you know exactly what you're doing
from an implementation perspective,
you know, if you try to do time series analysis
with basic SQL, it's just extremely slow, right?
You have to shift a ton of data around.
It just doesn't work.
So that is one difference.
Yeah.
Okay.
That's awesome. By the way, a question,
maybe a naive question, but we have time series data, right? And in
the industry, we pretty much have a dedicated type of database for time series data,
right? Especially for things in the observability space, right? Because all these
things, at the end, are time series data. I do have my opinion on what the difference is with customer
data, but I don't know. So my question is: why not go and use one of
these solutions, right? Technically, at least, they are supposed to work well for data that are time series.
And also, as you said,
that data has a very low, let's say,
value per terabyte.
Data from, like, a data center,
like, okay, whatever, right?
It's interesting when things start breaking,
but until then it's just a lot of noise,
right? So
why would we not use those?
So back then, first of all, most of the systems didn't exist, right?
Like when Aster existed, there were no time series databases.
I mean, I shouldn't say that.
Let's say they weren't popular, right?
It wasn't something we were aware of at the time or were looking at.
I think today you have the option of using that for some
of that, but also the customer queries are a little bit different. So I'll give you a very
concrete example. And I know we have a technical audience, right? So just to go for one minute,
so it's a little bit more technical. One of the things we created at Aster was this thing called nPath. You would give a regular expression of events, it could be A B*
C, and we would map that to the clickstream data. And you could define what A, B, and C are, right? So
A could be: you enter the website on this page; B* is: you do, like, zero or more of these
things; and C is: you end up on this checkout page. And we would map this regular expression
onto the time series data to help you find patterns across your customers. That's not what time series
databases do, right? Time series databases, for the most part, are concerned with calculating
aggregates and other metrics, right, on top of the data. Here we're looking for behavioral patterns
that span potentially weeks of data.
So I would argue even today, it's probably a different problem.
But at the time, there was not even the option of the time series databases to be considered.
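The nPath idea described above can be sketched in a few lines; this is an illustrative toy, not Aster's actual implementation, and the predicates and event fields are made up:

```python
import re

# Toy version of the nPath concept: define predicates A, B, C over events,
# encode each event in a user's time-ordered stream as a symbol, then run an
# ordinary regular expression like "AB*C" over the resulting symbol string to
# find behavioral paths (e.g. landing -> zero or more other pages -> checkout).

predicates = {
    "A": lambda e: e["page"] == "landing",
    "B": lambda e: e["page"] not in ("landing", "checkout"),
    "C": lambda e: e["page"] == "checkout",
}

def symbolize(events):
    """Map each event, in timestamp order, to its first matching symbol ('.' if none)."""
    out = []
    for e in sorted(events, key=lambda e: e["ts"]):
        out.append(next((s for s, p in predicates.items() if p(e)), "."))
    return "".join(out)

def npath(events, pattern="AB*C"):
    """Return True if the user's clickstream matches the event pattern."""
    return re.search(pattern, symbolize(events)) is not None

clicks = [
    {"ts": 1, "page": "landing"},
    {"ts": 2, "page": "product"},
    {"ts": 3, "page": "cart"},
    {"ts": 4, "page": "checkout"},
]
print(npath(clicks))  # the path landing -> (product, cart) -> checkout matches
```

The real system pushed this matching into the partitioned, shared-nothing engine; the sketch only shows why a regex over an ordered event stream expresses things that set-oriented SQL cannot say naturally.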
Yeah, 100%.
I totally agree with you.
Actually, I think user event data and clickstream data have this very unique characteristic of
being, like, a time series, but with quite a few dimensions attached to each point. That's right.
That's right. And that makes the problem quite different than, let's say, calculating CPU
usage, right? That's a completely different type. Yeah, when you're
collecting, like, a CPU signal from a data center, you probably have orders of magnitude more data,
but the dimensionality of the data is much lower.
And that makes a huge difference.
That's right.
Like in what kind of like...
That's exactly right.
And I think subtle differences in the actual order of events or sequence of events matter with behavior, right? So again, like I think in observability, a lot of it is about the aggregate metrics or
when you hit a certain threshold and this and that.
With customer behavior, right, you're looking, why are people dropping off, right?
It's a why, right?
It's not a what or a when, it's a why.
Why are people dropping off my website at a certain point, for example?
And for example, a big part of this is how do you visualize the data to give people the
opportunity to notice patterns and figure out what questions they should be asking in
there.
So you have all these fancy diagrams that we had, actually. Again, at Aster, we didn't do much of that,
but we had implemented some light UI on top of it.
Or to give you another example,
one of the interesting use cases,
an early use case at Aster,
was the People You May Know feature at LinkedIn.
So LinkedIn was an early customer.
The lead data scientist at the time
was a brilliant guy called
Jonathan Goldman. He worked under DJ Patil, who, I believe, became
the Chief Data Scientist of the United States later on. And Jonathan used this technology
to create the first version of People You May Know, right? That now is ubiquitous.
This has nothing to do with metrics, right?
You're trying to see how do people connect with each other and what this graph says about who you may or may not know.
That's not typically what time series databases
would be concerned with,
but it is an events-based or network-based problem, for example.
So, yeah, we have some really fascinating early use cases
that now there's probably more diverse tooling.
Like I don't think today you would use a single platform
maybe to do everything.
But still, there's many of the stuff we were doing back then
that I'm not sure there's a clear replacement
for that type of operational analytics.
I think today what people end up doing is, you know, you load the data in a data lake, right?
And then you can deploy something like Databricks, which has a more flexible language, right?
Beyond SQL.
And you essentially write some custom analytics and custom code to do what you have to do. So I think the modern approach has a lot more processing power available and is
more customized, but is less abstracted.
Right.
So I would say the world has moved on, probably for good reason, but it's
not necessarily simpler to do today the kinds of things we were doing back then.
Yeah, yeah, 100%.
And you talked about something very interesting.
You mentioned the value per terabyte of data, right?
Back then.
And someone who is, let's say, oblivious to what's going on in this industry would say: but the data
you are talking about here,
with ActionIQ, it's
customer behavior. Isn't this
the most important type
of data that you have in a company?
Right?
So, especially
today, with all this ML,
AI stuff,
let's say the workloads
are shifting a little bit more toward building predictive power on top of behaviors
and all these things, which, okay, 15 years ago, probably there were fewer use cases. Do you
think that this, like, dollar price, sorry, dollar value per terabyte is changing
because of these new use cases and technologies,
or has it remained still kind of like dealing with logs, let's say?
Yeah.
I mean, I think today the cost of processing data has come so low, right?
Storing and processing data is so much cheaper today
than what it used to be
that I think people look more at the aggregate value of data.
Because when I say dollar per terabyte is low,
you have so much terabyte that in aggregate,
it could be super valuable data set, right?
So I feel today the conversation has moved again
for good reason,
and it's less about the dollar per terabyte
and it's more about what's the aggregate value. And
the other thing we see is that the cost is in the processing. It's less about how much data you
have; it's more about how expensive is the processing you want to do with it. You can store almost
any data today for very little, and you can run simple processing on top of it for very little. But as we've seen with
LLM training, for example, right, if you try to do some very complex processing, it can get
extremely expensive extremely quickly. So now I feel the metric is
cost of model over value of model, right? The dollars to train the model over the value of the model.
Yeah, yeah. It's not about data size anymore, it's not about storing, it's about, okay, how expensive
is it? And then human labor is super expensive, right? Again, part of why ActionIQ is so successful
is because every time you try to have humans interface between business and
the data, they become a bottleneck very fast. There's just not enough competent people that
you can insert in between the business and the data to make it happen. So if you can make the
business even a little bit more self-service, a little bit more agile, that's a huge win.
And it's not that you're saving money.
I mean, you're still going to hire as many data engineers as you can.
It's that instead of something taking a month because everybody's waiting on everybody else,
now you can do it in a day. That allows the business to move at a much higher speed, right?
So I think these are the modern problems, I would say, or challenges.
So the work has moved on, but there's still those questions, right?
It's like, you know, how much is it worth for a model or an insight?
But it's just a different ratio that people are using to think about it.
Yeah, yeah.
No, that's an excellent way to put it.
So a product question now.
It's one thing to go and build a product
for one persona, right?
So, when you were at Aster Data,
you were, I mean,
younger, but fortunate enough to
primarily build a product for
a very specific type of persona, which is
the technical persona, let's say.
Like a system engineer
or whatever.
It's a completely different set of problems when you're trying to build something
that has to be good for multiple personas.
It's not just multiple personas.
Here we have some people that, in some cases,
they pretty much hate each other
because of how different they are, right?
So we have a data platform.
So naturally, you need the involvement of some data engineering or some IT people, at least.
And then you have the marketeers, right?
The people who are actually interacting with the data and creating value out of these.
And these two personas are very different. Many organizations probably
don't even talk to each other because they are so... Not because they hate each other,
but just because of the different functions there. How do you build a product when you
have to keep happy both of them, right? And build user experiences for both of them to succeed at the end.
Yeah, it's a great question. Great question. So first, for context, ActionIQ, we're very
enterprise-focused, right? And usually, when we go into an enterprise, what we find is that
there is a structure to do ActionIQ-like things. How does this work, right? There's a team that sometimes is called an analytics team
or a marketing operations team or something,
but these are people that understand data,
they know how to write SQL.
And then there's a business team
that's using this marketing ops team
as a concierge team, essentially, right?
So you have the business folks
and then they'll send an email,
they submit a ticket,
they buy gifts, you know, for these people. And they're like, can you please pull me a list with
these people? Or can you please help me understand how many customers we have that meet these criteria?
So we assume there's something like that there and some collaboration.
Why is this important? Because these marketing ops folks, they already know the data
and they know what the business wants
and what language they're using, right?
So these are our allies
in essentially implementing and deploying
ActionIQ.
And the way I think about it
is that I want to take these marketing ops folks,
which usually they're very competent,
they have a lot of skills, right?
It's analysts.
And turn them from one-off responders to requests
to being the administrators, the configurators,
and the power users of Action IQ.
And take 90% of those requests,
and once they configure Action IQ,
they push them to the higher layer interfaces we have
and give them over to marketers, right?
So in ActionIQ, for example,
there's a translation layer
that you can take database concepts, right?
Like a table or a column
and rename it, reformat it,
or do something to make it presentable to the business, right?
It's almost like a dictionary
that translates database terminology to business terminology.
And because these marketing ops, data analytics, customer analytics teams, whatever they're
called, right?
Center of excellence, right?
Sometimes have been going back and forth on services and requests for years.
They know how to do the translation already.
It's a matter of giving them the tools
where they can point ActionIQ to the data sources, right?
Presumably one or more data warehouses, right?
Or data marts.
Build that dictionary of terms, right?
And then expose that to the business user.
And then they only get involved
whether it's like new terms, new data, new requirements,
or it is something that's so complicated
that the business needs someone to double check or whatever.
But 90% of the stuff gets automated.
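A minimal sketch of that kind of translation dictionary, with made-up table and column names (not ActionIQ's real API):

```python
# Hypothetical sketch of the "translation layer" described above: a dictionary
# that maps raw warehouse terminology to business-friendly labels and formats,
# so a marketer never sees the underlying column names.

dictionary = {
    ("orders", "gmv_usd_cents"): {
        "label": "Total spend",
        "format": lambda v: f"${v / 100:,.2f}",
    },
    ("users", "last_seen_ts"): {
        "label": "Last active date",
        "format": lambda v: v[:10],  # keep just the date part of an ISO timestamp
    },
}

def present(table, column, value):
    """Render a raw (table, column, value) triple for the business-facing UI."""
    entry = dictionary.get((table, column))
    if entry is None:
        return f"{table}.{column}", value  # fall back to the raw database names
    return entry["label"], entry["format"](value)

print(present("orders", "gmv_usd_cents", 1234500))              # ('Total spend', '$12,345.00')
print(present("users", "last_seen_ts", "2023-12-20T10:15:00Z")) # ('Last active date', '2023-12-20')
```

The marketing ops team owns the dictionary; once it is built, most business requests never need to touch the raw schema at all.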
And then, you know, we have, we're an enterprise company, right?
So we support a lot of governance things.
So you can have, you know, the analyst approving things
before they go out, they can double check,
you can have like checks and balances, right?
To make sure that if they want to oversee what the business is doing, they can also do that as well.
But we try to teach people how to fish, right?
I mean, that's the idea.
Instead of like, you know, trying to feed them one fish at a time, teach them how to fish.
And it's also a huge improvement of life for the analysts.
Because these high-urgency requests that come from the business, you wake up in the morning
and you have, like, five emails because somebody needs something today. It's really not the fun
part of the job, you know, for most people, data engineers or data analysts. Yeah, 100%. Okay, that was
awesome. So I would like to spend, like, a couple of the last minutes that we have here
on talking a little bit more about CDPs as a category of products
and also talk about something that we hear a lot lately,
which is composability over them, right?
So what does it mean for a CDP,
like a customer data platform, to be composable?
What are the semantics behind that?
I'll ask two questions, actually.
One is from a technical point of view
and one from the customer,
like the user point of view, right?
Yeah, exactly.
So composability in general, right,
to start there and beyond CDP,
what it means is that essentially
it's a different world for specialization
and optionality. So instead of having one thing that does everything, you have one thing that
may be doing a lot of things, but that technology gives you the opportunity to delegate certain
parts of its functionality to other systems. Specifically in CDP, the biggest thing composability
means is that instead of copying
data over from the data warehouse to the CDP and doing the processing in the CDP, right, in the
CDP vendor's cloud, the CDP sends the queries down to the data warehouse, right? It sees the data
model, it doesn't copy the data; it pushes the query down and brings the results up. Now, like the
term CDP itself, right, composability is being abused today by
certain vendors, right? So certain vendors use the word composability to mean, you know, they will talk,
for example, about having a connector with a cloud data warehouse. But that connector doesn't push
anything down, right? It just gets the data out. But they call all this composable. So you have to be a little bit careful with, you know, the poetic
license that marketing always has.
As I'm sure Eric and all of us know.
But you know, it's composability for CDPs means the cloud data warehouse or the
data warehouse is your processing engine, essentially.
Now, the way we mean it specifically at ActionIQ, we have what we call hybrid compute, which means you can have most of your data in Databricks, right, or Snowflake or Teradata, but you can
still have some data in ActionIQ, if that makes sense.
It's completely up to the user.
And you can have multiple systems that we access to get the data.
So we support essentially a query federation layer as a base layer of ActionIQ.
And on top of that, we have maintained our own ability to store and process data, right?
Now, the beauty of this is it's completely up to, you know, the analysts and the data
engineers, right, to decide how they want to manage this configuration. You
can have a single cloud data warehouse and all the data is there, and that's all we use.
Or you can have two or three.
Maybe it's a cloud data warehouse and a couple of analytics data marts that have the data that
we access.
Or you can have the same thing, but also have some data on ActionIQ.
As far as the user is concerned, they do not know and they shouldn't
know. It's completely transparent, they play in the UI, they click buttons and the queries are
routed appropriately, whether it's to our customer's IT systems or data systems, or whether it's
to ActionIQ-owned systems. The data is composed appropriately and the results are presented in the UI. So we provide a lot of flexibility there.
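A rough sketch of that kind of hybrid query routing; the engine names, catalog, and run functions below are stand-ins for the concept, not ActionIQ's real interfaces:

```python
# Hypothetical sketch of "hybrid compute" routing: each dataset is registered
# with the engine that owns it, and a query is pushed down to that engine
# rather than the data being copied out. The UI user never sees which path
# a query took.

engines = {
    "snowflake":   lambda sql: f"[pushed down to Snowflake] {sql}",
    "databricks":  lambda sql: f"[pushed down to Databricks] {sql}",
    "local_cache": lambda sql: f"[served from ActionIQ-owned storage] {sql}",
}

# dataset -> owning engine; this mapping is entirely up to the data team
catalog = {
    "customers": "snowflake",
    "events": "databricks",
    "realtime_scores": "local_cache",
}

def route(dataset, sql):
    """Send the query to whichever engine holds the dataset."""
    engine = catalog[dataset]
    return engines[engine](sql)

print(route("customers", "SELECT count(*) FROM customers WHERE region = 'EU'"))
```

Federating over the catalog like this is what makes the "one warehouse, three warehouses, or warehouse plus local cache" configurations interchangeable from the business user's point of view.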
The benefit of composability, right, is that you don't have to move data.
It's governance and security, right, largely.
Like the moment you copy data, you have to build pipelines.
Data copies can run out of sync, even if they shouldn't run out of sync.
It's an "if you know, you know" kind of thing, right?
If you have the data stored in many places, it can get out of sync, you can have definition
problems.
And then more and more, there's more concern about security and privacy among our customers,
right?
There's legislation, you know, GDPR, all these things, more awareness around information
security and risks.
So our customers love to not have to move data,
not only for governance, but also for security purposes.
And so, you know, there's a lot of benefits
that come with composability.
And this is something we have developed,
I would say, the last few years.
We would have started there,
but when we started ActionIQ,
the problem was that the database systems at the time
did not scale well enough
to support an interactive UI.
The reason why this works today is because
storage and compute is separated
again. And compute is a lot
more elastic with the
modern technologies.
Again, the Databricks,
Snowflakes, even Teradata,
Redshift, everybody's evolving to separate compute and storage and make compute elastic.
So you can support a much more diverse mixed workload on top of the systems today versus what you could do before.
10 years ago, if you had ActionIQ going directly against, like, an Oracle system, right, or whatnot, even the smallest query would probably take half an hour, right?
I mean, things tend to take a lot of time,
especially when you have other queries that are high-priority running,
and that just could not work.
But being database people, right,
the moment those systems became able to support this type of workloads,
we immediately said, this is it, that's the future.
We've seen it, we know it,
and we evolved
the product to support essentially
this hybrid architecture
that can do either or.
That's so interesting.
Actually, we probably need to have a full
episode just to talk about that.
And myself,
coming from Starburst and
working with a federated query engine,
what I find
extremely fascinating here is
how much the workload
matters, actually,
in making federation work or not.
I feel like
in this case, federation couldn't work.
That's right.
One last thing,
before I give the microphone back to
Eric.
Just, I don't know, I find it fascinating how things make cycles in a way.
The first time that I heard the term customer data platform was related to Treasure Data.
And it's funny because Treasure Data, and I'm pretty sure I'm not wrong here,
but they built the first version
of the platform on Presto, which is a federated query engine.
So it's interesting to see how the concepts go back and forth and how things need to mature
to make things actually work, right?
Because probably they were kind of too early in what they were doing back then in terms of like making, let's say, the technology like work back there. But it's...
Yeah, it's a little bit different. It's a little bit different. I mean,
different CDPs do slightly different things also, right? Which I think makes it very interesting.
But I mean, Treasure Data doesn't talk about composability at all, right?
Oh, yeah.
And I don't think they support it or plan to support it. But part of the reason is
a lot of what they're doing
is bringing data together.
Like there's some CDPs
whose job is to build data marts,
like a customer 360
within a data mart,
rather than access
a customer 360
if that's somewhere else.
Yeah.
And if you're building
the customer 360,
it doesn't make sense
for you to be composable, right?
I mean, you are,
you're competing
with the cloud data warehouse, essentially,
right, if you're that type of CDP.
But if you're the type of CDP like us
that's accessing data, we started
as I mentioned before, right?
The first founding principle is
use whatever data model is in place
in the data warehouse, then
composability makes a ton of sense and it fits
really well into our model.
Yeah, 100%. Anyway,
we need to definitely find more time to talk
about that stuff. Yeah, it's fascinating.
Super fascinating.
Eric,
microphone is back to you. All yours?
Yes, well, we're at the buzzer
as we like to say, but Tasso,
okay, here's my question. This is more of a
personal question.
So,
you have had a very unique journey in that you founded a data infrastructure company, a database, and sold it, which is extremely difficult to do in its own right.
And now you've built a successful company that serves business users.
Okay.
So if you had to start something new, but it could not be in SaaS at all, what would
you do?
Oh, man, you know, that's a great question.
The reason why I started ActionIQ is because I love learning new things.
And that showed, you know, it was such a big new challenge.
So I haven't thought about it.
You know, I'm so obsessed with what I'm doing right now that I haven't thought what that would be.
But probably would be something that it could benefit from data, but it would have nothing to do with either SaaS or data infrastructure,
right?
I would probably take my skills.
I mean, I'm a huge believer in interdisciplinary opportunity.
I think the bigger opportunities are in these Venn diagrams, right?
Where lots of people do A and lots of people do B, but very few people understand A and
B together.
So I would ask myself, right, now that I've done A and B,
what would be that C thing, right, that would benefit from everything?
That's how we think about it.
But maybe I'll think about that question for the next time we talk, Eric,
and I'll have the answer.
Absolutely.
Yeah, we'll do another episode on your future.
Tasso, thank you so much.
I have so many more questions.
Yes, indeed.
Thank you so much for giving us the time today.
What a great episode.
Yeah, I really enjoyed it, guys.
Thank you so much for having me here.
Really fun.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app
to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack,
the CDP for developers.
Learn how to build a CDP on your data warehouse
at rudderstack.com.