The Data Stack Show - 169: Data Models: From Warehouse to Business Impact with Tasso Argyros of ActionIQ
Episode Date: December 20, 2023

Highlights from this week's conversation include:
- The Evolution of Databases and Data Systems (2:33)
- Abstracting Data for Business Users (4:31)
- Building a Database for Google-like Search (7:58)
- The Big Data Explosion (11:10)
- Selling Myspace as First Customer (13:14)
- Starting ActionIQ (16:57)
- The customer-centric organization (22:46)
- Transitioning to customer data focus (23:53)
- Understanding business users' needs (28:30)
- Supporting Arbitrary Queries and Data Models (34:42)
- Unique Technical Perspective of Clickstream Data (37:01)
- The value per terabyte of data (46:45)
- Building a product for multiple personas (50:45)
- Composability and Benefits (58:05)
- Evolution of Storage and Compute (1:00:09)
- Composability and Treasure Data (1:02:10)

The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Discussion (0)
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
We are here on the Data Stack Show with Tasso
Argyros. Welcome to the show, Tasso. We're so excited to have you.
Great to be here. I've been looking forward to it. Thank you for having me.
All right. Well, give us an abbreviated background. So you're the CEO of Action IQ,
but you've done lots of database stuff. So just give us a brief background. Yeah. So I've been a database guy my whole life,
more or less, my whole professional life. I grew up in Greece where I studied engineering. And then
I came to the US. I started a PhD at Stanford to study databases and distributed systems.
And about a couple of years into that, I dropped out and just started one of the first
shared-nothing, massively scalable database companies at the time.
That was back in the 2000s.
It was called Aster Data.
And Aster was one of the first companies
that could deploy very large databases
on commodity hardware, right?
So much lower cost to store and analyze
big amounts of data.
That was, you know, pre-Hadoop,
it was around the same time that MapReduce
and that stuff was coming out.
I sold that to Teradata,
which is one of the, you know,
a big data warehouse company at the time.
It was the largest enterprise data warehouse company, and I spent a few years there.
Definitely a great school in databases and the business of databases.
And then, you know, I wanted to do something slightly different.
So I left and I started ActionIQ.
We're a customer data platform.
So there's definitely a bunch of database technology involved, but at the end of the day, we have a UI and our goal is
to empower the business users along with the data engineers. So databases were a technical product
and ActionIQ serves a dual purpose as I like to think about it.
You know, which kind of brings us to today, you know, CDP is a big, exciting market and I'm sure we'll talk about it in the show.
Yeah, a hundred percent.
And by the way, it's also like one of the things that like really excites me in like
the conversation that we are going to have today is this connection between the data systems, at scale especially,
and the business use case.
And you chose, I think, kind of an extreme use case here, because you have a problem that,
from my experience at least, when we are dealing with customer data at scale, it can be hard
for the data platform that you are using and for how you interact with it.
But at the same time,
you have one of the most like demanding in a way
customers out there,
which is marketing people, right?
Who have to use this
and they have to use it like in a way
that it's very provable
that brings value to the company.
So I'd love to get more into connecting the dots there,
how data systems and the evolution of them led to today
to support this kind of use cases,
and also how you solve this very hard product problem.
It's one thing to build a database with a terminal
like SQL.
It's another thing
to build something that
someone needs to slice
and dice data for marketing campaigns, right?
So that's something that
I'm really excited about.
What's on your mind?
What would you love to get deeper into today?
Yeah.
So I think, of course, what you say is spot on, right?
So I think with the CDP, you know, we had our work cut out for us because, first of
all, for the business user, you need to abstract things enough so that they can do stuff without
understanding all the underlying data.
They shouldn't need to know SQL, and they shouldn't need to know what every table and column means, to do their work.
So you need to abstract things enough for the business user to do the work, but not
so much that they can really do that much anymore, right?
Because you've abstracted things to the point of elimination.
And the other thing that I think is interesting is that
it's not just the business users, right?
So we have the business user persona,
but we also have the data engineer and analyst persona.
So database, you know, you have the database users
or engineers or analysts that are using it, right?
Everybody knows, you know, at least SQL, right?
And people understand data structures and what the data means.
And in our world, some of our users do,
but some of our users don't, right?
So you also have like this multitude of users.
So it was definitely an interesting problem
which is kind of what I was looking for.
But beyond that, I think it's interesting to think
how the CDP and the database world
have been kind of intertwined, right?
And, you know, some of the latest trends in the CDP world, like composability, are enabled
and were created because of how the cloud databases, right, have evolved in the past
few years.
So I think database architecture evolution and CDP evolution kind of go hand in hand, even though they're separate spaces.
So I think it'd be very interesting to talk about that. And, you know, what is a CDP, right? Which is, you know, hours of debate that can take place on that. What is a composable CDP? All this stuff is fascinating to discuss.
Yep. That's super interesting. I can't wait to get into the details here.
Yeah. Let's dig in.
Let's do it.
Okay. So let's start where your story begins at Stanford and then kind of go from there because
you sort of wound your way through databases and then sort of ended up at the
business user.
Can you just trace that path for us a little bit?
Yeah.
So I landed at Stanford and it was really such a fascinating time for me.
I got into the PhD program in computer science, which is obviously a very highly esteemed program; so many great people have come out of it.
And before Stanford, I had done some research in data mining, data analytics,
and my intention was to go study databases.
But what happened was I ended up meeting this professor, David Cheriton,
who is this really brilliant Canadian professor and researcher.
And David was the first check into Google.
So he gave them the first seed money.
I think he ended up owning 1% of Google or something like that,
which is pretty good.
I don't know if he still has it or not.
Pretty good is probably an understatement, but we'll leave it at that. That was a good ROI for a seed investment.
And together with a couple of other folks like Rajeev Motwani, who unfortunately passed away, and a couple of others.
And if you recall at the time, Google had implemented search using commodity boxes, right?
There was AltaVista before they were using these big mainframes, very expensive, right?
And Google, they would take these pizza boxes and deploy the search.
So David came to me and he was like, hey, you're a database guy.
Could you build a database the way Google built its search?
That was kind of the initial problem statement, I would put it.
And then I met up with a couple of other students
that were looking at the same problem from different perspectives.
There was some peer-to-peer database research at the time that was relevant.
And so we started Aster Data out of Stanford.
So my advisor put in some money.
You know, we had angel investors.
There was no formal seed investment back then, right?
So you had to find individuals to do that.
And we did end up developing a database where, essentially, storage and compute were together in commodity boxes.
We would buy Dell or HP servers and many of them, right?
Hundreds of them.
Our first customer was MySpace,
which at the time was, you know,
as big as Facebook, right?
It was the Facebook of the day.
Wow.
And we would deploy massive scale
initially for MySpace's customer data.
And what's interesting about our approach...
Can you just talk just very briefly about
what was MySpace doing before
and then what were they sort of migrating onto Aster?
What were specific workloads?
Because obviously they didn't do everything at once,
or I would guess that they didn't.
Yeah, so MySpace,
so essentially MySpace was a Microsoft shop, so they were using SQL Server at the
time.
What SQL Server couldn't do was all the clickstream data, right?
So you could do the profile data in SQL Server, and they could do operational
analytics on that, but it was the event data that was massive in scale, right?
Because the MySpace users were all over the place, obviously, right?
Yeah, yeah.
That they couldn't do.
So they used Aster initially specifically for the event, behavioral information.
And then the profile data, the most static information about the customers, was in SQL Server.
And over time, they expanded the usage, right?
They did more and more.
Like I remember,
MySpace had one of the first revenue sharing agreements for music.
So all the royalties would be computed through Aster Data, because how much of a song you listened to
had to do with how much money you would pay to the labels.
So it had to be computed.
You know, very stressful, by the way,
to be running, you know,
queries for, you know,
what was like huge amounts of money
at the time out of our systems.
And so, you know,
and what's interesting,
I think about that architecture
is that before Aster,
just like with Google search,
if you had the large scale database,
you had to buy a mainframe.
You would buy a multi-CPU server from IBM.
You would buy a disk array from HP.
And you would have to spend $10 million just on the hardware, just to get you started, just to build a 10-terabyte data warehouse.
And so the whole idea with Aster was, okay, we bring storage and processing together.
You partition the data, you partition the workload.
Today, that's obvious.
But at the time, that was a very new approach, right?
Believe it or not.
And we were one of the very first vendors, teams to do it as a product.
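The "partition the data, partition the workload" idea can be sketched in a few lines. This is a hedged illustration of the general shared-nothing pattern, not Aster's actual code; the node count, routing function, and row shape are invented for the example:

```python
# Illustrative shared-nothing partitioning sketch (not Aster's actual code).
# Rows are routed to commodity nodes by hashing a partition key, so each
# node stores and scans only its own slice of the data.
import hashlib

NUM_NODES = 4  # hypothetical cluster size

def node_for(key: str) -> int:
    """Deterministically route a row to a node by hashing its key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % NUM_NODES

# Partition some clickstream-style rows across the nodes.
rows = [{"user_id": f"u{i}", "event": "click"} for i in range(10)]
partitions = {n: [] for n in range(NUM_NODES)}
for row in rows:
    partitions[node_for(row["user_id"])].append(row)

# A query like SELECT COUNT(*) runs on every node in parallel;
# a coordinator just sums the per-node partial counts.
total = sum(len(part) for part in partitions.values())
```

Each node answers queries over its own slice and a coordinator merges the partial results, which is what lets racks of commodity pizza boxes add up to one big database.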
And so, you know, that kind of led us to when big data exploded. In 2010, there was this whole explosion about big data. Big data was everywhere. It was on the cover of The Economist. And then very quickly after that, the more legacy database vendors
like Teradata showed a big interest, right, and acquired us.
And, you know, subsequently, there was a couple more iterations of database architecture.
There was Hadoop, right, and the whole NoSQL movement, which didn't go very far.
And that kind of came back into SQL, but in the cloud, which kind of resulted in what we know today as, you know, the Snowflake Databricks type of architecture, which ironically
separated processing and data again, right?
So we went from data and processing being separate in the mainframe disk array world
to coming together in the MPP database world and Hadoop, and then it got separated again
in the cloud just because the network interconnects, right?
Became so efficient that you could actually afford to do it, when you couldn't do it before. But that was kind of the quick story on Aster.
Okay, and I have a ton of questions there, and then of course I want to ask about ActionIQ. But just one quick one, if you will indulge me: can you talk about selling MySpace as your first customer?
Because, you know, again, that was in tech world some time ago, actually not that long ago.
Right. But it would kind of be like a database startup saying, you know, Facebook is our first customer, which is a really big sale.
And so can you just give us that story for the entrepreneurs in the audience?
Because that's just, I need to know that.
Yeah, no, it was really, first of all, it was a huge deal.
I think the deal itself was 10x the money we had raised at the time, just to give you
a sense, right?
So it was like, I think it was almost like 70s or something crazy.
Wow.
And the way it happened was that Ron Conway was one of my seed investors and he connected
me to Adam Bain, who ended up being the CRO for Twitter later on.
And Adam Bain at the time was running Fox Interactive.
Fox had just bought MySpace.
And so I was very fortunate to be dealing with a very entrepreneurial, I mean, for those of you
that know Adam, he's a super smart, super entrepreneurial guy. And he saw in us,
and he knew he had to scale MySpace, right? He was very growth-minded, right? He was very
ambitious, and he knew that in order for MySpace to scale, MySpace's data infrastructure had to scale, right?
So you had a very technically aware, ambitious business owner.
So Adam was the person we
interacted with through Ron Conway.
And at the time,
you know, the reality is Adam and MySpace didn't have that many options, right?
So their options were SQL Server on the one side or Teradata on the other side, right?
And then SQL Server couldn't scale.
And Teradata, which, for the amount of data we're talking about, would probably have cost, if I had to guess, close to $100 million.
Yeah.
That's the amount of money we're talking about, right?
Yeah.
So you had to do something.
And we were right in between, right, in terms of cost and what we could handle.
And, you know, we were, again, you didn't have many options, which takes us back to, I think, every time you close a big deal, the reason that massive deal happens is because the customer absolutely needs what you're selling.
It's like vital for them.
And there's no alternative that comes close.
Yeah.
And that transaction met both criteria, right?
It was critical that MySpace could scale their data operations, and there was no alternative at the time. And it paid off for them, right? I mean, we did, you know, a lot of what we promised we would do, and I'm not sure what else they could have done at the time. You know, again, later on, 10 years later, there were a lot more options.
Sure.
At the time, yeah, there weren't. And so that's how this whole thing came around. But to be very honest with you, I was out of school, I was 24 years old. And if I had the experience I have today, I would never ask for so much money. Like, that was crazy. It was complete inexperience that made me ask for how much I asked for at the time, to be completely honest.
Yeah. Well, like a true technical founder, you sound simultaneously like someone who
has a deep grasp of the combination and separation of storage compute
and how to make an enterprise sale, which is kind of a very...
You have to, right? You have to learn this stuff. Yeah. And, you know, the Aster pricing was very rational, right? We ran the math and we were like, all right, that's a reasonable price. But if you just looked at the price, you would get scared. But yeah, the math was correct in the end.
Yeah. Yeah. Well, yeah. I love hearing the phrase,
the math is correct. Okay. So tell us about Action IQ. And then we'll go back a little bit
because I want to talk about databases and different flavors
because you've got an interesting journey.
But what is Action IQ and why did you decide to start it
after working in databases?
Yeah, so I think by now it's probably obvious that Aster,
you know, we did a lot.
Aster was a generic database, right?
So we had a lot of use cases.
We did like, from the MySpace use case, we ended up working with healthcare companies,
financial companies, a lot of big banks globally, telcos.
But we ended up getting used for a lot of customer data because at the time, it was
the event data that couldn't be processed by the traditional databases.
So almost by accident,
a lot of our use cases were around customer data.
And subsequently, even at Teradata,
one of my observations was I was fascinated
how the vast amounts of customer data
that would live in the IT systems,
Aster, Teradata, whatever your data warehouses were, right?
Massive amounts of customer data.
And then when you would look what happens with the business,
which is where the value of data is supposed to be created, right?
Because at the end of the day,
IT doesn't store customer data for IT's purpose,
it stores it to power business use cases, right?
Or product use cases.
If you look at the business systems,
they could store maybe 1%, 0.1%, 0.01% of that customer data at best.
And the reason was there was this bifurcation, right?
So you could either buy a product for engineers
that scales like Aster,
or you would buy like an email tool for the business that
has almost no data infrastructure behind it.
And there's this huge gap in between.
So I started thinking a lot about that gap, because in my experience at Aster and at Teradata,
oftentimes we would put a lot of customer data in those databases and we would succeed in doing what
we said we would do.
But unless the business got direct access to it, the value wouldn't be there.
It was almost like, you know, I used to joke that, you know, the operation was successful
but the patient died, right?
Because, you know, the data got into that place.
But the value was not there because, you know, the people were supposed to create the value.
They weren't technical.
They didn't know SQL.
There had to be people in between.
They were very slow.
The systems were not connected, et cetera, et cetera.
So I got fascinated by this problem of how do you bridge,
not just the systems, but the two worlds?
Because we're talking about different cultures, right?
The data engineering culture where I was part of is one culture.
And then let's say the marketing culture has a completely different culture that
value different things and understand different things, different language.
And so the intersection of these two worlds was fascinating to me.
And I decided to start the company to solve this problem.
And when I was starting the company, I wasn't sure exactly what it would look
like, but I knew it would have to scale with data as much as Aster Data. And it would have to have a UI
that you wouldn't have to be a data engineer to use. That was kind of my two criteria when I
started the company. And that's how ActionIQ was created.
Fascinating. Okay. And just give us the pitch on ActionIQ. What does it do? Obviously, it's a UI on some sort of database, but why do people buy it?
So on the data side, we connect to your data warehouse, data lake, or multiple, right? So we can do data federation. We used to bring the data over,
but now we push the queries down with a composable model,
which we can discuss.
On the application side, we connect multiple applications.
So theoretically, every business system you have
that's touching the customer should be connected to ActionIQ.
So email, CRM, customer success...
Email, CRM, web personalization, call center, direct mail, clienteling stuff for retail, right? Decisioning systems, next-best-action systems, you know, product, right? Because product is customer-facing, right?
So there's really a very long tail.
I mean, there's probably,
an enterprise probably has like 100 to 200
of these systems, right?
At least.
Yep.
And those are sort of integrations
that you support.
Correct.
There's integration we support,
which can be push or pull.
And then on the interface itself,
part of the interface for the business user,
part is for data engineers. But for the business users, what we want them to do is to get access to an abstracted version of the data, and to be able to say: who do I want to target, why, and what do I want to do with these people, through what channel, right? And to be able to deploy a new experience, right? The marketing person may call it a campaign, right?
But you can go beyond marketing with this.
Deploy something new, run a new experiment,
run a new test with customers
without having to write SQL,
without having to know what's this column,
this table, this data warehouse,
none of this stuff, right?
So we offer a self-service that you didn't have before, and we offer agility, so you can do things in a day that would take you a month before, when it comes to creating this new experience. And orchestration, right? Which simply means email doesn't do its own thing and web doesn't do its own thing; now you have kind of something to coordinate what this one customer sees through any channel they may happen to interact with.
Yeah.
I kind of think about that as, like, you know, marketing is kind of like a DAG in many ways, right? It's just that the nodes would be like different tools that are sort of emitting something out of an API, but whatever. That's my nerdy take.
Yeah.
And what's interesting also, because we say marketing, and I say marketing a lot as well, right? But marketing in many places has become kind of the ambassador of the customer. But if you think about it, much of what we're talking about is not marketing. For instance, we have a lot of B2B customers and technology customers. You know, Atlassian is a good example, right? And a lot of how you interact with your users there, it almost looks like customer success, right? Which is not really marketing, but it is the user interaction. But most companies are not organized around the customer or the user; they're organized around functions and revenue. They're more functionally organized. So marketing ends up taking the lead in many
places and saying, how do we align around this one customer, right? But it really becomes a
very cross-functional thing because most functions, right, if you think about product,
marketing, customer success, support, everybody's touching the
customer.
And in theory, everybody should be one, or if it's not one, it should be tightly coordinated
or orchestrated in some way.
Yeah, makes total sense.
Okay.
So I want to dig into databases a little bit here. When you built Aster, you built what you called a multi-purpose, or let's say workload-agnostic, piece of infrastructure, which ended up being used heavily for clickstream data just because the infrastructure around more traditional SQL-based databases wasn't optimized for that. That's a pretty different problem to solve than,
at least from my perception, than building a database that's geared towards essentially driving customer experiences.
So when you think about workload agnostic, you need to think about
sort of optimizations on a more general level.
Thinking about particular data types, handling a large variety of data types,
you know, lots of edge cases.
When you built Action IQ,
what was that transition like?
Because now you're really focusing
in on the customer data.
And so you can be much more opinionated.
Can you talk about how you approach that?
100%, yeah, yeah, yeah. So, you know, this is really something that most databases struggle with. Having been on the database side, right, you struggle with this problem all the time, because you have to support a long tail of use cases. And the tail is really long, right? And you get into
all this esoteric functionality that a small part of your use cases need,
and you have to support them, and everybody has to be an equal citizen almost.
And that becomes very difficult.
With ActionIQ, the day we started, myself and my co-founder both came from database backgrounds, and it was almost a relief, the feeling I would describe as relief, that we could say: screw that 90%.
How can we further optimize this, right?
How can we do it very quickly?
It was truly a relief.
I mean, and I can give you a few very basic examples, right?
We support arbitrary data models,
but at the end of the day,
you have some customer identifier
and you have some event timestamps.
Even simple things like that allows you to make decisions that can optimize performance,
can optimize storage in ways that you could never do in a database because it would break
a lot of things.
To give you another example, we do a lot of segmentation-like operations in the UI. Yep.
Segmentation, turns out it's a left outer join, right, in SQL at the end of the day.
Yep.
So guess what? We have a really fast left outer join optimizer, right?
Yeah.
And this is like one of a hundred different types of joins a database has to support,
right? But in our case, we just happened to know that it would get 10x the usage of anything else,
and we could optimize it, we could optimize it day one.
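To make the "segmentation is a left outer join" point concrete, here is a small hedged sketch, not ActionIQ's engine: an audience like "customers with no purchase in the last 30 days" is the profile table left-outer-joined to events, keeping only the rows with no match (an anti-join). All table and field names are invented for the example:

```python
# Illustrative sketch: a "lapsed customers" segment expressed as a
# left outer join (anti-join). Table and field names are invented.
from datetime import date, timedelta

profiles = [
    {"customer_id": 1, "name": "Ada"},
    {"customer_id": 2, "name": "Grace"},
    {"customer_id": 3, "name": "Edsger"},
]
purchases = [
    {"customer_id": 1, "ts": date(2023, 12, 1)},
    {"customer_id": 3, "ts": date(2023, 10, 2)},
]

def lapsed_segment(profiles, purchases, as_of, days=30):
    """Customers with no purchase event in the last `days` days.

    Relationally: profiles LEFT OUTER JOIN recent purchases, keeping
    only rows where the join found no match -- roughly
    `... WHERE recent.customer_id IS NULL` in SQL.
    """
    cutoff = as_of - timedelta(days=days)
    recent = {p["customer_id"] for p in purchases if p["ts"] >= cutoff}
    return [c for c in profiles if c["customer_id"] not in recent]

segment = lapsed_segment(profiles, purchases, as_of=date(2023, 12, 15))
# Ada bought within the window, so only Grace and Edsger are in the segment.
```

An engine that knows almost every UI action compiles down to this one join shape can afford to optimize that path aggressively, which is the point being made above.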
So, you know, it was all these things we knew we could always do, right?
I mean, I knew I could do all these things in the past,
but we wouldn't do it because you have to support all these things.
And it's really a blessing and a curse, right?
I mean, the reason why, you know, a Databricks, a Snowflake, have these huge valuations is because they can support all these use cases.
Sure, sure.
But then the technical complexity that gets created is kind of enormous.
So I would say we were careful to not put too many constraints on the ActionIQ data queries and the whole way we structured the system.
But even the very basic constraints we were able to do just because we knew we were dealing
with customer data that had some very basic properties allowed us to do things that, you
know, we would have never done in a database.
So that's kind of the database side, right?
But for me personally, that was the easy part because I was, you know, that was kind of
my expertise.
The difficult part was understanding how the business users wanted to use the system.
And that was a learning curve, right? And what we did there was that our first customer was
this e-commerce company in New York, Gilt Groupe, that used to be an Aster customer as well.
Oh, yeah. I remember Gilt Groupe.
Yeah, and they had a great, very technical team, very sophisticated team at the time,
right? Very high-flying. I mean, the whole space kind of fizzled away, but there wasn't anything they could do about it. But for the first year of our life as ActionIQ, we got
to sit next to Gilt Groupe's marketing team.
So our actual physical seats,
desks were right next to our customers
for a whole year.
Oh, wow.
And so we could see,
I could see them from my desk using our product
and we would talk to them.
We would have lunch with them.
They would tell us what they're doing,
how they're doing it.
A lot of these people came
from the financial services sector.
So they brought a lot of very mature best practices for CRM.
And then, you know, our first hire, or one of our first hires, was a UX designer, because we knew that was not, I mean, you know, again, we're database people, right? We're infrastructure engineers. So our idea of UX was rows and columns, right?
Yeah.
And SQL and C++, Scala, right? We used Scala to build the product.
I wrote code for the first year, right?
I hadn't written code in a while.
It was fun for me to get back to it for a little bit.
But we knew when it comes to UX, that was another thing, right?
So we made a UX hire super early.
We ended up hiring a great person.
We sat next to our business users very early.
We forced ourselves to get close to where I felt we were the weakest.
Because ActionIQ is all about bringing data infrastructure and business application together in one company.
And again, this is not about the technology,
it's about the culture, it's about the mentality, right?
That's why I did the company, right?
It was fascinating to me to do something like that,
but you had to force yourself to get uncomfortable
early on to bring these two things together.
Yeah, I love it.
Now, I mean, what a great story. I mean, I think that's actually just sage advice in general about, you know, having a desk next to your first customer.
I was worried that you were going to say our first customer at Action IQ was MySpace.
Yeah.
I'm glad you didn't say that.
Yeah, no, I didn't. We actually wanted, I mean, you know, we needed a local customer to do that, right? So I tried really hard to find a customer that was local to us in New York City.
okay so
I know Kostas has a ton of questions but
one more question for me
maybe two
can we talk about the data model? So you said that
you worked really hard not to put constraints on the queries that Action IQ is able to execute
from a segmentation standpoint. Okay, I get that in theory. The world kind of runs off of the Salesforce model, which is like lead, contact, account, you know, that basic data model, right? And at the end of the day, there's sort of an end user, whatever their relationship with the hierarchy of other entities that your business is interested in.
Right, right, right.
You're sending a message or an advertisement to a particular user or group of users. But the data model that they have does not actually afford flexibility to represent, let's say, a business model or a data model like Gilt Groupe, right? Really hard to represent that in sort of the rigid Salesforce data model. How do you approach that from a database standpoint?
So you want flexibility, but you also need to have some
sort of underlying data model that allows the UI to create a sense of logic and predictability for
an end user. How do you reconcile those? Yeah. So first, maybe for some context,
let me talk a little bit about who our users and customers are. So we started in, you know,
Gilt Groupe was essentially a retail company, right?
E-commerce, but retailer.
So we started in retail, but since then we've expanded
and, you know, we do all kinds of different B2Cs, right?
So we do, you know, a lot of media, right?
Like, you know, folks like News Corp, Washington Post, Sony.
We do a lot of financial services and we do a lot of B2B, right?
We do a lot of combination of B2B and B2C.
Folks like Dell, right?
HP and others, all big enterprise, right?
Big enterprise, B2C to B2B.
When I started Action IQ, I had, you know, again, all my experience was in the data space, right?
And my observation was that setting up the data model and the ETL pipelines sucked.
That's where things took a lot of time, right?
And would get complicated.
So my first criteria in Action IQ was that I wanted to be able to reuse the exact same model that existed in the data warehouse.
Which, by the way, at the time, now with composability, it's obvious.
At the time, I think we were like five, 10 years ahead of time, right?
When we said that.
Oh, no question.
Like marketers weren't thinking about the data warehouse 10 years ago.
Right.
And many still aren't today, actually, but that's probably another topic.
Yeah.
And even vendors, right?
I mean, if you look at the big vendors, right?
Every vendor has its own data model and they expect someone to take the data from wherever
it is, maybe the data warehouse, and load it, ETL it into their own data model.
But when we started ActionIQ, you know, like eight years ago now, we said we have to be able to reuse
the same data model. We can maybe augment it or we can put metadata on the data model,
but we want to use exactly the same data model that lives in the data warehouse. Because again,
my goal was not to build a new data mart with customer data, my goal was to take all the data that exists on the IT side
and make it accessible by the business side.
So that forced certain things, right?
So the approach we took early on was to say, we're going to support whatever data model
is there, and we're going to allow the users to tag the data model to tell us what is an
identity, what is an event, what's a timestamp, and
also what are the join graphs, right, in that data model.
But we would leave the data model as it is in the data warehouse.
Essentially, it's more like caching the data on ActionIQ more than loading the data or
transforming the data.
I would keep a cache of the data with ActionIQ.
Interesting, yeah.
And then you could tag the data model on top of it,
but we had to be able to support it.
That led us actually to implement our backend database
as an in-memory database
because we had to support arbitrary queries
and arbitrary models with interactive times, right?
Which is pretty hard to do, generally speaking.
But it was essentially, it was a lift, right?
It like lifted and moved it versus transformed it.
Wow.
And so since then, right, we have expanded that.
And I could talk how that ties to the UI and everything else if you're interested.
But now, for example, the B2 b2b use cases right would tend to have
more complex identity as you were saying when we say we supported one identity now we can support
a hierarchy of identities right so you have to have a user that that's part of an account that's
part of it you know like big client whatever that may be right so we expanded support the concept of
more hierarchical identities and a whole bunch
of other stuff. But a fundamental principle, that one requirement, right, we set for ourselves
forced us to be very open about what kind of data models we support.
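The tagging approach described here can be sketched roughly like this; all table, column, and function names are hypothetical illustrations for the concept, not ActionIQ's actual interfaces:

```python
from collections import deque

# Illustrative sketch: the warehouse schema stays as-is, and a thin "tags"
# layer records what each piece means -- which columns are identities, which
# table holds events, where the timestamp lives, and which joins are legal.

warehouse_model = {
    "users":    ["user_id", "account_id", "email"],
    "accounts": ["account_id", "company"],
    "events":   ["user_id", "event_type", "ts"],
}

tags = {
    "identities": [("users", "user_id"), ("accounts", "account_id")],
    "event_table": "events",
    "timestamp": ("events", "ts"),
    # join graph entries: (left_table, left_col, right_table, right_col);
    # users -> accounts also encodes the hierarchical identity mentioned above
    "join_graph": [
        ("events", "user_id", "users", "user_id"),
        ("users", "account_id", "accounts", "account_id"),
    ],
}

def join_path(tags, start, goal):
    """BFS over the tagged join graph: how does one table reach another?"""
    adj = {}
    for lt, lc, rt, rc in tags["join_graph"]:
        adj.setdefault(lt, []).append((rt, lc, rc))
        adj.setdefault(rt, []).append((lt, rc, lc))
    q, seen = deque([(start, [])]), {start}
    while q:
        table, path = q.popleft()
        if table == goal:
            return path
        for nxt, lc, rc in adj.get(table, []):
            if nxt not in seen:
                seen.add(nxt)
                q.append((nxt, path + [(table, lc, nxt, rc)]))
    return None

# An "events to accounts" query must hop through users:
print(join_path(tags, "events", "accounts"))
# -> [('events', 'user_id', 'users', 'user_id'), ('users', 'account_id', 'accounts', 'account_id')]
```

The point of the sketch is that nothing about the warehouse tables is rewritten; the semantics live entirely in the annotation layer.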
Fascinating. I mean, I have a hundred more questions. The
support for arbitrary queries, and having a caching layer, makes the whole thing make a lot more sense.
Because traditional SaaS vendors force you to basically
input your data into pre-existing queries that they run already.
Okay, I will stop.
Costas, please jump in.
I'm going to hand the mic to you.
Yeah, thank you, Eric.
So Tasso, you mentioned that you started seeing this use case of clickstream data,
like user event data, since your time in Aster Data, right?
I want to ask you, first of all, about what's unique about this data from a technical perspective,
right?
What makes it so challenging, or maybe not that challenging, we'll see.
To accommodate, let's say, the processing of this data at scale
in traditional databases, right?
And how these things have changed also, like, through time,
because since Aster Data, today, like, there's been a lot of progress.
But tell us a little bit about that, because my feeling is that it's also pretty unique in some ways, the type of data that you have to work with. That's right.
So it challenges the systems themselves. So tell us a little bit more about that.
Yeah. So from an Aster perspective, first of all, I'll come to your question very quickly,
but just to state maybe the obvious, the reason why Clickstream data we work with was a business
reason primarily, which is the dollar per terabyte value of the data was low. So if you're a bank and
you have data about your customer's accounts and their balances, that data is worth a ton of money.
The volume is very low versus the value.
Clickstream, you don't even know if it's valuable at all until you analyze it.
Right.
And maybe it's not. So you can almost draw a two-by-two and say: high volume,
you know, large scale, with low dollar per terabyte.
That's the data we dealt with, right?
You had a hundred terabytes of low value data that would come to us
because we could support the cost structure, right?
To make it economical.
But to answer your question,
I think what people don't realize
is that clickstream data is time series data.
And that's where a lot of the complexity comes, right?
So a lot of what we had to do with Aster,
people were interested,
not just in searching the data, but saying, okay,
what is the sequence of events that leads to something good or bad, right?
So, you know, we did a lot of stuff like, okay, I remember, right?
We had a big grocery chain as a customer that was trying to figure out: what are the gateway
products? What's the kind
of product that, if a customer buys it, then where they used to buy only, you know, groceries from me, now they're
buying all their meat and fish or whatever, right? So what are the paths in that
clickstream data that lead to positive or negative outcomes? And that's time series data. Now, SQL is a really bad language for time series data because SQL essentially is a way to model
set theory.
It's very good with set intersections, unions.
That's what SQL is at the end of the day.
But time series is not that, right?
So we ended up building a lot of custom functions, back in the
Aster days, right, it was still SQL, that would allow our users to do time
series queries on top of clickstream data.
So the way we would organize the data, store the data, partition the data,
and we expanded SQL to support time series queries, that was a lot of the
innovation we did, in addition to the basic architecture,
right, that was shared-nothing. But that stuff can get very complicated,
because unless you know exactly what you're doing
from an implementation perspective,
you know, if you try to do time series analysis
with basic SQL, it's just extremely slow, right?
You have to shift a ton of data around.
It just doesn't work.
So that is one difference.
Yeah.
Okay.
That's awesome. By the way, a question,
maybe a naive question, but we have time series data, right? And in
the industry, we pretty much have a dedicated type of database for time series data,
right? Especially for things in the observability space, right? Because all these
things, at the end, are time series data. I do have my opinion on what the difference is with customer
data, but I don't know. So my question is: why not go and use one of
these solutions, right? Technically, at least, they are supposed to work well for data that are time series.
And also, as you said,
that data has a very low, let's say,
value per terabyte.
Data from, like, a data center,
like, okay, whatever, right?
It's interesting when things start breaking,
but until then it's just a lot of noise,
right? So
why would we not use those?
So back then, first of all, most of the systems didn't exist, right?
Like when Aster existed, there were no time series databases.
I mean, I shouldn't say that.
Let's say they weren't popular, right?
It wasn't something we were aware of at the time or were looking at.
I think today you have the option of using that for some
of that, but also the customer queries are a little bit different. So I'll give you a very
concrete example. And I know we have a technical audience, right? So just to go for one minute,
so it's a little bit more technical. One of the things we created at Aster was this thing called nPath. You would give a regular expression of events, it could be A B*
C, and we would map that to the clickstream data. And you could define what A, B, and C are, right? So
A could be: you enter the website on this page; B* is: you do, like, zero or more of these
things; and C is: you end up on this checkout page. And we would map this regular expression
onto the time series data to help you find patterns across your customers. That's not what time series
databases do, right? Time series databases, for the most part, are concerned with calculating
aggregates and other metrics, right, on top of the data. Here we're looking for behavioral patterns
that span potentially weeks of data.
So I would argue even today, it's probably a different problem.
But at the time, there was not even the option of the time series databases to be considered.
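The nPath idea described above can be sketched in a few lines; this is an illustrative toy, not Aster's actual implementation, and the predicates and event fields are made up:

```python
import re

# Toy version of the nPath concept: define predicates A, B, C over events,
# encode each event in a user's time-ordered stream as a symbol, then run an
# ordinary regular expression like "AB*C" over the resulting symbol string to
# find behavioral paths (e.g. landing -> zero or more other pages -> checkout).

predicates = {
    "A": lambda e: e["page"] == "landing",
    "B": lambda e: e["page"] not in ("landing", "checkout"),
    "C": lambda e: e["page"] == "checkout",
}

def symbolize(events):
    """Map each event, in timestamp order, to its first matching symbol ('.' if none)."""
    out = []
    for e in sorted(events, key=lambda e: e["ts"]):
        out.append(next((s for s, p in predicates.items() if p(e)), "."))
    return "".join(out)

def npath(events, pattern="AB*C"):
    """Return True if the user's clickstream matches the event pattern."""
    return re.search(pattern, symbolize(events)) is not None

clicks = [
    {"ts": 1, "page": "landing"},
    {"ts": 2, "page": "product"},
    {"ts": 3, "page": "cart"},
    {"ts": 4, "page": "checkout"},
]
print(npath(clicks))  # the path landing -> (product, cart) -> checkout matches
```

The real system pushed this matching into the partitioned, shared-nothing engine; the sketch only shows why a regex over an ordered event stream expresses things that set-oriented SQL cannot say naturally.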
Yeah, 100%.
I totally agree with you.
Actually, I think user event data and clickstream data have this very unique characteristic of
being, like, a time series, but with quite a few dimensions attached to each point. That's right.
That's right. And that makes the problem quite different than, let's say, calculating CPU
usage, right? That's a completely different type. Yeah, when you're
collecting, like, a CPU signal from a data center, you probably have orders of magnitude more data,
but the dimensionality of the data is much lower.
And that makes a huge difference.
That's right.
Like in what kind of like...
That's exactly right.
And I think subtle differences in the actual order of events or sequence of events matter with behavior, right? So again, like I think in observability, a lot of it is about the aggregate metrics or
when you hit a certain threshold and this and that.
With customer behavior, right, you're looking, why are people dropping off, right?
It's a why, right?
It's not a what or a when, it's a why.
Why are people dropping off my website at a certain point, for example?
And for example, a big part of this is how do you visualize the data to give people the
opportunity to notice patterns and figure out what questions they should be asking in
there.
So you have all these fancy diagrams that we had, actually. Again, at Aster, we didn't do much of that,
but we had implemented some light UI on top of it.
Or to give you another example,
one of the interesting use cases,
an early use case at Aster,
was the People You May Know feature at LinkedIn.
So LinkedIn was an early customer.
The lead data scientist at the time
was a brilliant guy called
Jonathan Goldman. He worked under DJ Patil, who, I believe, became
the Chief Data Scientist of the United States later on. And Jonathan used this technology
to create the first version of People You May Know, right? That now is ubiquitous.
This has nothing to do with metrics, right?
You're trying to see how do people connect with each other and what this graph says about who you may or may not know.
That's not typically what time series databases
would be concerned with,
but it is an events-based or network-based problem, for example.
So, yeah, we have some really fascinating early use cases
that now there's probably more diverse tooling.
Like I don't think today you would use a single platform
maybe to do everything.
But still, there's many of the stuff we were doing back then
that I'm not sure there's a clear replacement
for that type of operational analytics.
I think today what people end up doing is, you know, you load the data in a data lake, right?
And then you can deploy something like Databricks, which has a more flexible language, right?
Beyond SQL.
And you essentially write some custom analytics and custom code to do what you have to do. So I think the modern approach has a lot more processing power available and is
more customized, but is less abstracted.
Right.
So I would say the world has moved on, probably for good reason, but it's
not necessarily simpler to do today the kinds of things we were doing back then.
Yeah, yeah, 100%.
And you talked about something very interesting.
You mentioned the value per terabyte of data, right?
Back then.
And someone who is, let's say, oblivious to what's going on in this industry would say: but the data
you are talking about here,
with ActionIQ, it's
customer behavior. Isn't this
the most important type
of data that you have in a company?
Right?
So, especially
today, with all this ML,
AI stuff,
let's say the workloads
are shifting a little bit more toward building predictive power on top of behaviors
and all these things, which, okay, 15 years ago, probably there were fewer use cases. Do you
think that this, like, dollar price, sorry, dollar value per terabyte is changing
because of these new use cases and technologies,
or has it remained still kind of like dealing with logs, let's say?
Yeah.
I mean, I think today the cost of processing data has come so low, right?
Storing and processing data is so much cheaper today
than what it used to be
that I think people look more at the aggregate value of data.
Because when I say dollar per terabyte is low,
you have so much terabyte that in aggregate,
it could be super valuable data set, right?
So I feel today the conversation has moved again
for good reason,
and it's less about the dollar per terabyte
and it's more about what's the aggregate value. And
the other thing we see is that the cost is in the processing. It's less about how much data you
have; it's more about how expensive is the processing you want to do with it. You can store almost
any data today for very little, and you can run simple processing on top of it for very little. But as we've seen with
LLM training, for example, right, if you try to do some very complex processing, it can get
extremely expensive extremely quickly. So now I feel the metric is
cost of model over value of model, right? The dollars to train the model over the value of the model.
Yeah, yeah. It's not about data size anymore, it's not about storing, it's about, okay, how expensive
is it? And then human labor is super expensive, right? Again, part of why ActionIQ is so successful
is because every time you try to have humans interface between business and
the data, they become a bottleneck very fast. There's just not enough competent people that
you can insert in between the business and the data to make it happen. So if you can make the
business even a little bit more self-service, a little bit more agile, that's a huge win.
And it's not that you're saving money.
I mean, you're still going to hire as many data engineers as you can.
It's that instead of something taking a month because everybody's waiting on everybody else,
now you can do it in a day. That allows the business to move at a much higher speed, right?
So I think these are the modern problems, I would say, or challenges.
So the work has moved on, but there's still those questions, right?
It's like, you know, how much is it worth for a model or an insight?
But it's just a different ratio that people are using to think about it.
Yeah, yeah.
No, that's an excellent way to put it.
So a product question now.
It's one thing to go and build a product
for one persona, right?
So, when you were at Aster Data,
you were, I mean,
younger, but fortunate enough to
primarily build a product for
a very specific type of persona, which is
the technical persona, let's say.
Like a system engineer
or whatever.
It's a completely different set of problems when you're trying to build something
that has to be good for multiple personas.
It's not just multiple personas.
Here we have some people that, in some cases,
they pretty much hate each other
because of how different they are, right?
So we have a data platform.
So naturally, you need the involvement of some data engineering or some IT people, at least.
And then you have the marketeers, right?
The people who are actually interacting with the data and creating value out of these.
And these two personas are very different. Many organizations probably
don't even talk to each other because they are so... Not because they hate each other,
but just because of the different functions there. How do you build a product when you
have to keep happy both of them, right? And build user experiences for both of them to succeed at the end.
Yeah, it's a great question. Great question. So first, for context, ActionIQ, we're very
enterprise-focused, right? And usually, when we go into an enterprise, what we find is that
there is a structure to do ActionIQ-like things. How does this work, right? There's a team that sometimes is called an analytics team
or a marketing operations team or something,
but these are people that understand data,
they know how to write SQL.
And then there's a business team
that's using this marketing ops team
as a concierge team, essentially, right?
So you have the business folks
and then they'll send an email,
they submit a ticket,
they buy gifts, you know, for these people. And they're like, can you please pull me a list with
these people? Or can you please help me understand how many customers we have that meet these criteria?
So we assume there's something like that there and some collaboration.
Why is this important? Because these marketing ops folks, they already know the data
and they know what the business wants
and what language they're using, right?
So these are our allies
in essentially implementing and deploying
ActionIQ.
And the way I think about it
is that I want to take these marketing ops folks,
which usually they're very competent,
they have a lot of skills, right?
It's analysts.
And turn them from one-off responders to requests
to being the administrators, the configurators,
and the power users of Action IQ.
And take 90% of those requests,
and once they configure Action IQ,
they push them to the higher layer interfaces we have
and give them over to marketers, right?
So in ActionIQ, for example,
there's a translation layer
that you can take database concepts, right?
Like a table or a column
and rename it, reformat it,
or do something to make it presentable to the business, right?
It's almost like a dictionary
that translates database terminology to business terminology.
And because these marketing ops, data analytics, customer analytics teams, whatever they're
called, right?
Center of excellence, right?
Sometimes have been going back and forth on services and requests for years.
They know how to do the translation already.
It's a matter of giving them the tools
where they can point ActionIQ to the data sources, right?
Presumably one or more data warehouses, right?
Or data marts.
Build that dictionary of terms, right?
And then expose that to the business user.
And then they only get involved
whether it's like new terms, new data, new requirements,
or it is something that's so complicated
that the business needs someone to double check or whatever.
But 90% of the stuff gets automated.
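A minimal sketch of that kind of translation dictionary, with made-up table and column names (not ActionIQ's real API):

```python
# Hypothetical sketch of the "translation layer" described above: a dictionary
# that maps raw warehouse terminology to business-friendly labels and formats,
# so a marketer never sees the underlying column names.

dictionary = {
    ("orders", "gmv_usd_cents"): {
        "label": "Total spend",
        "format": lambda v: f"${v / 100:,.2f}",
    },
    ("users", "last_seen_ts"): {
        "label": "Last active date",
        "format": lambda v: v[:10],  # keep just the date part of an ISO timestamp
    },
}

def present(table, column, value):
    """Render a raw (table, column, value) triple for the business-facing UI."""
    entry = dictionary.get((table, column))
    if entry is None:
        return f"{table}.{column}", value  # fall back to the raw database names
    return entry["label"], entry["format"](value)

print(present("orders", "gmv_usd_cents", 1234500))              # ('Total spend', '$12,345.00')
print(present("users", "last_seen_ts", "2023-12-20T10:15:00Z")) # ('Last active date', '2023-12-20')
```

The marketing ops team owns the dictionary; once it is built, most business requests never need to touch the raw schema at all.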
And then, you know, we have, we're an enterprise company, right?
So we support a lot of governance things.
So you can have, you know, the analyst approving things
before they go out, they can double check,
you can have like checks and balances, right?
To make sure that if they want to oversee what the business is doing, they can also do that as well.
But we try to teach people how to fish, right?
I mean, that's the idea.
Instead of like, you know, trying to feed them one fish at a time, teach them how to fish.
And it's also a huge improvement of life for the analysts.
Because these high-urgency requests that come from the business, you wake up in the morning
and you have, like, five emails because somebody needs something today. It's really not the fun
part of the job, you know, for most people, data engineers or data analysts. Yeah, 100%. Okay, that was
awesome. So I would like to spend, like, a couple of the last minutes that we have here
on talking a little bit more about CDPs as a category of products
and also talk about something that we hear a lot lately,
which is composability over them, right?
So what does it mean for a CDP,
like a customer data platform, to be composable?
What are the semantics behind that?
I'll ask two questions, actually.
One is from a technical point of view
and one from the customer,
like the user point of view, right?
Yeah, exactly.
So composability in general, right,
to start there and beyond CDP,
what it means is that essentially
it's a different world for specialization
and optionality. So instead of having one thing that does everything, you have one thing that
may be doing a lot of things, but that technology gives you the opportunity to delegate certain
parts of its functionality to other systems. Specifically in CDP, the biggest thing composability
means is that instead of copying
data over from the data warehouse to the CDP and doing the processing in the CDP, right, in the
CDP vendor's cloud, the CDP sends the queries down to the data warehouse, right? It sees the data
model, it doesn't copy the data; it pushes the query down and brings the results up. Now, like the
term CDP itself, right, composability is being abused today by
certain vendors, right? So certain vendors use the word composability to mean, you know, they will talk,
for example, about having a connector with a cloud data warehouse. But that connector doesn't push
anything down, right? It just gets the data out. But they call all this composable. So you have to be a little bit careful with, you know, the poetic
license that marketing always has.
As I'm sure Eric and all of us know.
But you know, it's composability for CDPs means the cloud data warehouse or the
data warehouse is your processing engine, essentially.
Now, the way we mean it specifically at ActionIQ, we have what we call hybrid compute, which means you can have most of your data in Databricks, right, or Snowflake or Teradata, but you can
still have some data in ActionIQ, if that makes sense.
It's completely up to the user.
And you can have multiple systems that we access to get the data.
So we support essentially a query federation layer as a base layer of ActionIQ.
And on top of that, we have maintained our own ability to store and process data, right?
Now, the beauty of this is it's completely up to, you know, the analysts and the data
engineers, right, to decide how they want to manage this configuration. You
can have a single cloud data warehouse and all the data is there, and that's all we use.
Or you can have two or three.
Maybe it's a cloud data warehouse and a couple of analytics data marts that have the data that
we access.
Or you can have the same thing, but also have some data on ActionIQ.
As far as the user is concerned, they do not know and they shouldn't
know. It's completely transparent, they play in the UI, they click buttons and the queries are
routed appropriately, whether it's to our customer's IT systems or data systems, or whether it's
to ActionIQ-owned systems. The data is composed appropriately and the results are presented in the UI. So we provide a lot of flexibility there.
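A rough sketch of that kind of hybrid query routing; the engine names, catalog, and run functions below are stand-ins for the concept, not ActionIQ's real interfaces:

```python
# Hypothetical sketch of "hybrid compute" routing: each dataset is registered
# with the engine that owns it, and a query is pushed down to that engine
# rather than the data being copied out. The UI user never sees which path
# a query took.

engines = {
    "snowflake":   lambda sql: f"[pushed down to Snowflake] {sql}",
    "databricks":  lambda sql: f"[pushed down to Databricks] {sql}",
    "local_cache": lambda sql: f"[served from ActionIQ-owned storage] {sql}",
}

# dataset -> owning engine; this mapping is entirely up to the data team
catalog = {
    "customers": "snowflake",
    "events": "databricks",
    "realtime_scores": "local_cache",
}

def route(dataset, sql):
    """Send the query to whichever engine holds the dataset."""
    engine = catalog[dataset]
    return engines[engine](sql)

print(route("customers", "SELECT count(*) FROM customers WHERE region = 'EU'"))
```

Federating over the catalog like this is what makes the "one warehouse, three warehouses, or warehouse plus local cache" configurations interchangeable from the business user's point of view.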
The benefit of composability, right, is that you don't have to move data.
It's governance and security, right, largely.
Like the moment you copy data, you have to build pipelines.
Data copies can run out of sync, even if they shouldn't run out of sync.
It's an "if you know, you know" kind of thing, right?
If you have the data stored in many places, it can get out of sync, you can have definition
problems.
And then more and more, there's more concern about security and privacy among our customers,
right?
There's legislation, you know, GDPR, all these things, more awareness around information
security and risks.
So our customers love to not have to move data,
not only for governance, but also for security purposes.
And so, you know, there's a lot of benefits
that come with composability.
And this is something we have developed,
I would say, the last few years.
We would have started there,
but when we started ActionIQ,
the problem was that the database systems at the time
did not scale well enough
to support an interactive UI.
The reason why this works today is because
storage and compute is separated
again. And compute is a lot
more elastic with the
modern technologies.
Again, the Databricks,
Snowflakes, even Teradata,
Redshift, everybody's evolving to separate compute and storage and make compute elastic.
So you can support a much more diverse mixed workload on top of the systems today versus what you could do before.
10 years ago, if you had ActionIQ going directly against, like, an Oracle system, right, or whatnot, even the smallest query would probably take half an hour, right?
I mean, things tend to take a lot of time,
especially when you have other queries that are high-priority running,
and that just could not work.
But being database people, right,
the moment those systems became able to support this type of workloads,
we immediately said, this is it, that's the future.
We've seen it, we know it,
and we evolved
the product to support essentially
this hybrid architecture
that can do either or.
That's so interesting.
Actually, we probably need to have a full
episode just to talk about that.
And myself,
coming from Starburst and
working with a federated query engine,
what I find
extremely fascinating here is
how much the workload
matters, actually,
in making federation work or not.
I feel like
in this case, federation couldn't work.
That's right.
One last thing,
before I give the microphone back to
Eric.
Just, I don't know, I find it fascinating how things make cycles in a way.
The first time that I heard the term customer data platform was related to Treasure Data.
And it's funny because Treasure Data, and I'm pretty sure I'm not wrong here,
but they built the first version
of the platform on Presto, which is a federated query engine.
So it's interesting to see how the concepts go back and forth and how things need to mature
to make things actually work, right?
Because probably they were kind of too early in what they were doing back then in terms of like making, let's say, the technology like work back there. But it's...
Yeah, it's a little bit different. It's a little bit different. I mean,
different CDPs do slightly different things also, right? Which I think makes it very interesting.
But I mean, Treasure Data doesn't talk about composability at all, right?
Oh, yeah.
And I don't think they support it or plan to support it. But part of the reason is
a lot of what they're doing
is bringing data together.
Like there's some CDPs
whose job is to build data marts,
like a customer 360
within a data mart,
rather than access
a customer 360
if that's somewhere else.
Yeah.
And if you're building
the customer 360,
it doesn't make sense
for you to be composable, right?
I mean, you are,
you're competing
with the cloud data warehouse, essentially,
right, if you're that type of CDP.
But if you're the type of CDP like us
that's accessing data, we started
as I mentioned before, right?
The first founding principle is
use whatever data model is in place
in the data warehouse, then
composability makes a ton of sense and it fits
really well into our model.
Yeah, 100%. Anyway,
we need to definitely find more time to talk
about that stuff. Yeah, it's fascinating.
Super fascinating.
Eric,
microphone is back to you. All yours?
Yes, well, we're at the buzzer
as we like to say, but Tasso,
okay, here's my question. This is more of a
personal question.
So,
you have had a very unique journey in that you founded a data infrastructure company, a database, and sold it, which is extremely difficult to do in its own right.
And now you've built a successful company that serves business users.
Okay.
So if you had to start something new, but it could not be in SaaS at all, what would
you do?
Oh, man, you know, that's a great question.
The reason why I started ActionIQ is because I love learning new things.
And that showed, you know, it was such a big new challenge.
So I haven't thought about it.
You know, I'm so obsessed with what I'm doing right now that I haven't thought what that would be.
But probably would be something that it could benefit from data, but it would have nothing to do with either SaaS or data infrastructure,
right?
I would probably take my skills.
I mean, I'm a huge believer in interdisciplinary opportunity.
I think the bigger opportunities are in these Venn diagrams, right?
Where lots of people do A and lots of people do B, but very few people understand A and
B together.
So I would ask myself, right, now that I've done A and B,
what would be that C thing, right, that would benefit from everything?
That's how we think about it.
But maybe I'll think about that question for the next time we talk, Eric,
and I'll have the answer.
Absolutely.
Yeah, we'll do another episode on your future.
Tasso, thank you so much.
I have so many more questions.
Yes, indeed.
Thank you so much for giving us the time today.
What a great episode.
Yeah, I really enjoyed it, guys.
Thank you so much for having me here.
Really fun.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app
to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric@datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack,
the CDP for developers.
Learn how to build a CDP on your data warehouse
at rudderstack.com.