The Data Stack Show - 40: Graph Processing on Snowflake for Customer Behavioral Analytics
Episode Date: June 16, 2021

Highlights from this week's episode include:

- Launching Affinio and the engineering backgrounds of the co-founders (2:36)
- The massive transformation in customer data privacy regulation in the past eight years (6:23)
- Creating the underpinning technology that can apply to any customer behavioral data set (10:05)
- Ranking and scoring surfing patterns and sorting nodes and edges (14:13)
- Placing the importance of attributes into a simple UI experience (19:28)
- Going from a columnar database to a graph processing system (25:20)
- Working with custom or atypical data (32:46)
- The decision to work with Snowflake (37:43)
- Next steps for utilizing third-party tools within Snowflake (52:18)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are
run at top companies.
The Data Stack Show is brought to you by Rudderstack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the Data Stack Show.
A really interesting guest today.
We have Tim and Steven from a company called Affinio.
And here's a little teaser for the conversation.
They run in Snowflake.
They have a direct connection with Snowflake, but they do really interesting marketing and consumer data analytics,
both for social and for first party data using graph, which is just a really interesting concept
in general. And I think one of the, one of my big questions, Costas, is around the third party
ecosystem that is, that's being built around Snowflake. And I think that's something that
is going to be really, really big in the next couple of years. There are already some major
players there, and we see some enterprises doing some interesting things there. But in terms of
mass adoption, I think a lot of people are still trying to just sort of get their warehouse
implementation into a good place and unify their data.
So I want to ask about that from someone who is playing in that third-party Snowflake ecosystem.
How about you? What are you interested in?
Yeah, Eric, I think this conversation is going to have a lot of Snowflake in it. One thing is what you talked about, which has to do more with the ecosystem
around the data platforms like Snowflake.
But the other and more technical side of things is how you can implement
these sophisticated algorithms around graph analytics
on top of a columnar database like Snowflake.
So yeah, I think both from a technical and a business perspective,
we are going to have a lot of questions around how Affinio is built
on top of Snowflake.
And I think this is going to be super interesting. Cool. Well, let's dive in. Tim and Steven,
welcome to the Data Stack Show. We're really excited to chat data warehouses. And personally,
I'm excited to chat about some marketing stuff because I know you play in that space. So
thanks for joining us. Yeah, excited to be here. Thanks for having us. We'd love to get a little bit of background on each of you and just a high-level overview
of what Affinio, your company, does for our audience.
Do you mind just kicking us off with a little intro?
Yeah, absolutely.
I'd be happy to.
So pleasure being on, guys.
And realistically, just to give you a quick sense of sort of what Affinio is all about,
a little bit of background.
So we created Affinio about eight years ago, and it started off with a really simple concept. At the time,
Steve and I happened to be running a mobile app B2C company. And instead of looking at social media to see what people were saying about our brand,
we started off with a really simple experiment of looking at who else our followers on social were following. And that afternoon, we
aggregated that data and saw the compelling opportunity in this interest and
affinity graph that nobody seemed to be utilizing for advertising and marketing
applications. And we thought it was just a huge opportunity. So we doubled down and created
what continues to be our core intellectual property, which is a custom-built graph analytics engine under the hood.
And over those eight years, we basically leveraged, you know, social data as a starting point, and got enterprise customers really excited about what they could unlock from both insights and
actionability against the data that we were providing them with, as well as basically
using our technology. So over the last two years, we made a conscious effort to double down
and start porting a lot of that core graph technology directly into Snowflake. And most
recently, we're just about to announce the release of four of our apps
inside the Snowflake marketplace that enable organizations to use our graph
technology directly on their data, without us ever seeing the analytics and without us ever seeing
the output. So it's in a completely private format, all leveraging the secure function
capability in Snowflake and the data sharing capability. So super excited to be here. We're obviously huge fans of both Snowflake as
well as sort of warehouse first approaches. And we think the opportunity between Affinio and
Rudderstack is a great complement. Very cool. And Tim, do you want to just give a quick 30-second
or one minute background on you personally? Yeah, certainly. So Tim Burke, CEO of Affinio. My background is actually mechanical engineering.
Stephen, also on the show, is my CTO and co-founder. We've been working together for 12 years now,
both engineers by trade. He's electrical, I'm mechanical. I do a lot of the biz dev and sales
work within Affinio, obviously from my position, a lot of customer-facing activities
and all that. Stephen can introduce himself. Yeah, I'm Stephen Hankinson, CTO at Affinio.
Like Tim said, I'm an electrical engineer, but I've been writing code since I was about 12 years
old and just really enjoy working with large data, big data and solving hard problems.
Very cool. Well, so many things to talk about, especially Snowflake and
sort of combining data sets. And that's just a fascinating topic in general. But one thing that
I think would be really interesting for some context. So Affinio started out providing
graph in the context of social. And one thing I'd love to know, so you started
eight years ago, and the social landscape, especially around sort of privacy and data
availability, etc., has changed drastically. And so, just out of pure curiosity,
I'm interested to know, you know, what were the kinds of questions that your customers wanted to answer eight years
ago when you sort of introduced this idea? And then how has the landscape impacted the social
side of things? I know you're far beyond that, but you have a unique perspective in dealing with
social data over a period where there's just been a lot of change, even from a regulatory standpoint.
Absolutely. I would say you nailed
it on the head. It's been a transformational period for data privacy, customer data privacy.
And, you know, first and foremost, one of the biggest impacted areas
has been social data as a whole. So we've definitely seen a massive transition,
right? I mean, I would say that a lot of that transition over the last
few years is partially, you know, a change in our focus for that exact reason, right? Recognizing
that deprecations in public APIs and, you know, the privacy aspects of
data availability across social have changed drastically, right? And so for us,
we've been, you know, at the front of the line watching all this
happen in real time. But for us, the customers at the end of the day are still trying to solve
the same problem. It's how do I understand, learn more about my customers such that I can,
you know, service them better, provide better customer experience,
find more of my high value customers. Net-net, I don't think the challenge has changed. I think
the data assets those customers are
actually leveraging to find those answers are going to change and have been changing, right? And so
what we're trying to do is move on from our legacy social product. Much of the time, that was addressing deeper understanding of interest profiles, rich interest profiling of large social audiences, which is kind of where we got started. Those are valuable assets, valuable insights for a marketer, because when you understand the scope
and depth of, you know, your audience's interest patterns, you can basically leverage that for
where to reach them, how to reach them, how to personalize content, knowing what offers they're
going to want to, you know, click through to. And I don't think that's actually changed, right? I
think that what people are recognizing, more so than anything, and obviously, you guys would
see this firsthand as well, is,
you know, many of those data assets that I think many organizations were willing to either have
vendors collect on their behalf or own on their behalf, it has changed drastically. And now it's
requiring basically these enterprises and organizations to own those data assets and be
able to do more with them. And so what I would say is what we're seeing sort of firsthand is the markets come around
to recognizing the need to collect a lot of first-party data.
Many organizations have obviously put a lot of effort and a lot of energy and a lot of
resource behind sort of creating that opportunity within their enterprise.
But I would say, quite honestly, what we see is that there's a lack of sort of ability
to make meaningful insight and actionability from those large data sets that they're creating. So
that's kind of what our focus is on is sort of trying to enable the enterprise to be able to
unlock at scale applications no differently than what we've done previously on massive social data
assets, but in this time on their first party data and natively inside Snowflake, you know, privacy first format.
Super interesting.
And just one more follow-up question to that.
I'm at risk of nerding out here and stealing the microphone from Costas for a long period,
which I've been known to do in the past.
In terms of graph, was the transition from sort of third party social data to accomplishing similar things on your first party data on Snowflake, was that a big technological transition?
Or, I mean, I'd just love to know from an under the hood standpoint, how did that work?
Because the data sets are, you know, there's similarities and differences.
No, it's a great point. I mean, for those not sort of familiar with graph technology,
obviously, the foundation of sort of, you know, traditional graph databases are founded on sort
of transforming, you know, relational database into nodes and edges, right, and looking for
essentially connectivity or analyzing the connectivity in a data asset.
So our underpinning data technology, which Stephen created firsthand, is this custom-built graph
technology. It analyzes data based on that premise. Everything is a node, everything is an
edge. At that primitive level, it enables us to ingest and analyze any format of, you know,
customer data without having to do drastic changes to the underpinning technology. And
so what I would highlight is that the most compelling data assets that we can
analyze and the most compelling insights you can gather typically are driven by customer behavioral patterns, right? So unlike traditional, I would say, demographic data,
which has its utility and obviously always has
in a marketing and advertising application,
but I would argue that demographics has traditionally been used
as a proxy to a behavioral pattern, right?
And what we see the opportunity to unlock
is patterns inside of raw customer behavior itself, rather than demographics, which are ultimately just a surrogate for that
underpinning behavior you're looking to change. What we're seeing and what we see as an opportunity
is across these massive data sets that are basically being pulled into Snowflake and
aggregated in Snowflake. When you start to analyze those behaviors at the raw level and unlock
patterns across a massive number of consumers at that level, you can then start actioning on that and leveraging
those insights for advertising, personalization, you know, targeted campaigns, next best offer
in a format that basically is driven by you unlocking that behavioral pattern. So for us,
you can think of it, you know, when I speak of customer behavioral pattern, everything that,
you know, relates to transactional data, content consumption patterns, search behavior, you know, click data, click stream data.
I mean, all those become signals of intent, of interest, and ultimately of a rudimentary behavior, which for us, we can ingest, transform into a graph inside Snowflake, and analyze those connections and similarity patterns across those behaviors
natively in the data warehouse.
And then, in doing so, therefore create audiences around, you know, common interest
patterns and lookalikes and build propensity models off those behaviors.
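The behavioral-graph transformation Tim describes can be sketched roughly as follows. This is a minimal illustration, not Affinio's actual engine: the event rows and user IDs are invented, and the real system works on probabilistic structures inside Snowflake rather than plain Python sets.

```python
from collections import defaultdict

# Hypothetical event rows (invented for illustration): each row is a
# (user_id, attribute) pair from transactions, content views, clicks, etc.
events = [
    ("u1", "productA"), ("u1", "productB"),
    ("u2", "productA"), ("u2", "productC"),
    ("u3", "productB"), ("u3", "productA"),
]

# Every event becomes an edge in a bipartite graph: users on one side,
# behavioral attributes on the other.
edges = defaultdict(set)
for user, attr in events:
    edges[user].add(attr)

# Users who connect to the same attributes are behaviorally similar --
# the basis for interest audiences, lookalikes, and propensity models.
def jaccard(a, b):
    return len(edges[a] & edges[b]) / len(edges[a] | edges[b])
```

Here `jaccard("u1", "u3")` is 1.0 because both users touched exactly the same products, while `jaccard("u1", "u2")` is only 1/3: the graph view turns raw events into a similarity measure without any schema-specific logic.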
And so the transformation uniquely, I mean, I wouldn't understate it.
And Stephen obviously, you know, put a lot of time into that transformation.
I think it was more so that we had initially architected
the underpinning technology
for the purpose of a certain data set.
What we unlocked and identified was
there was a host of first-party data applications
we could apply this tech to.
And that was sort of the initial aha moment for us
in terms of moving it into a Snowflake instance,
and then into a Snowflake-native capability, so that we can basically
apply it to any customer behavioral data across that data set.
That's super interesting.
I have a question that, I mean, probably Stephen might have a lot to say
about that, but you're talking a lot about graph analysis
that you're doing.
Can you explain to us and to our audience a little bit
how graphs can be utilized to do analysis
around the behavior of a person
or in general, the data that you are usually working with?
Because from what I understand,
the story behind Affinio is that when you started,
you were doing analytics around social graphs, right?
Where the graph is like a very natural kind
of data structure to use there. But how
this can be extended to other
use cases? Yeah, I would say
one example of that would be
in surfing patterns, like Tim had mentioned,
where essentially we can get a data set
of basically sites that people
have visited and even keywords on those
sites and other attributes related to those sites,
times that they visit them.
And essentially we can put that all together
into sort of a graph of people traversing the web.
And then we're able to use some of our scoring algorithms
on top of that.
So essentially rank and score those surfing patterns
so that we can essentially put together people
or users that look similar into a separate segment
or audience that then we can essentially pop up
and show analytics on top of,
so people can get an idea of what that group of people
enjoy visiting online or where they go
or what types of keywords they're more looking at online
based on the data set that we're working with.
I guess that would be one example of a graph related that's not social,
for example.
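Stephen's surfing-patterns example can be sketched along these lines: score pairwise similarity over visited sites, then group similar users into a segment. The data, the threshold, and the naive single-link grouping are all invented stand-ins for Affinio's actual scoring and clustering algorithms, which the episode doesn't detail.

```python
from itertools import combinations

# Hypothetical browsing data: the sites each user has visited.
visits = {
    "u1": {"news.example", "surf.example", "gear.example"},
    "u2": {"news.example", "surf.example"},
    "u3": {"recipes.example", "garden.example"},
    "u4": {"surf.example", "gear.example"},
}

def similarity(a, b):
    # Jaccard overlap of visited sites -- one simple "surfing pattern" score.
    return len(visits[a] & visits[b]) / len(visits[a] | visits[b])

# Naive single-link grouping via union-find: users above a similarity
# threshold land in the same segment.
threshold = 0.3
parent = {u: u for u in visits}

def find(u):
    while parent[u] != u:
        u = parent[u]
    return u

for a, b in combinations(visits, 2):
    if similarity(a, b) >= threshold:
        parent[find(a)] = find(b)

segments = {}
for u in visits:
    segments.setdefault(find(u), []).append(u)
```

With this toy data, u1, u2, and u4 end up in one segment (they share surf/gear/news sites) while u3 sits alone, which is the kind of audience you would then "pop up and show analytics on top of."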
Can I just pick up on that, Costas, as well?
I mean, I think the thing that we see is that, you know, as Stephen alluded to, at the sort of lowest level of sort of the signals
that are being collected, you know, what we're creating in, you
know, just to liken it to a social graph, obviously, you have a follower pattern, which defines and
creates essentially the, you know, the social graph. What we're doing is taking those
common behaviors as basically the nodes and edges. So as Stephen alluded to, whether it
be, you know, sites that people visit, whether it be content, similar content that they're consuming,
whether it's the transactional history that looks similar to one another.
The application effectively is just how we transform, to your point, those individual events into essentially a large customer graph on first party data within the warehouse.
And then, like I said, then from there, the analytics and applications are very, very similar,
regardless of sort of whether you're analyzing
a social graph, a transactional graph,
a web surfing graph.
It ultimately comes down to sort of what your definitions
are for those nodes and edges at the core.
Yeah, and what's the added value of pursuing,
like, or trying to describe and represent this problem
as a graph instead of like, I don't know, more traditional analytical techniques
that people are using so far?
For us, it comes down to, I mean, specifically segmentation.
At the core of what advertisers and marketers do on a daily basis,
the sort of slicing and dicing of data oftentimes is restricted
to a single event, right?
So find me the customers that bought product X, find me the customers that viewed TV show
Y. That oftentimes is restrictive in terms of the analytics capabilities within the scope
of that small segment.
What we're doing is we're able to take that segment, look across all their behaviors beyond them, you know, beyond that sort of initial defined audience segment.
And by compiling all those attributes simultaneously inside of Snowflake, we're actually able to uncover the other affinities beyond that.
So besides watching TV show X, right, what are the other shows that audience is over-indexing on, or has high affinity for? And besides buying product
Y, what other products are they buying?
And those signals, from a marketer's perspective, start to unlock everything from recommendation engines, next best offer,
net-new personalized customer experience recommendations, in terms of recognizing that this group as a whole
has these patterns. And that's at the core, you know, when you think of it, you can certainly
achieve that in a traditional relational database if you have two, three, ten attributes per, you
know, per ID. But when you start getting to the scales that we're analyzing with our technology
inside of Snowflake, you're talking about potentially hundreds of millions of IDs against tens of thousands to hundreds of thousands of attributes.
So when you actually try to surface and say, what makes this segment tick and what makes them
unique, trying to resolve that and identify the top 10 attributes of high affinity to that audience
segment is extremely complex in a relational database or relational format.
But using our technology and using graph technology, the benefit is that that can be
calculated in a matter of seconds inside the warehouse so that people like, you know,
marketing and advertisers can unlock those over-indexing high affinity signals beyond
the audience definition that they first, you know, first applied. And that helps with everything,
like I said, understanding the customer all the way through to, you know, things like next best offer,
as well as sort of media, you know, media platforms of high interest.
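The "over-indexing" calculation Tim keeps returning to can be illustrated with a classic affinity index: how much more common an attribute is inside the segment than in the overall population. The data and the exact scoring formula below are assumptions for illustration; Affinio's production scoring is not spelled out in the episode.

```python
from collections import Counter

# Hypothetical attribute observations: which attributes (shows, products,
# sites) each user in the population exhibits.
population = {
    "u1": {"showA", "showB", "productX"},
    "u2": {"showA", "productX"},
    "u3": {"showB"},
    "u4": {"showC", "productY"},
    "u5": {"showA", "showC"},
}
segment = {"u1", "u2"}  # e.g. "customers who bought product X"

def rates(users):
    # Fraction of the given users exhibiting each attribute.
    counts = Counter(a for u in users for a in population[u])
    return {a: c / len(users) for a, c in counts.items()}

seg_rates = rates(segment)
pop_rates = rates(population.keys())

# Affinity index: segment rate divided by population rate.
# A score above 1 means the attribute over-indexes for this audience.
affinity = {a: seg_rates[a] / pop_rates[a] for a in seg_rates}
top = sorted(affinity.items(), key=lambda kv: -kv[1])
```

Here `productX` scores 2.5 (100% of the segment vs. 40% of the population) and tops the list, ahead of `showA` at about 1.67: exactly the "what makes this segment tick" ranking a marketer would read off, without ever touching raw IDs.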
Right. That's, that's super, super exciting for me. I have, I have a question that's more of like
a product related question, not much technical, but how do you expose this kind of structure to your end user, which
from what I understand is a marketeer, right?
And I would assume that like most of the marketeers don't really think in terms of graphs, or
it's probably like something a little bit more abstract in their heads.
Can you explain to me how you managed to expose all this expressivity that a graph can offer
to this particular problem to a
non-technical person like a marketeer?
Yeah, no, for us, I mean, it's a great question.
For us, a lot of what we created eight years ago, and even the momentum on our social application
eight years ago, was sort of the simplicity, identifying those over-indexing signals,
the ability to sort of do unsupervised clustering on those underpinning behaviors to unlock what I
would deem sort of these data-driven personas. And so we've been, we put a lot of energy into,
you know, trying to restrict how much data you surface to your end user and trying to simplify
it based on their objective.
And so, you know, a key element to that and recognizing that within the framework of these applications that we've built inside Snowflake, our end user actually does not get exposed,
you know, to the underpinning, you know, graph based transformation and all the magic that's
happening inside of Snowflake.
What they do get exposed to and what our algorithm is able to do
is essentially surface in rank order
the importance of those attributes
and place those into a simple UI experience.
And the benefit at the end of the day
is that because all these analytics
are running natively inside Snowflake,
any application that has a direct connector
to Snowflake can essentially query
and pull back these aggregate insights. So think of that from a direct connector to Snowflake can essentially query and pull back
these aggregate insights. So think of that from, you know, from a standard BI application that has
a standard, you know, connector into Snowflake with very little effort, they can essentially
leverage the intelligence that we've built inside of Snowflake and pull forward essentially, you
know, based on an audience segment definition, you know, the over-indexing affinities in rank order
for that particular population.
So I think the challenge for us, I think you nailed it.
For many in the marketing field,
graph technology is not one of their primary backgrounds
and certainly not, if you ask them,
how would you use a you know, a standard,
you know, graph database, that's not something that, you know, most people are thinking about. What they are, though, thinking about and thinking hard about is, again, it's these simple
definitions of like, what are the other things or what are the things that make an audience segment
unique, make them tick, make them behave the way they behave. And unless you sort of approach that
problem statement with a graph-based technology under the hood, it's extremely complicated,
extremely challenging. And for many organizations we work with, you know, they talk about the fact
that what we're unlocking inside the warehouse in a matter of seconds would traditionally have
taken, you know, a data science team or an analyst team oftentimes, you know,
days, if not weeks to try to unlock. And so it's, for us,
it becomes sort of scalability. It's the,
it's the repeatability of these types of questions that, you know,
guys like Eric, I'm sure live and breathe every day is like,
what makes a unit of an audience tick, right?
And whether that is like of the people who churn,
what are the over-indexing signals
so that we can plug those holes in the product,
whether that's of the high value customers,
what makes their behavior on our platform unique?
Those are the things that we're trying to unlock
and uncover for a non-technical end user, right?
Because that is their daily activity
is they have to crack that nut on a daily basis
in order to achieve their KPIs.
And so that's what we're most excited about is we, you know, I think Stephen and I sort of eight
years ago, graph technology certainly as it pertained to applications and marketing was
really still very, very new. I would still say it's still very, very nascent. But I mean,
I think it's sort of coming of age, because as we grow the data assets inside of things like, you know, Snowflake's data warehouse, unless you can analyze across the entire breadth of that data asset and unlock, in an automated way, these key signals that make up an audience, the challenge will always be the same.
And the challenge is going to get worse, right, because we're not making data sets smaller, we're making them larger.
And so the complexity and challenge associated with that
just increases with time.
And for us, like, that's what we're trying to,
we're trying to trivialize and say,
listen, there's repeatable requests
to a marketing analyst and to a marketing team
and to an advertiser and a media buyer.
And dominantly, they're affinity-based questions,
whether people recognize it or ask it as such.
But a lot of the times, that's exactly what it is.
Of the person who just signed up on our landing page, right?
Like, what should we offer them, right?
What other signals can we, you know, what kind of signals influence what we recommend to them, how we manage them, how we manage the customer experience, how we personalize content. So those types of questions we see on a daily basis are trying to be addressed by marketing teams,
many of whom who don't have direct access,
obviously, to the raw data.
And that's why a lot of our technology
natively inside of Snowflake
is sort of unlocking the ability for them
to do that in aggregate
without ever being exposed to private or low-level data.
That's amazing.
I think that's one of the reasons
that I really love working
with these kinds of problems, and engineering in general. This connection of something as abstract
as a graph to a real-life problem, like something that a marketeer is doing every day.
I think that's a big part of the beauty behind doing computer engineering, and I really enjoy that. But I have a more technical question now.
I mean, we talked about how we can use these tools to deliver value to the marketing community.
So how did you go from a columnar database system like Snowflake to a graph processing system?
How did you do that?
How do you bridge these two different data structures in the end?
From one side, you have more of a tabular way of representing the data
or a columnar way of representing the data.
And on the other hand, you have something like graph.
So how do these two things work together?
Yeah, so basically what we end up doing is we have some secure functions
in our own Snowflake account that we share over to the customer.
And then what that does is it gives them a shared database, which includes a bunch of secure functions that we've developed.
And then we essentially work with the customer to give them either predetermined functions or queries that they will run on top of their data based on the, I guess,
structure of their tables. The queries that we give them essentially pass their raw
data through what we call our encoder, and that outputs the new data into a new table.
And that really just looks like a bunch of garbage if you look at it in Snowflake. It's mostly binary
data, but it's a
probabilistic data structure that we store our data into. And then with that probabilistic data
structure, they can then use our other secure functions, which is able to analyze that graph
based data and output all of the insights that Tim was mentioning before. Essentially, you just feed
in a defined audience that you want to analyze, and it will run all the processing in the secure function
on top of that probabilistic data structure
and then output all of the top attributes and scores
for the audience that they're analyzing.
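The flow Stephen describes can be mocked end to end: an "encoder" turns raw rows into an opaque table, and a separate "secure function" analyzes a defined audience against it. Everything here, function names, salting scheme, and blob format, is invented for illustration; the real system uses Snowflake secure UDFs over a probabilistic structure, not salted hashes in Python dicts.

```python
import base64
import hashlib
import json
from collections import Counter

def encode(raw_rows):
    """Encoder stand-in: (user_id, attribute) rows in, opaque blobs out.
    Attributes are stored only as salted hashes, so the encoded table
    'looks like a bunch of garbage' if inspected directly."""
    per_user = {}
    for user, attr in raw_rows:
        digest = hashlib.sha1(f"salt:{attr}".encode()).hexdigest()[:12]
        per_user.setdefault(user, []).append(digest)
    return {u: base64.b64encode(json.dumps(sorted(v)).encode())
            for u, v in per_user.items()}

def analyze(encoded, audience, top_n=3):
    """Secure-function stand-in: rank attribute hashes across an audience
    without ever touching the raw attribute values."""
    counts = Counter()
    for user in audience:
        counts.update(json.loads(base64.b64decode(encoded[user])))
    return counts.most_common(top_n)

raw = [("u1", "sports"), ("u1", "news"), ("u2", "sports"), ("u3", "news")]
table = encode(raw)                 # opaque, shareable table
top = analyze(table, {"u1", "u2"})  # feed in a defined audience
```

For the audience {u1, u2}, the top-ranked hash appears twice (both users share the "sports" attribute), which mirrors the output shape Stephen describes: top attributes and scores for the audience being analyzed.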
Oh, that's super interesting.
Can you, Stephen, share a little bit more information
about this probabilistic data structure?
Yeah, it's essentially putting the data in a privacy-safe format. You feed in all the IDs with the different attributes that you want to be able to query against, and use some hashing techniques to compute this new structure, which can then be bumped up
against other encoded data sets of the same format.
And then once you mash them together, essentially, you can use some algorithms that we have in
our secure function library.
And from there, we can get all kinds of things like intersections, overlaps, unions of all
kinds of sets.
It's basically doing a bunch of set theory on these different data structures in a privacy
secure way.
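The episode never names the probabilistic data structure. MinHash signatures are one classic structure that supports exactly this kind of privacy-preserving set algebra, so the sketch below assumes MinHash purely as an example: fixed-size hashed signatures from which you can estimate intersections and overlaps without the raw IDs.

```python
import hashlib

NUM_HASHES = 128

def h(i, value):
    # A deterministic family of hash functions via salted SHA-1.
    return int(hashlib.sha1(f"{i}:{value}".encode()).hexdigest(), 16)

def minhash(ids):
    """Encode a set of IDs into a fixed-size signature.
    The raw IDs cannot be recovered from the signature."""
    return [min(h(i, v) for v in ids) for i in range(NUM_HASHES)]

def jaccard_estimate(sig_a, sig_b):
    # The fraction of matching slots estimates |A ∩ B| / |A ∪ B|.
    return sum(a == b for a, b in zip(sig_a, sig_b)) / NUM_HASHES

audience_a = {f"user{i}" for i in range(0, 1000)}
audience_b = {f"user{i}" for i in range(500, 1500)}  # true Jaccard = 1/3

est = jaccard_estimate(minhash(audience_a), minhash(audience_b))
print(round(est, 2))  # should land near 0.33
```

Two parties can each encode their own audience, share only the signatures, and still compute overlaps and intersections: the set theory happens on the sketches, never on the underlying IDs, which is the privacy property Stephen is describing.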
Yeah, that's super interesting.
And there's, I mean, there's a big family of database systems, which is actually graph
databases, right?
So from your perspective, why is it better to implement something like what you described,
like compared to getting the data from a table, like on Snowflake,
and feeding it to a more traditional, let's say, kind of graph system?
I think the main benefit of doing it this way is they don't need to make a copy of their data
and they don't need to move their data.
It essentially all stays in one place.
Yeah, and I would just add to that, Costas, as well, right?
I mean, when we speak of, you know,
the benefits of sort of Snowflake's underpinning architecture
and the concept of sort of not moving data,
for us, you know, what we're not trying to do
is sort of replicate all functionality of a graph database.
There's obviously applications in which case, you know,
that is absolutely suitable and reasonable to do
an entire sort of copy of a data set
and run that type of analytics
inside the warehouse.
But what we're trying to do
is take the applications
relative to marketing and advertising,
productize them in a format
that does not require that
and still leaves the data
where it is inside Snowflake,
and, you know, provides this level
of anonymization.
And I would also highlight the fact that Stephen's code
that does the encoding of that new data structure
also enables about five-to-one data compression,
which basically supports more queries
for the same price when it comes down
to this affinity-based querying structure.
Yeah, that's very interesting.
This discussion that we are having about the comparison
between having a more niche kind of system around graph processing
and the general kind of graph database,
it's something that reminds me a little bit of something
that happens also here at RudderStack from an engineering point of view.
Part of our infrastructure needed some capability similar to what Kafka offers, right?
But instead of
incorporating Kafka in our system, we decided to build part of this functionality over
Postgres, in a system that's tailor-made for exactly our needs. And I think that finding this trade-off
between a generic system that offers a function
and something that is tailor-made for your needs,
it's like what makes engineering as a discipline
super, super important.
I think at the end, this is the essence
of making engineering choices
when we're building complex systems like this,
trying to figure out when we should use something more generic as a solution or when we should get a subset of this
and make it tailor-made for our problem that we are trying to solve. And that's extremely
interesting. I love that I hear this from you. We had another episode in the past with someone
from Neo4j, and we were discussing almost exactly this. Because if you think about it, a graph database in the end
is a specialized version of a database, right?
In the end, a general database system like Postgres can replicate
the exact same functionality that a graph database system can, right?
But still, it's focused on a very narrowly defined problem,
and you can take that even further, which is what you've done.
And I find like a lot of beauty behind this. So this is great to hear from you guys. I think it's also interesting just
picking up on that in terms of the decision around like, when do you optimize versus sort of,
you know, leave it generic. I mean, for us, you know, a big part of that, you can also see,
obviously, in market, right? There's, you know, machine learning and sort of, you know, machine
learning platforms that, you know, have a host of different models that can be used for, you know, a
host of different things, sort of the Swiss Army knife application within, you know, an organization.
For us anyway, when those custom requests come in from teams, absolutely, like, those types
of platforms make a lot of sense because your data science team has to go in. It's sort of probably a custom model and a custom question that's being answered.
I think for us specifically, when it comes time to actually build an optimized solution, the question is: should you actually be building a custom model every time, or should you actually push that workload into the warehouse?
And that's for us anyway, that's been a specific focus is like for those applications of those requests that you can have the marketer self-serve and get the answers they need in seconds, as opposed to putting it on the data science team backlog.
Those are the applications for us that we're sort of focused on and actually pushing in
and optimizing.
Yeah, yeah, I totally agree.
So last more technical question from my side.
You mentioned that the way that the system works right now is you get the raw data that
someone has stored in Snowflake and you have some kind of encoders or transformers that transform this data into these probabilistic data structures
that you have. Do you have any kind of limitations in terms of what data you work with? Do you have
some requirements in terms of the schema that this data should have? And what's the pre-processing
that the user has to do in order to utilize the power of your system?
Yeah, so if it's essentially rectangular form data, it's pretty easy to ingest into the
encoder.
We have a UI that will do that for you.
But if there are some weird things about the data that wouldn't be typical, we can actually
work with them.
If they give us an example of what the data looks like, we can essentially craft an encoding query for them.
They just feed everything through,
and that will still end up in the right way
to go into our encoder
and still end up in the essentially probabilistic
graph format that we use.
So we haven't currently run into any data set
that we haven't been able to encode,
but yeah, it seems to be pretty generic at this point.
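[Editor's note: as a rough illustration of the kind of pre-processing Steven describes, rectangular (entity, attribute) rows can be pivoted into per-attribute entity sets before any probabilistic encoding. This is a hedged sketch only; the function and column names are invented, and Affinio's actual encoder runs inside Snowflake, not over Python dicts.]

```python
from collections import defaultdict

def encode_rectangular(rows, entity_col, attr_col):
    """Pivot rectangular (entity, attribute) rows into per-attribute
    entity sets, the shape a set-based graph encoder would consume."""
    attr_sets = defaultdict(set)
    for row in rows:
        attr_sets[row[attr_col]].add(row[entity_col])
    return dict(attr_sets)

rows = [
    {"user": "u1", "interest": "surfing"},
    {"user": "u2", "interest": "surfing"},
    {"user": "u2", "interest": "hiking"},
]
sets = encode_rectangular(rows, "user", "interest")
print(sets["surfing"])  # the set containing u1 and u2
```

For atypically shaped data, the "encoding query" Steven mentions would play the role of reshaping the source into this rectangular form first.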
And is this process something that the marketeer is doing,
or there's some need for support from the IT
or the engineering team of a company?
We usually work with the IT at that stage,
and then once it's encoded,
the UI will work with the data that's already encoded.
And they can also set up tasks inside of Snowflake, which will update that database
over time or that data set over time to add new records or update the data as it comes
in.
But yeah, that is not handled by the marketeer.
All right.
And is Affinio right now offered only through Snowflake?
Is there like a hard requirement that someone needs to have the data on Snowflake to use the platform? It is currently, Kostas. I mean, we obviously went through sort of an exercise
evaluating which platform to sort of build on first. I mean, for us, it came down to two sort
of fundamental capabilities within Snowflake or probably three. I mean, the secure functions that
we're utilizing to obviously secure our IP in terms of those applications that we share over.
The ability to do the direct data sharing capability, it was sort of fundamental to that decision.
And then the third for us is obviously the cross-cloud application capability. The retail and advertising space is and continues to be a good fit for our applications at this stage, including for specific cloud applications. But where we are right now in terms of early market traction,
our bet is on Snowflake and the momentum that they currently have.
This is great.
And Tim, you mentioned, I think, earlier that your product is offered
through the marketplace that Snowflake has.
Can you share a little bit more about the experience that you have
with the marketplace, how important it is for your business and why?
Yeah, so I think the marketplace is still in its early stages,
you know, even with as many data partners that are already bought in.
For us, I think one of the clear challenges is that we, Affinio,
are not data providers.
So I think we're slightly nuanced within the framework of what traditionally has been built up on the data, you know, from a data marketplace or, you know, data asset perspective.
We, you know, we're positioned inside a marketplace deliberately and consciously, you know, with Snowflake because our applications sort of drive a lot of the data sharing functionality and sort of add to the capabilities
on top of that data marketplace, you know, that people can apply, you know, first, second,
third party data assets inside of Snowflake and run our type of analytics on top of it.
So for us, it's been unique in the framework of simply being positioned, obviously, almost
as a service provider inside of what otherwise is,
you know, currently positioned as a data marketplace. But recognizing that I think
over time, you'll start to see, you know, that bifurcate within Snowflake, and you will get
a separation in a unique marketplace that will be driven by sort of service providers like
ourselves alongside of, you know, straight data providers. So I think it's early stages. I think,
you know, what we're excited about is that, you know, we see a lot of our technology as being an accelerant
to many of those data providers directly. And many of the ones that we've already sort of, you know,
started working with directly see it as a value proposition and a value add to their,
you know, raw data asset that they may be sharing through Snowflake, but see it as a means
with which to get more value
from that data asset on the customer's behalf
by applying our application, our technology
in their Snowflake instance.
This is great.
Tim, you mentioned a few things about your decision
to go with Snowflake.
Can you share a little bit more information around that?
And more specifically, what is needed for you to consider
going to another data warehouse, cloud data warehouse,
something like BigQuery or something like, I don't know, Redshift.
What is offered right now by Snowflake that gives a lot,
tremendous value to you and makes you prefer, at this point,
build only on Snowflake?
Yeah, I think if we stood back and actually looked at where
Stephen and I sort of started off in terms of our applications
within first-party, like porting our graph technology
into first-party data, much of that was very centered
on applications and analytics specific to, you know,
enterprises and their first-party data only.
As it pertains to that model, if it was only restricted to that model, I think we would
have considered more broadly, you know, looking at doing that directly inside of any of or
all of the cloud infrastructures or cloud-based systems to begin with.
But, you know, I would say that ours is a combination of the ability to do, you know,
analytics directly on first-party data,
as well as, as Steven indicated, a major component of our technology that we've created inside of
Snowflake that unlocks this privacy-safe data collaboration across the ecosystem.
As a result of that, for us, the criteria in terms of selecting Snowflake was, again, the ability to leverage secure UDFs and secure functions to sort of lock and protect our IP that we're sharing into those instances.
But the second major component is sort of the second half of our IP, which is effectively this privacy safe data collaboration, which basically is powered by the, you know, the underpinning data sharing capability of Snowflake.
And so if and when we're sort of reviewing or evaluating other applications or other providers, in terms
of the context of where we port this next, I would say that that's sort of the lens that
we look through, right?
Is like, can we unlock the entire capability across this privacy-safe data collaboration and analytics capability in a similar way that we've done it on Snowflake?
Because to me, that is the primary reason why we picked that platform.
Yep.
And one last question for me, and then I'll leave it to Eric.
And it's a question for both of you guys, just from a different perspective.
You've been around quite a while.
I mean, Affinio, as you said, was started like eight years ago.
That was pretty much like, I think, the same time that Snowflake also started.
So you've seen a lot around the cloud data warehouse and its evolution.
How things have changed in these past eight years, both from a business perspective, and
this is probably something more of a question for your team, and also from a technical perspective, how the landscape has changed.
Yeah, I think it's absolutely interesting, you know, the point that you're making. I mean,
I first learned of Snowflake directly from, you know, customers of ours who were sort of
at the time asking us specifically about, you know, the request is very simple. They said,
we love what you're doing with our social data. We would love it natively in Snowflake.
And that was honestly the first time we had sort of learned of that application many, many years
ago. But what I would say is that, you know, as far as the data warehouse is advanced from a
technical perspective, I think for us anyway, it still sort of belongs or certainly has its stronghold
directly in the CDO, CIO, CTO offices within many of these enterprises. What I expect to see and
what I think we're sort of helping drive and pioneer with what we've built on the marketing
advertising is sort of the value of the asset being stored inside of the data
warehouse has to become more broadly applicable and accessible across the organization beyond
what traditionally has been locked away to high-influencer required data science teams.
Because I think the value that needs to be tapped across global enterprises cannot funnel
directly through just a single team all the time.
And I think what we will see, and certainly I think as early stages are starting to see,
is awareness by other departments inside the enterprise of even where their data is stored,
quite honestly.
I mean, there's still conversations we're having with many organizations in the marketing
realm who have no idea where their data is stored, right?
So I think familiarity and comfort level
associated with sort of that data asset, how to access it, what they can access, how they can
utilize it, it will become the future of sort of where the data warehouse is going to go. But I
think we're still a long way there. There's still a lot of education there, but we're excited about
that opportunity specifically from the business perspective. Yeah, and on the tech side of things,
I would say the biggest changes are probably around
the whole privacy stuff that has changed over the years where you have to be a lot more
privacy aware and secure.
And basically working with Snowflake makes that a lot easier for us with the secure sharing
of code and secure shares of data as well.
So using that with our code embedded directly into them,
we can be sure that customers using this, their data is secure. And even if they're sharing data
over to other customers, it's secure to do that as well. This is great, guys. So Eric, it's all yours.
We're closing in on time here, but I do have a question that I've been thinking about really since the beginning.
And it's taking a step back.
So Kostas asked some great questions about why Snowflake, and I want to ask about sort of the next phase of data warehouse
utilization. And I'll explain what I mean a little bit. So a lot of times in the show,
we'll talk about major phases that technology goes through. And in the world of technology and data,
warehouses are actually not that old. You know, you have sort
of Redshift being the major player fairly early on, and then, you know, Snowflake hitting general
availability, I think, in 2014. But even then, you know, they were still certainly not as widespread
as they are now. And the way that we describe it is, we're currently living in the phase of
everyone's trying to put a warehouse, you know, sort of in the center of their stack and collect all of their data and do the things that, you know, sort of the, you know, marketing analytics tools have talked about for a long time where it's like get a complete view of the customer.
And everyone sort of realized, okay, I need to have a data warehouse in order to actually do that. And that's
a lot of work. And so we're in the phase where people are getting all of their data in the
warehouse. It's easier than ever. And we're doing really cool things on top of it. But I would
describe Affineo in many ways as almost being part of the next phase. And Snowflake is particularly
interesting here where let's say you collect all of your data.
Now you can combine it with all other sorts of things native, which is, you know, sort of an entire new world, right?
There are all sorts of interesting data sets in the Snowflake marketplace, et cetera.
But most of the conversation and most of the content out there actually is just around how do you get the most value out of your warehouse by collecting all of your data in it and doing interesting things on top of it.
And so I just love your perspective. Do you see sort of the same major phases? Are we right in
terms of being in the phase where people are still trying to collect their data and do interesting
things with it? And then give us a peek as a player who's, you know, sort of part of the
marketplace, part of the third party connections,
but being able to sort of operationalize natively inside your warehouse, what is that going to look like? I mean, marketing is an obvious use case, but I think it's going to be, you know, in the
next five years, that's going to be a major, major movement in the world of warehouses. Sorry,
that was long winded, but that's, that's my, that's what's been going through.
No, no, no, I totally agree. I mean, it's sort of
the stuff that we think about and talk about on a daily basis. I think you're
right on. I think, you know, obviously the world has already woken up to the sort of fact that, like,
gathering, collecting, owning, and managing all customer data in one location is going to be
critical in the future, right? I would say COVID has woken the world up to that. In terms of, you know,
as many of us, you know,
have heard and seen, there's, you know,
no better driver for digital transformation
than, you know, a pandemic.
So, but at the same time, I completely agree with you.
What I think personally, and I sort of just given
sort of what we're creating within these sort of,
you know, native applications inside of Snowflake,
I think you will start to see an emergence
of privacy-safe SaaS applications
that are deployed natively inside the warehouse.
I think you will see literally a transformation
of how SaaS solutions are being deployed.
And I think what you'll see is organizations like Affinio
who have traditionally hosted data
on behalf of customers
and provided web-based logins
to access that data
that's stored by the vendor.
I think you'll see
and continue to see a movement
where the IP
and the core capabilities
and the technologies of these vendors
will begin to start to port natively
into Snowflake. I believe that
Snowflake itself will actually start to find ways to attribute the compute and value
that vendors like ourselves and the applications we're driving inside of the
warehouse generate. And I think you'll see that just naturally extend into, you know, rev share models, where
for the enterprise, you know, you sign on to Snowflake, you have all these native app options
that you can turn on automatically, that basically allows you not only to reap more benefit, but just
get up to speed and make your data more valuable faster, right. And I think I honestly, you know,
Steve and I've talked about this for some time now. We honestly see that, you know, in the next 10 years, there'll be a transition.
And certainly maybe it probably won't eliminate the old model, but you'll see a new set of vendors that will start building in a native application format right out of the gate.
And that, I think, will transform the traditional SaaS landscape.
Yeah, absolutely. And a follow-on to that. So
when you think about data in the warehouse, you can look at it from two angles, right? The
warehouse is really incredible because it can really support, you know, any, well, not necessarily
any kind of data, right? But data that conforms to any business model, right? So B2C, B2B, et cetera.
It's sort of agnostic to that, right? Which makes it sort of fully customizable and
you can set it up to suit the needs of your business. So in some sense, everyone's data
warehouse is heavily customized. When you look at it from the other angle though, from this
perspective of third-party data sets and something that Kostas and I talk a lot about, which is sort of common schemas
or common data paradigms, right? If you look across the world of business, you have things like
Salesforce, right? Salesforce can be customized, but you have sort of known hierarchies, you know,
lead contact account, et cetera. Do you think that sort of the standardization of
those things or market penetration of sort of known data hierarchies and known schemas
will help drive that? Or is everyone just sort of customizing their data and that won't really
play a role? Yeah, that's a great question. I mean, it's conversations we've had with other
vendors, you know, and many of our customers relative to what they perceive as sort of
beneficial to, you know, many CDPs in market, to your point, Eric, right? Like where the fixed
taxonomies and schemas basically enable, you know, an ecosystem and an app ecosystem and sort of
partner ecosystem to build easily on that schema on top of that. Yeah, completely.
You know, I would say that, you know,
I think it's still early to see how that actually, you know, comes about.
What I would say is that I think you will start seeing organizations sort of adopt many aspects within Snowflake and within their warehouse
of sort of, you know, best of breed schemas for the purpose of, you know,
as I would say,
as I see this sort of application space build out, it's kind of the way that it has to scale,
right? So both from a partner and sort of marketplace, you know, marketplace play,
as well as, you know, the plug and play nature of how you want to deploy at, you know, this at
scale. I mean, ultimately, the game plan would be that, again, all these apps sort of run natively, you could turn them on, they already know what the schema is behind the scenes,
and they can start running. As Stephen alluded to, there's obviously at this stage, a lot of sort of
handholding at the front end, until you sort of get those, you know, schemas established and
encoded into a format that's, you know, queryable, etc. So I think what you'll start to see
is sort of best of breed, you know, bridging across into Snowflake would be my assumption that I would say the more that you
see people sort of leveraging Snowflake as a, you know, build your own format of Snowflake,
it's kind of required, right? And I wouldn't be surprised to see that some elements of that be
adopted across into sort of, you know, best of class and best of breed within Snowflake directly for that purpose? Sure. Sure. Yeah, it is, it's kind of,
it's fascinating to think about a world where, you know, today you kind of have your set of,
your core set of tooling, right. And sort of core set of data and you build out your stack by
just making sure that things can integrate in a way that makes sense for your
particular stack, which, you know, in many cases requires a lot of research, et cetera.
And it's really interesting to think about the process of architecting a stack where you just
start with the warehouse and you make choices based on best of breed schemas. And, you know, at that point, the tooling is heavily abstracted, right?
Because you are basically choosing time to value in terms of best of breed schemas.
Super interesting.
Yeah, completely.
All right.
Well, we're close to time here.
So I'm going to ask one more question.
And this is really for our audience and anyone who might be interested in the Snowflake ecosystem. What's the best way to get started
with sort of exploring third-party functionality in Snowflake? I mean, Affinio, obviously,
really cool tool, check it out. But for those who are saying, okay, we're kind of at the point where we're unifying data and we want to think about augmenting it, you know, where do people go? What would you
recommend as the best steps in terms of exploring the world of doing stuff inside of Snowflake
natively, but with third-party tools and third-party data sets? I think it all starts with,
from our perspective, you know, many of the conversations we have with prospects and,
you know, customers is around sort of what questions are sort of the repeatable ones you
want to get addressed and want answered. And in combination with that, obviously, a key element
to what, you know, these types of applications enable is from a privacy perspective, it sort of
unlocks the ability to answer those types of questions by more individuals across the organization.
So many of the sort of starting points for us ultimately comes down to what are
those repeatable, you know, questions and repeatable, you know, workloads that
you'd like to have, you know, trivialized and basically sort of plug and play inside of the
warehouse that would speed up what otherwise oftentimes, you know, is a three week wait time
or a three-week model or a three-week answer. And so I think, you know, for us, that's where we start with most of
our prospects and discussions. And I would think, you know, for those thinking about or contemplating
that, that's a great place to start, is sort of recognizing that this isn't, you know, this
isn't the, you know, the silver bullet to address all questions or all problems. But for
those that are
sort of rinse and repeat and repeatable, these types of applications are very, very powerful.
Love that. That's just thinking back to my consulting days when we were doing lots of
analytics or even sort of tool choice for the stack. Always starting with the question,
I think, is just a generally
good piece of advice when it comes to data. Well, this has been a wonderful conversation.
Tim, Steven, really appreciate it. Congrats on your success with Affinio, really cool tool. So
everyone in the audience, check it out. And we'd love to have you back on the show in another six
or eight months to see how things are going. Yeah, I would love to. Thanks very much. As always, a really interesting conversation. I think that one thing
that stuck out to me, and I may be stealing this takeaway from you, Kostas, so I'm sorry,
but I thought it was really interesting how they talked about the interaction of graph with sort
of your traditional, you know, rows and columns warehouse in the paradigm of nodes and edges. That's something that's familiar to us, you know,
relative to identity resolution, you know, sort of in the stuff that we work on in the world we're familiar with. And so kind of breaking down that relationship in terms of nodes and edges,
I think was a really helpful way to think about how they interact with Snowflake data. Yeah, yeah, absolutely. I think this part of
the conversation where we talked about different types of representation of the data and how
its representation can be more well suited for specific types of questions, it was great. And
if there's something that we can keep out of this is that
there's this kind of concept of the data remains the same at the end. What is expressed as part
of the data, it's the same thing, right? It doesn't matter if you represent it as a graph,
as a table, or at the end as a set. Because if you notice like the conversation that we had
at the end, they end up representing the graph using some probabilistic
data structures that at the end represent sets and they do some set operations there to perform
their analytics, and that, from a technical perspective, is very interesting. And I think
this is a big part of what computer engineering and computer science is actually about, right?
Like how we can transform from one representation to the other and what kind of expressivity these representations are giving to us. Keeping in mind that at the end,
all these are equivalent, right? Like the type of questions that we can answer are the same. It's
not like something new will come out from the different representation. It's more about
the ergonomics of how we can ask the questions, how more natural the questions fit to these models and structures,
and in many cases also around efficiency.
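[Editor's note: Kostas's point about equivalent representations can be made concrete: the same edge list answers the same question whether you scan it as a graph or precompute neighbor sets and use set algebra, and the set view makes affinity (intersection) queries natural. A toy illustration only, not Affinio's implementation.]

```python
# The same relationships, two representations.
edges = [("u1", "surfing"), ("u2", "surfing"), ("u2", "hiking"), ("u3", "hiking")]

# Graph view: scan the edge list for neighbors of "surfing".
surf_graph = {u for u, a in edges if a == "surfing"}

# Set view: precompute neighbor sets once, then answer with set algebra.
neighbors = {}
for u, a in edges:
    neighbors.setdefault(a, set()).add(u)

# Both views agree, and "affinity" becomes a plain intersection.
overlap = neighbors["surfing"] & neighbors["hiking"]
print(surf_graph == neighbors["surfing"], overlap)  # True, and the set containing u2
```

Neither view can answer anything the other can't; the difference is ergonomics and efficiency, which is exactly the trade-off being described.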
And it's super interesting that all these are actually built
on top of a common infrastructure,
which is the data warehouse, and in this case, Snowflake.
And that's like a testament of how open a platform Snowflake is.
Although, I mean, in my mind at least,
pretty much the only other system that I have heard of being so flexible is Postgres.
But Postgres is a database that has existed for, like, forever, like 30 years or something.
Snowflake is a much, much younger product, but still they have managed to have an
amazing velocity when it comes to building the product and the technology
behind it. And I'm sure that if they keep up that pace,
we have many things to see in the near future,
both from a technical and business perspective.
Great. Well,
thank you so much for joining us on the show and we have more interesting data
conversations coming for you every week, and we'll catch
you in the next one. You can reach me at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you
by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at
rudderstack.com.