The Data Stack Show - 40: Graph Processing on Snowflake for Customer Behavioral Analytics

Episode Date: June 16, 2021

Highlights from this week's episode include:

- Launching Affinio and the engineering backgrounds of the co-founders (2:36)
- The massive transformation in customer data privacy regulation in the past eight years (6:23)
- Creating the underpinning technology that can apply to any customer behavioral data set (10:05)
- Ranking and scoring surfing patterns and sorting nodes and edges (14:13)
- Placing the importance of attributes into a simple UI experience (19:28)
- Going from a columnar database to a graph processing system (25:20)
- Working with custom or atypical data (32:46)
- The decision to work with Snowflake (37:43)
- Next steps for utilizing third-party tools within Snowflake (52:18)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by Rudderstack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the Data Stack Show. A really interesting guest today.
Starting point is 00:00:34 We have Tim and Steven from a company called Affinio. And here's a little teaser for the conversation. They run in Snowflake. They have a direct connection with Snowflake, but they do really interesting marketing and consumer data analytics, both for social and for first party data using graph, which is just a really interesting concept in general. And I think one of the, one of my big questions, Costas, is around the third party ecosystem that is, that's being built around Snowflake. And I think that's something that is going to be really, really big in the next couple of years. There are already some major
Starting point is 00:01:10 players there, and we see some enterprises doing some interesting things there. But in terms of mass adoption, I think a lot of people are still trying to just sort of get their warehouse implementation into a good place and unify their data. So I want to ask about that from someone who is playing in that third-party Snowflake ecosystem. How about you? What are you interested in? Yeah, Eric, I think this conversation is going to have a lot of Snowflake in part of it. One thing is what you talked about, which has to do more with the ecosystem around the data platforms like Snowflake. But the other and more technical side of things is how you can implement
Starting point is 00:01:47 these sophisticated algorithms around graph analytics on top of a columnar database like Snowflake. So yeah, I think both from a technical and a business perspective, we are going to have a lot of questions around how Athenio is built on top of Snowflake. And I think this is going to be super interesting. Cool. Well, let's dive in. Tim and Steven, welcome to the Data Stack Show. We're really excited to chat data warehouses. And personally, I'm excited to chat about some marketing stuff because I know you play in that space. So
Starting point is 00:02:20 thanks for joining us. Yeah, excited to be here. Thanks for having us. We'd love to get a little bit of background on each of you and just a high-level overview of what Affinio, your company, does for our audience. Do you mind just kicking us off with a little intro? Yeah, absolutely. I'd be happy to. So pleasure being on, guys. And realistically, just to give you a quick sense of sort of what Affinio is all about, a little bit of background.
Starting point is 00:02:52 So we've created Affinio about eight years ago and started off with a really simple concept where eight years ago, Steve and I happened to be running a mobile app B2C company. And instead of looking at social media based on, you know, to see what people are talking about our brand, we started off with a really simple experiment of looking at who else our followers on social were following. And that afternoon, we sort of aggregated that data and saw sort of the compelling opportunity against this sort of interest in affinity graph that nobody seemed to be using or utilizing for basically advertising and marketing applications. And we thought it was just a huge opportunity. So we had doubled down and created what continues to be our core intellectual property, which is a custom built graph analytics engine under the hood. And what we've done is over those eight years, we basically leveraged, you know, analyzing essentially social data as a starting point. enterprise customers really excited about what they could unlock from both insights and
Starting point is 00:03:45 actionability against the data that we were providing them with, as well as basically using our technology. So over the last two years, we made a conscious effort to double down and start porting a lot of that core graph technology directly into Snowflake. And most recently, and we're just about to announce sort of the release of four of our four essentially apps inside the Snowflake marketplace that enable organizations to essentially use our graph technology directly on their data without us ever seeing the analytics and without us ever seeing the output so it's in a completely private format all leveraging sort of the secure function capability and Snowflake and the data sharing capability. So super excited to be here. We're obviously huge fans of both Snowflake as
Starting point is 00:04:30 well as sort of warehouse first approaches. And we think the opportunity between Affinio and Rudderstack is a great compliment. Very cool. And Tim, do you want to just give a quick 30 second or one minute background on you personally? Yeah, certainly. So Tim Burke, CEO of Affinio. My background is actually mechanical engineering. Steven was on the show and my CTO and co-founder. We've been working together for 12 years now, both engineers by trade. He's electrical, I'm mechanical. I do a lot of the biz dev and sales work within Affinio, obviously from my position, a lot of customer-facing activities and all that. Stephen, introduce himself. Yeah, I'm Stephen Hankinson, CTO at Affinio. Like Tim said, I'm an electrical engineer, but I've been writing code since I was about 12 years
Starting point is 00:05:15 old and just really enjoy working with large data, big data and solving hard problems. Very cool. Well, so many things to talk about, especially Snowflake and sort of combining data sets. And that's just a fascinating topic in general. But one thing that I think would be really interesting for some context. So Affinio started out providing graph in the context of social. And one thing I'd love to know, so you started eight years ago, and the social landscape, especially around sort of privacy and data availability, etc, has changed drastically. And so I'm just out of out of pure curiosity, I'm interested to know, you know, what were the kinds of questions that your customers wanted to answer eight years
Starting point is 00:06:06 ago when you sort of introduced this idea? And then how has the landscape impacted the social side of things? I know you're far beyond that, but you have a unique perspective in dealing with social data over a period where there's just been a lot of change, even from a regulatory standpoint. Absolutely. I would say you nailed it on the head. It's been sort of a transformational period for data privacy, customer data privacy. And that, you know, first and foremost, has probably been one of the biggest impacted areas has been, you know, social data as a whole. So we've definitely seen a massive transition, right? I mean, I would say that a lot of that transition over the last
Starting point is 00:06:45 few years is partially, you know, a change in our focus for that exact reason, right? Recognizing that deprecations in public APIs, deprecation sort of available, you know, privacy aspects of that data availability across social has changed drastically, right? And so for us, it was been, you know, we've been sort of, you know, first, you know, at front of the line watching all this happen in real time. But for us, the customers at the end of the day are still trying to solve the same problem. It's how do I understand, learn more about my customers such that I can, you know, service them better, provide better customer experience, find more of my high value customers, like net net, I don't think the challenges change. I think
Starting point is 00:07:30 the assets against which those, you know, the data assets against which those customers are actually leveraging to find those answers is going to change and has been changing, right. And so what we're trying to do is our move from sort of our legacy social product, much of the time was addressing deeper understandings of the interest profiles and rich sort of interest profiling of large social audiences is kind of where we got started. valuable assets or valuable insights for a marketer, because when you understand the scope and depth of, you know, your audience's interest patterns, you can basically leverage that for where to reach them, how to reach them, how to personalize content, knowing what offers they're going to want to, you know, click through to. And I don't think that's actually changed, right? I think that what people are recognizing more, more so than anything, and obviously, you guys would see this firsthand as well, is,
Starting point is 00:08:25 you know, many of those data assets that I think many organizations were willing to either have vendors collect on their behalf or own on their behalf, it has changed drastically. And now it's requiring basically these enterprises and organizations to own those data assets and be able to do more with them. And so what I would say is what we're seeing sort of firsthand is the markets come around to recognizing the need to collect a lot of first-party data. Many organizations have obviously put a lot of effort and a lot of energy and a lot of resource behind sort of creating that opportunity within their enterprise. But I would say, quite honestly, what we see is that there's a lack of sort of ability
Starting point is 00:09:05 to make meaningful insight and actionability from those large data sets that they're creating. So that's kind of what our focus is on is sort of trying to enable the enterprise to be able to unlock at scale applications no differently than what we've done previously on massive social data assets, but in this time on their first party data and natively inside Snowflake, you know, privacy first format. Super interesting. And just one more follow-up question to that. I'm at risk of nerding out here and stealing the microphone from Costas for a long period, which I've been known to do in the past.
Starting point is 00:09:37 In terms of graph, was the transition from sort of third party social data to accomplishing similar things on your first party data on Snowflake, was that a big technological transition? Or, I mean, I'd just love to know from an under the hood standpoint, how did that work? Because the data sets are, you know, there's similarities and differences. No, it's a great point. I mean, for those not sort of familiar with graph technology, obviously, the foundation of sort of, you know, traditional graph databases are founded on sort of transforming, you know, relational database into nodes and edges, right, and looking for essentially connectivity or analyzing the connectivity in a data asset. So our underpinning data technology, which Stephen created firsthand, is this custom-built graph
Starting point is 00:10:33 technology. It analyzes data based on that premise. Everything is a node, everything is an edge. At that primitive level, it enables us of ingest and analyze any format of, you know, customer data without having to do drastic changes to the underpinning technology. And so what I would highlight is that we've, you know, we're the most compelling data assets that we can analyze and the most compelling insights you can gather typically are driven by customer behavioral patterns, right? So unlike traditional, I would say, demographic data, which has its utility and obviously always has in a marketing and advertising application, but I would argue that demographics has traditionally been used
Starting point is 00:11:17 as a proxy to a behavioral pattern, right? And what we see and what we see the opportunity to unlock is that if you're analyzing and able to uncover patterns inside of raw customer behavioral, which ultimately are simply a surrogate to that underpinning behavior you're looking to change. What we're seeing and what we see as an opportunity is across these massive data sets that are basically being pulled into Snowflake and aggregated in Snowflake. When you start to analyze those behaviors at the raw level and unlock patterns across a massive number of consumers at that level, you can then start actioning on that and leveraging those insights for advertising, personalization, you know, targeted campaign, next best offer
Starting point is 00:12:11 in a format that basically is driven by you unlocking that behavioral pattern. So for us, you can think of it, you know, when I speak of customer behavioral pattern, everything that, you know, relates to transactional data, content consumption patterns, search behavior, you know, click data, click stream data. I mean, all those become signals of intent, of interest, and ultimately are sort of a rudimentary behavior, which for us, we can ingest, transform that data into a graph inside a snowflake, analyze those connections and similarity patterns across those behaviors natively in the data warehouse. And then in doing so, create therefore audiences around interest, you know, common interest patterns and lookalikes and build propensity models off those behaviors. And so the transformation uniquely, I mean, I wouldn't understate it.
Starting point is 00:13:01 And Stephen obviously, you know, put a lot of time into that transformation. I think it was more so that we had initially architected the underpinning technology for the purpose of a certain data set. What we unlocked and identified was there was a host of first-party data applications we could apply this tech to. And that was sort of the initial aha moment for us
Starting point is 00:13:20 in terms of moving it into a Snowflake instance and then Snowflake capability so that we can basically put it and apply it to any customer behavioral data across that data set. That's super interesting. I have a question that, I mean, probably Stephen might have a lot to say about that, but you're talking a lot about graph analysis that you're doing. Can you explain to us and to our audience a little bit
Starting point is 00:13:47 how graphs can be utilized to do analysis around the behavior of a person or in general, the data that you are usually working with? Because from what I understand, the story behind Affin is that when you started, you were doing analytics around social graphs, right? Where the graph is like a very natural kind of data structure to use there. But how
Starting point is 00:14:09 this can be extended to other use cases? Yeah, I would say one example of that would be in surfing patterns, like Tim had mentioned, where essentially we can get a data set of basically sites that people have visited and even keywords on those sites and other attributes related to those sites,
Starting point is 00:14:27 times that they visit them. And essentially we can put that all together into sort of a graph of people traversing the web. And then we're able to use some of our scoring algorithms on top of that. So essentially rank and score those surfing patterns so that we can essentially put together people or users that look similar into a separate segment
Starting point is 00:14:49 or audience that then we can essentially pop up and show analytics on top of, so people can get an idea of what that group of people enjoy visiting online or where they go or what types of keywords they're more looking at online based on the data set that we're working with. I guess that would be one example of a graph related that's not social, for example.
Starting point is 00:15:13 Can I just pick up on that, Costa, as well? I mean, I think the thing that we see is that, you know, as Stephen alluded to, at the sort of lowest level of sort of the signals that are being collected, you know, what we're creating in, you know, just to liken it to a social graph, obviously, you have a follower pattern, which defines and creates essentially the, you know, the social graph, what we're doing is sort of taking those common behaviors is basically sort of the nodes and edges. So as Stephen alluded to, whether it be, you know, sites that people visit, whether it be content, similar content that they're consuming, whether it's the transactional history that looks similar to one another.
Starting point is 00:15:48 The application effectively is just how we transform, to your point, those individual events into essentially a large customer graph on first party data within the warehouse. And then, like I said, then from there, the analytics and applications are very, very similar, regardless of sort of whether you're analyzing a social graph, a transactional graph, a web surfing graph. It ultimately comes down to sort of what your definitions are for those nodes and edges at the core. Yeah, and what's the added value of pursuing,
Starting point is 00:16:20 like, or trying to describe and represent this problem as a graph instead of like, I don't know, more traditional analytical techniques that people are using so far? For us, it comes down to, I mean, specifically segmentation at the core of what advertisers and marketers do on a daily basis, the sort of cut and slice and dice data, oftentimes is restrictive to a single event, right? So find me the customers that bought product X, find me the customers that viewed TV show
Starting point is 00:16:52 Y, oftentimes sort of is restrictive in sort of the analytics capabilities within the scope of that small segment. What we're doing is we're able to take that segment, look across all their behaviors beyond them, you know, beyond that sort of initial defined audience segment. And by compiling all those attributes simultaneously inside of Snowflake, we're actually able to uncover the other affinities beyond that. So besides watching TV show X, right, what are the other shows that are of that audience are over indexing or sort of have high affinity besides buying product? Why? What other products are they buying? And those signals from a marketer's perspective starts to unlock everything from recommendation engine, next best offer, new net new personalized customer experience recommendations in terms of recognizing that this group as a whole
Starting point is 00:17:45 has these patterns. And that's at the core, you know, when you think of it, you can certainly achieve that in a traditional relational database, if you have two, three, 10 attributes per, you know, per ID, when you start going into scales, you know, that we're analyzing with our technology inside of Snowflake, you're talking about potentially hundreds of millions of IDs against tens of thousands to hundreds of thousands of attributes. So when you actually try to surface and say, what makes this segment tick and what makes them unique, trying to resolve that and identify the top 10 attributes of high affinity to that audience segment is extremely complex in a relational database or relational format. But using our technology and using graph technology, the benefit is that that can be
Starting point is 00:18:29 calculated in a matter of seconds inside the warehouse so that people like, you know, marketing and advertisers can unlock those over-indexing high affinity signals beyond the audience definition that they first, you know, first applied. And that helps with everything, like I said, understanding the customer all the way through to, you know, things like next best offer, as well as sort of media, you know, media platforms of high interest. Right. That's, that's super, super exciting for me. I have, I have a question that's more of like a product related question, not much technical, but how do you expose this kind of structure to your end user, which from what I understand is a marketeer, right?
Starting point is 00:19:09 And I would assume that like most of the marketeers don't really think in terms of graphs, or it's probably like something a little bit more abstract in their heads. Can you explain to me how you managed to expose all this expressivity that a graph can offer to this particular problem to a non-technical person like a marketeer? Yeah, no, for us, I mean, it's a great question. For us, a lot of what we created eight years ago, and even the momentum on our social application eight years ago, was sort of the simplicity, identifying those over-indexing signals,
Starting point is 00:19:45 the ability to sort of do unsupervised clustering on those underpinning behaviors to unlock what I would deem sort of these data-driven personas. And so we've been, we put a lot of energy into, you know, trying to restrict how much data you surface to your end user and trying to simplify it based on their objective. And so, you know, a key element to that and recognizing that within the framework of these applications that we've built inside Snowflake, our end user actually does not get exposed, you know, to the underpinning, you know, graph based transformation and all the magic that's happening inside of Snowflake. What they do get exposed to and what our algorithm is able to do
Starting point is 00:20:26 is essentially surface in rank order the importance of those attributes and place those into a simple UI experience. And the benefit at the end of the day is that because all these analytics are running natively inside Snowflake, any application that has a direct connector to Snowflake can essentially query
Starting point is 00:20:43 and pull back these aggregate insights. So think of that from a direct connector to Snowflake can essentially query and pull back these aggregate insights. So think of that from, you know, from a standard BI application that has a standard, you know, connector into Snowflake with very little effort, they can essentially leverage the intelligence that we've built inside of Snowflake and pull forward essentially, you know, based on an audience segment definition, you know, the over-indexing affinities in rank order for that particular population. So I think the challenge for us, I think you nailed it. For many in the marketing field,
Starting point is 00:21:16 graph technology is not one of their primary backgrounds and certainly not, if you ask them, how would you use a you know, a standard, you know, graph database, that's not something that, you know, most people are thinking about. What they are, though, thinking about and thinking hard about is, again, it's these simple definitions of like, what are the other things or what are the things that make an audience segment unique, make them tech, make them behave the way they behave. And unless you sort of approach that problem statement with a graph-based technology under the hood, it's extremely complicated, extremely challenging. And for many organizations we work with, you know, they talk about the fact
Starting point is 00:21:57 that what we're unlocking inside the warehouse in a matter of seconds would traditionally have taken, you know, a data science team or an analyst team oftentimes, you know, days, if not weeks to try to unlock. And so it's, for us, it becomes sort of scalability. It's the, it's the repeatability of these types of questions that, you know, guys like Eric, I'm sure live and breathe every day is like, what makes a unit of an audience tech, right. And whether that is like of the people who churn,
Starting point is 00:22:23 what are the over-indexing signals so that we can plug those holes in the product, whether that's of the high value customers, what makes their behavior on our platform unique? Those are the things that we're trying to unlock and uncover for a non-technical end user, right? Because that is their daily activity is they have to crack that nut on a daily basis
Starting point is 00:22:43 in order to achieve their KPIs. And so that's what we're most excited about is we, you know, I think Stephen and I sort of eight years ago, graph technology certainly as it pertained to applications and marketing was really still very, very new. I would still say it's still very, very nascent. But I mean, I think it's sort of coming of age because as we grow the data assets inside of things like, you know, Snowflakes Data Warehouse, unless you can sort of analyze across the entire breadth of that data asset and unlock in sort of an automated way these key signals that sort of make up an audience, the challenge will always be the same. And the challenge is going to get worse, right, because we're not making data sets smaller, we're making them larger. And so the complexity and challenge associated with that just increases with time.
Starting point is 00:23:29 And for us, like, that's what we're trying to, we're trying to trivialize and say, listen, there's repeatable requests to a marketing analyst and to a marketing team and to an advertiser and a media buyer. And dominantly, they're affinity-based questions, whether people recognize it or ask it as such. But a lot of the times, that's exactly what it is.
Starting point is 00:23:48 Of the person who just signed up on our landing page, right? Like, what should we offer them, right? What other signals can we, you know, what kind of signals influence what we recommend to them, how we manage them, how we manage the customer experience, how we personalize content. So those types of questions we see on a daily basis are trying to be addressed by marketing teams, many of whom who don't have direct access, obviously, to the raw data. And that's why a lot of our technology natively inside of Snowflake is sort of unlocking the ability for them
Starting point is 00:24:16 to do that in aggregate without ever being exposed to private or low-level data. That's amazing. I think that's one of the reasons that I really love working with these kinds of problems and engineering in general. This connection of something so abstract as a graph is to a real life problem, like something that a marketeer is doing every day. I think that's a big part of the beauty behind doing computer engineering, and I really enjoy that. But I have a more technical question now.
Starting point is 00:24:47 I mean, we talked about how we can use these tools to deliver value to the marketing community. So how did you go from a columnar database system like Snowclick into a graph processing system? How did you do that? How will you bridge these two different data structures at the end? From one side, you have more of a tabular way of representing the data or a columnar way of representing the data. And on the other hand, you have something like graph. So how do these two things work together?
Starting point is 00:25:20 Yeah, so basically what we end up doing is we have some secure functions in an Arsenal account that we share over to the customer. And then what that does is it gives them a shared database, which includes a bunch of secure functions that we've developed. And then we essentially work with the customer to give them either predetermined functions or queries that they will run on top of their data based on the, I guess, structure of their tables. And the queries that we give to them essentially will pass their raw data in through our encoder is what we call it. And that will output this new data into a new table. And that really just looks like a bunch of garbage if you look at it in Snowflake. It's mostly binary data, but it's a
Starting point is 00:26:05 probabilistic data structure that we store our data into. And then with that probabilistic data structure, they can then use our other secure functions, which is able to analyze that graph based data and output all of the insights that Tim was mentioning before. Essentially, you just feed in a defined audience that you want to analyze, and it will run all the processing in the secure function on top of that probabilistic data structure and then output all of the top attributes and scores for the audience that they're analyzing. Oh, that's super interesting.
Starting point is 00:26:40 Can you, Stephen, share a little bit more information about this probabilistic data structure? Yeah, it's essentially putting it in a privacy-safe format that basically is feeding in all the IDs with different attributes that they want to be able to query against, essentially, and using some hashing techniques to essentially compute this new structure that is then able to be bumped up against other encoded data sets of the same format. And then once you mash them together, essentially, you can use some algorithms that we have in our secure function library. And from there, we can get all kinds of things like intersections, overlaps, unions of all kinds of sets.
Starting point is 00:27:22 It's basically doing a bunch of set theory on these different data structures in a privacy secure way. Yeah, that's super interesting. And there's, I mean, there's a big family of database systems, which is actually graph databases, right? So from your perspective, why it's better to implement something like what you described, like compared to getting the data from a table, like on Snowflake, and feeding it to a more traditional, let's say, kind of graph system?
Starting point is 00:27:55 I think the main benefit of doing it this way is they don't need to make a copy of their data and they don't need to move their data. It essentially all stays in one place. Yeah, and I would just add to that, Kostas, as well, right? I mean, when we speak of, you know, the benefits of sort of Snowflake's underpinning architecture and the concept of sort of not moving data, for us, you know, what we're not trying to do
Starting point is 00:28:18 is sort of replicate all functionality of a graph database. There's obviously applications in which case, you know, that is absolutely suitable and reasonable to do an entire sort of copy of a data set and run that type of analytics inside the warehouse. But what we're trying to do is take the applications
Starting point is 00:28:34 relative to marketing and advertising, productize them in a format that does not require that and still leaves the data where it is inside a snowflake, you know, provides this level of sort of anonymization. And I would also highlight the fact that Stevens code
Starting point is 00:28:48 that does the encoding of that new data structure also enables out of five to one data compression format, which also supports basically more queries for the same price when it comes down to this affinity-based querying structure. Yeah, that's very interesting. This discussion that we are having about the comparison between having a more niche kind of system around graph processing
Starting point is 00:29:13 and the general kind of graph database, it's something that reminds me a little bit of something that happens also here at RadarStack from an engineering point of view because we have built like part of our infrastructure needed some capability similar to what Kafka offers, right? But instead of like incorporating Kafka in our system, we decided to go and like build part of this functionality over Postgres in a system that's like tailor-made for exactly our needs. And I think that finding this trade-off between a generic system towards a function
Starting point is 00:29:49 and something that is tailor-made for your needs, it's like what makes engineering as a discipline super, super important. I think at the end, this is the essence of making engineering choices when we're building complex systems like this, trying to figure out when we should use something more generic as a solution or when we should get a subset of this and make it tailor-made for our problem that we are trying to solve. And that's extremely
Starting point is 00:30:15 interesting. I love that I hear this from you. We had another episode in the past with someone from Neo4j and we were discussing about almost this, because if you think about it, like a graph database at the end is a specialized version of a database, right? Like at the end, database system Postgres can replicate the exact same functionality that a graph database system can do, right? But still, we are focusing more on a very narrowly defined problem, and we can do it even more, and that's what you've done. And I find like a lot of beauty behind this. So this is great to hear from you guys. I think it's also interesting just
Starting point is 00:30:50 picking up on that in terms of the decision around like, when do you optimize versus sort of, you know, leave it generic. I mean, for us, you know, a big part of that, you can also see, obviously, in market, right, there's, you know, machine learning and sort of, you know, machine learning platforms that can, you know, have a host of different models can be used for you know a host of different things through the swiss army knife application within you know an organization for us anyway when when those custom requests come in from teams absolutely like those types of platforms make a lot of sense because your data science team has to go in. It's sort of probably a custom model and a custom question that's being answered. I think for us specifically, when it comes time to actually building an optimized solution, something that actually be building a custom model every time or should you actually push that workload into the warehouse?
Starting point is 00:31:51 And that's for us anyway, that's been a specific focus is like for those applications of those requests that you can have the marketer self-serve and get the answers they need in seconds, as opposed to putting it on the data science team backlog. Those are the applications for us that we're sort of focused on and actually pushing in and optimizing. Yeah, yeah, I totally agree. So last more technical question from my side. You mentioned that the way that the system works right now is you get the raw data that someone has stored in the Snowflake and you have some kind of encounters or transformers that transform this data into these probabilistic data structures that you have. Do you have any kind of limitations in terms of what data you work with? Do you have
Starting point is 00:32:34 some requirements in terms of the schema that this data should have? And what's the pre-processing that the user has to do in order to utilize the power of your system? Yeah, so if it's essentially rectangular form data, it's pretty easy to ingest into the encoder. We have a UI that will do that for you. But if there are some weird things about the data that wouldn't be typical, we can actually work with them. If they give us an example of what the data looks like, we can essentially craft a encoding query for them.
Starting point is 00:33:08 They just feed everything through, and that will still end up in the right way to go into our encoder and still end up in the essentially probabilistic graph format that we use. So we haven't currently run into any data set that we haven't been able to encode, but yeah, it seems to be pretty generic at this point.
Starting point is 00:33:27 And is this process something that the marketeer is doing, or there's some need for support from the IT or the engineering team of a company? We usually work with the IT at that stage, and then once it's encoded, the UI will work with the data that's already encoded. And they can also set up tasks inside of Snowflake, which will update that database over time or that data set over time to add new records or update the data as it comes
Starting point is 00:33:52 in. But yeah, that is not handled by the marketeer. All right. And is Afinio right now offered only through Snowflake? Is there like a hard requirement that someone needs to have the data on Snowflake to use the platform? It is currently cost us. I mean, we obviously went through sort of an exercise evaluating which platform to sort of build on first. I mean, for us, it came down to two sort of fundamental capabilities within Snowflake or probably three. I mean, the secure functions that we're utilizing to obviously secure our IP in terms of those applications that we share over.
Starting point is 00:34:27 The ability to do the direct data sharing capability, it was sort of fundamental to that decision. And then the third for us is obviously the cross-cloud application and the and retail and advertising space is and continues to be a good fit for our applications at this stage. for specific cloud applications. But where we are right now in terms of early market traction, our bet is on Snowflake and the momentum that they currently have. This is great. And Tim, you mentioned, I think, earlier that your product is offered through the marketplace that Snowflake has. Can you share a little bit more about the experience that you have with the marketplace, how important it is for your business and why?
Starting point is 00:35:28 Yeah, so I think the marketplace is still in its early stages, you know, even with as many data partners that are already bought in. For us, I think one of the clear challenges is that we, Affinio, are not data providers. So I think we're slightly nuanced within the framework of what traditionally has been built up on the data, you know, from a data marketplace or, you know, data asset perspective. We, you know, we're positioned inside a marketplace deliberately and consciously, you know, with Snowflake because our applications sort of drive a lot of the data sharing functionality and sort of add to the capabilities on top of that data marketplace, you know, that people can apply, you know, first, second, third party data assets inside of Snowflake and run our type of analytics on top of it.
Starting point is 00:36:15 So for us, it's been unique in the framework of simply being positioned, obviously, almost as a service provider inside of what otherwise is, you know, currently positioned as a data marketplace. But recognizing that I think over time, you'll start to see, you know, that bifurcate within Snowflake, and you will get a separation in a unique marketplace that will be driven by sort of service providers like ourselves alongside of, you know, straight data providers. So I think it's early stages. I think, you know, what we're excited about is that, you know we we see a lot of our technology as being an accelerant to many of those data providers directly and many of the ones that we've already sort of you know
Starting point is 00:36:54 started working with directly see it as see it as a value proposition and a value add to their you know raw data asset that they may be sharing through snowflake but you'll see it as a means with which to get more value from that data asset on the may be sharing through Snowflake, but see it as a means with which to get more value from that data asset on the customer's behalf by applying our application, our technology in their Snowflake instance. This is great. Tim, you mentioned a few things about your decision
Starting point is 00:37:17 to go with Snowflake. Can you share a little bit more information around that? And more specifically, what is needed for you to consider going to another data warehouse, cloud data warehouse, something like BigQuery or something like, I don't know, Redshift. What is offered right now by Snowflake that gives a lot, tremendous value to you and makes you prefer, at this point, build only on Snowflake?
Starting point is 00:37:44 Yeah, I think if we stood back and actually looked at where Stephen and I sort of started off in terms of our applications within first-party, like porting our graph technology into first-party data, much of that was very centered on applications and analytics specific to, you know, and enterprises on first-party data only. As it pertains to that model, if it was only restricted to that model, I think we would have considered more broadly, you know, looking at doing that directly inside of any of or
Starting point is 00:38:14 all of the cloud infrastructures or cloud-based systems to begin with. But, you know, I would say that ours is a combination of the ability to do, you know, analytics directly on first-party data, as well as Steven indicated, a major component of our technology that we've created inside of Snowflake and unlocks this privacy-safe data collaboration across the ecosystem. As a result of that, for us, the criteria in terms of selecting Snowflake was, again, the ability to leverage secure UDFs and secure functions to sort of lock and protect our IP that we're sharing into those instances. But the second major component is sort of the second half of our IP, which is effectively this privacy safe data collaboration, which basically is powered by the, you know, the underpinning data sharing capability of Snowflake. And so if and when sort of reviewing or evaluating other applications or other providers in terms
Starting point is 00:39:13 of context of where we report this next, I would say that that's sort of the lens that we look through, right? Is like, can we unlock the entire capability across this privacy safe data collaboration and analytics capability in a similar way that we've done it on Snowplate? Because to me, that is the primary reason why we picked that platform. Yep. And one last question for me, and then I'll leave it to Eric. And it's a question for both of you guys, just from a different perspective. You've been around quite a while.
Starting point is 00:39:44 I mean, Afinio, as you said, we started like eight years ago. That was pretty much like, I think, the same time that Snowflake also started. So you've seen a lot around the cloud data warehouse and its evolution. How things have changed in these past eight years, both from a business perspective, and this is probably something more of a question for your team, and also from a technical perspective, how the landscape has changed. Yeah, I think it's absolutely interesting, you know, the point that you're making. I mean, I first learned of Snowflake directly from, you know, customers of ours who were sort of at the time asking us specifically about, you know, the request is very simple. They said,
Starting point is 00:40:24 we love what you're doing with our social data. We would love it natively in Snowflake. And that was honestly the first time we had sort of learned of that application many, many years ago. But what I would say is that, you know, as far as the data warehouse is advanced from a technical perspective, I think for us anyway, it still sort of belongs or certainly has its stronghold directly in the CDO, CIO, CTO offices within many of these enterprises. What I expect to see and what I think we're sort of helping drive and pioneer with what we've built on the marketing advertising is sort of the value of the asset being stored inside of the data warehouse has to become more broadly applicable and accessible across the organization beyond
Starting point is 00:41:11 what traditionally has been locked away to high-influencer required data science teams. Because I think the value that needs to be tapped across global enterprises cannot funnel directly through just a single team all the time. And I think what we will see, and certainly I think as early stages are starting to see, is awareness by other departments inside the enterprise of even where their data is stored, quite honestly. I mean, there's still conversations we're having with many organizations in the marketing realm who have no idea where their data is stored, right?
Starting point is 00:41:42 So I think familiarity and comfort level associated with sort of that data asset, how to access it, what they can access, how they can utilize it, it will become the future of sort of where the data warehouse is going to go. But I think we're still a long way there. There's still a lot of education there, but we're excited about that opportunity specifically from the business perspective. Yeah, and on the tech side of things, I would say the biggest changes are probably around the whole privacy stuff that has changed over the years where you have to be a lot more privacy aware and secure.
Starting point is 00:42:13 And basically working with Snowflake makes that a lot easier for us with the secure sharing of code and secure shares of data as well. So using that with our code embedded directly into them, we can be sure that customers using this, their data is secure. And even if they're sharing data over to other customers, it's secure to do that as well. This is great, guys. So Eric, it's all yours. We're closing in on time here, but I do have a question that I've been thinking about really since the beginning. And it's taking a step back. So Kostas asked some great questions about why Snow this episode, sort of the next phase of data warehouse
Starting point is 00:43:09 utilization. And I'll explain what I mean a little bit. So a lot of times in the show, we'll talk about major phases that technology goes through. And in the world of technology and data, warehouses are actually not that old. You know, you have sort of Redshift being the major player fairly early on, and then, you know, Snowflake hitting general availability, I think, in 2014. But even then, you know, they were still certainly not as widespread as they are now. And the way that we describe it is, we're currently living in the phase of everyone's trying to put a warehouse, you know, sort of in the center of their stack and collect all of their data and do the things that, you know, sort of the, you know, marketing analytics tools have talked about for a long time where it's like get a complete view of the customer. And everyone sort of realized, okay, I need to have a data warehouse in order to actually do that. And that's
Starting point is 00:44:05 a lot of work. And so we're in the phase where people are getting all of their data in the warehouse. It's easier than ever. And we're doing really cool things on top of it. But I would describe Affineo in many ways as almost being part of the next phase. And Snowflake is particularly interesting here where let's say you collect all of your data. Now you can combine it with all other sorts of things native, which is, you know, sort of an entire new world, right? There are all sorts of interesting data sets in the Snowflake marketplace, et cetera. But most of the conversation and most of the content out there actually is just around how do you get the most value out of your warehouse by collecting all of your data in it and doing interesting things on top of it. And so I just love your perspective. Do you see sort of the same major phases? Are we right in
Starting point is 00:44:53 terms of being in the phase where people are still trying to collect their data and do interesting things with it? And then give us a peek as a player who's, you know, sort of part of the marketplace, part of the third party connections, but being able to sort of operationalize natively inside your warehouse, what is that going to look like? I mean, marketing is an obvious use case, but I think it's going to be, you know, in the next five years, that's going to be a major, major movement in the world of warehouses. Sorry, that was long winded, but that's, that's my, that's what's been going through. No, no, no, I totally, I mean, i totally i i mean it's sort of it's sort of the stuff that we think about and talk about on a daily basis what i think i think you're you're
Starting point is 00:45:30 right on i think you know obviously the world has already woken up to the sort of fact that like gathering collecting owning and managing all customer data in one location is going to be critical in the future right i would say covet has woken the world up to that in terms of you know as as many of us, you know, have heard and seen is that, you know, COVID is, you know, no better driver for digital transformation than, you know, a pandemic. So, but at the same time, I completely agree with you.
Starting point is 00:45:54 What I think personally, and I sort of just given sort of what we're creating within these sort of, you know, native applications inside of Snowflake, I think you will start to see an emergence of privacy-safe SaaS applications that are deployed natively inside the warehouse. I think you will see literally a transformation of how SaaS solutions are being deployed.
Starting point is 00:46:18 And I think what you'll see is organizations like Affinio who have traditionally hosted data on behalf of customers and provided web-based logins to access that data that's stored by the vendor. I think you'll see and continue to see a movement
Starting point is 00:46:35 where the IP and the core capabilities and the technologies of these vendors will begin to start to port natively into Snowflake. I believe that Snowflake itself will actually start to find ways to find attribution around the compute and value that those vendors like ourselves and the applications that are driving inside of the warehouse. And I think you'll see just naturally extend into, you know, rev share models, where
Starting point is 00:47:05 for the enterprise, you know, you sign on to Snowflake, you have all these native app options that you can turn on automatically, that basically allows you not only to reap more benefit, but just get up to speed and make your data more valuable faster, right. And I think I honestly, you know, Steve and I've talked about this for some time now. We honestly see that, you know, in the next 10 years, there'll be a transition. And certainly maybe it probably won't eliminate the old model, but you'll see a new set of vendors that will start building in a native application format right out of the gate. And that, I think, will transform the traditional SaaS landscape. Yeah, absolutely. And a follow-on to that. So when you think about data in the warehouse, you can look at it from two angles, right? The
Starting point is 00:47:50 warehouse is really incredible because it can really support, you know, any, well, not necessarily any kind of data, right? But data that conforms to any business model, right? So B2C, B2B, et cetera. It's sort of agnostic to that, right? Which makes it sort of fully customizable and you can set it up to suit the needs of your business. So in some sense, everyone's data warehouse is heavily customized. When you look at it from the other angle though, from this perspective of third-party data sets and something that Kostas and I talk a lot about, which is sort of common schemas or common data paradigms, right? If you look across the world of business, you have things like Salesforce, right? Salesforce can be customized, but you have sort of known hierarchies, you know,
Starting point is 00:48:40 lead contact account, et cetera. Do you think that sort of the standardization of those things or market penetration of sort of known data hierarchies and known schemas will help drive that? Or is everyone just sort of customizing their data and that won't really play a role? Yeah, that's a great question. I mean, it's conversations we've had with other vendors, you know, and many of our customers relative to what they perceive as sort of beneficial to, you know, many CDPs and market to your point, Eric, right? Like where the fixed taxonomies and schemas basically enable, you know, an ecosystem and an app ecosystem and sort of partner ecosystem to build easily on that schema on top of that. Yeah, completely.
Starting point is 00:49:26 You know, I would say that, you know, I think it's still early to see how that actually, you know, comes about. What I would say is that I think you will start seeing organizations sort of adopt many aspects within Snowflake and within their warehouse of sort of, you know, best of breed schemas for the purpose of, you know, as I would say, as I see this sort of application you know, best of breed schemas for the purpose of, you know, as I would say, as I see this sort of application space build out, it's kind of the way that it has to scale, right? So both from a partner and sort of marketplace, you know, marketplace play, as well as, you know, the plug and play nature of how you want to deploy at, you know, this at
Starting point is 00:50:00 scale. I mean, ultimately, the game plan would be that, again, all these apps sort of run natively, you could turn them on, they already know what the scheme is behind the scenes, and they can start running. As Stephen alluded to, there's obviously at this stage, a lot of sort of handholding at the front end, until you sort of get those, you know, schemes established and are encoded into a format that's, you know, queryable, etc. So I think, I think what you'll start to see is sort of best of breed, you know, bridging across into Snowflake would be my assumption that I would say the more that you see people sort of leveraging Snowflake as a, you know, build your own format of Snowflake, it's kind of required, right? And I wouldn't be surprised to see that some elements of that be adopted across into sort of, you know, best of class and best of breed within Snowflake directly for that purpose? Sure. Sure. Yeah, it is, it's kind of,
Starting point is 00:50:49 it's fascinating to think about a world where, you know, today you kind of have your set of, your core set of tooling, right. And sort of core set of data and you build out your stack by just making sure that things can integrate in a way that makes sense for your particular stack, which, you know, in many cases requires a lot of research, et cetera. And it's really interesting to think about the process of architecting a stack where you just start with the warehouse and you make choices based on best of breed schemas. And, you know, at that point, the tooling is heavily abstracted, right? Because you are basically choosing time to value in terms of best of breed schemas. Super interesting.
Starting point is 00:51:37 Yeah, completely. All right. Well, we're close to time here. So I'm going to ask one more question. And this is really for our audience and anyone who might be interested in the Snowflake ecosystem. What's the best way to get started with sort of exploring third-party functionality in Snowflake? I mean, Affinio, obviously, really cool tool, check it out. But for those who are saying, okay, we're kind of at the point where we're unifying data and we want to think about augmenting it, you know, where do people go? What would you recommend as the best steps in terms of exploring the world of doing stuff inside of Snowflake
Starting point is 00:52:14 natively, but with third-party tools and third-party data sets? I think it all starts with, from our perspective, you know, many of the conversations we have with prospects and, you know, customers is around sort of what questions are sort of the repeatable ones you want to get addressed and want to answer it. And in combination with that, obviously, a key element to what, you know, these types of applications enable is from a privacy perspective, it sort of unlocks the ability to answer those types of questions by more individuals across the organization. So many of the sort of starting points for us ultimately comes down to what are those repeatable, you know, repeatable questions and repeatable work, you know, workloads that
Starting point is 00:52:54 you'd like to have, you know, trivialized and basically sort of plug and play inside of the warehouse that would speed up what otherwise oftentimes, you know, is a three week wait time or a three week model or a three week answer. And so I think, you know, is a three-week wait time or a three-week model or a three-week answer. And so I think, you know, for us, that's where we start with most of our prospects and discussions. And I would think, you know, for those thinking about or contemplating that, that's a great place to start is sort of recognizing that this isn't for, you know, this isn't the, you know, the silver bullet for to address all questions or all problems. But for those that are
Starting point is 00:53:25 sort of rinse and repeat and repeatable, these types of applications are very, very powerful. Love that. That's just thinking back to my consulting days when we were doing lots of analytics or even sort of tool choice for the, for the stack. Always starting with the question, I think is just a really, I think that's just a generally good piece of advice when it comes to data. Well, this has been a wonderful conversation. Tim, Steven, really appreciate it. Congrats on your success with Affineo, really cool tool. So everyone in the audience, check it out. And we'd love to have you back on the show in another six or eight months to see how things are going. Yeah, I would love to. Thanks very much. As always, a really interesting conversation. I think that one thing
Starting point is 00:54:11 that stuck out to me, and I may be stealing this takeaway from you, Costas, so I'm sorry, but I thought it was really interesting how they talked about the interaction of graph with sort of your traditional rows and columns warehouse in the paradigm of nodes with sort of your traditional, you know, rows and columns warehouse in the paradigm of nodes and edges. That's something that's familiar to us, you know, relative to identity resolution, you know, sort of in the stuff that we're, that we work on in the world familiar with. And so kind of breaking down that relationship in terms of nodes and edges, I think was a really helpful way to think about how they interact with Snowflake data. Yeah, yeah, absolutely. I think this part of the conversation where we talked about different types of representation of the data and how
Starting point is 00:54:55 its representation can be more well suited for specific types of questions, it was great. And if there's something that we can keep out of this is that there's this kind of concept of the data remains the same at the end. What is expressed as part of the data, it's the same thing, right? It doesn't matter if you represent as a graph, as a table, or at the end as a set. Because if you notice like the conversation that we had at the end, they end up representing the graph using some probabilistic data structures that at the end represent sets and they do some set operations there to perform their analytics and that's from a technical perspective is very interesting and i think
Starting point is 00:55:36 this is big part of what actually computer engineering and computer science is about right like how we can transform from one representation to the other and what kind of expressivity these representations are giving to us. Keeping in mind that at the end, all these are equivalent, right? Like the type of questions that we can answer are the same. It's not like something new will come out from the different representation. It's more about the ergonomics of how we can ask the questions, how more natural the questions fit to these models and structures, and in many cases also around efficiency. And it's super interesting that all these are actually built on top of a common infrastructure,
Starting point is 00:56:16 which is the data warehouse, and in this case, Snowflake. And that's like a testament of how often open platform Snowflake is. Although, I mean, in my my mind at least it's like pretty much the only other system that i have heard of being so flexible is like postgres but postgres like a database that exists for like forever like like 30 years or something like snowflake is a much much uh younger product but still they have managed to have like an amazing amazing velocity when it comes like to building the product and the technology behind it. And I'm sure that if they keep with that pace,
Starting point is 00:56:49 we have many things to see in the near future, both from a technical and business perspective. Great. Well, thank you so much for joining us on the show and we have more interesting data conversations coming for you every week, and we'll catch you in the next one. at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by Rudderstack, the CDP for developers. Learn how to build a CDP on your data warehouse at rudderstack.com.
