Drill to Detail - Drill to Detail Ep. 103 'Reverse ETL, Profiles Sync and Segment Unify' with Special Guest Kevin Niparko

Episode Date: April 26, 2023

Mark Rittman is joined by Twilio Segment Head of Product Kevin Niparko to talk about trends in the customer data platform market, Reverse ETL and Profiles Sync, the impact of LLMs (Large Language Models) on digital customer experience and Segment Unify, a consumer scale real-time identity resolution solution that provides complete, real-time, portable customer profiles.

Segment Unify is here: complete, real-time, portable customer profiles
Activate warehouse data in all of your destination tools
Customer profiles made portable
Drill to Detail Ep. 94 'Rudderstack and the Warehouse-First Customer Data Platform' with Special Guest Eric Dodds
Drill to Detail Ep. 86 'Reverse ETL, Hightouch and CDW as CDP' with Special Guest Tejas Manohar

Transcript
Starting point is 00:00:00 You know, I'm really excited about some of the breakthroughs that we're seeing around large language models, co-pilots, and assistants. And I think one of the key questions that many folks are wrestling with is how do you not just get a generic experience out of these AI assistants, but how do you make that really tailored for your business and your use case. And a lot of that comes down to the data that they're trained on and fine-tuned with. And so I think that's one of the big challenges ahead as we think about getting the most out of these AI breakthroughs. So welcome to the Drill to Detail podcast and I'm your host Mark Rittman. So my guest today is Kevin Niparko from Segment. So Kevin, thanks very much for coming on the show and great to have you here. Mark, thank you so much for having me on.
Starting point is 00:01:06 So Kevin, for anyone who doesn't know you and who doesn't know what Segment is or who Segment are, just do a little intro to yourself. Yeah, absolutely. So Segment's a customer data platform, which means we help businesses collect, unify, and activate their customer data. We make it really easy for our customers to understand their end users and then drive engagement and growth of their business with data. So we work with 25,000 different businesses around the world providing real-time customer data infrastructure for brands like Fender Guitars, Intuit, and even Domino's pizza. Okay, fantastic. So I was saying to you in our pre-conversation really
Starting point is 00:01:48 that you've been looking, you've been at Segment now for a while and you've been looking after parts of Segment that I've kind of found really interesting in the past. So I think it was growth and analytics you're looking after first of all, and then ETL and warehouse. I mean, just tell us about your,
Starting point is 00:02:03 I suppose your career really within Segment and how you've kind of, I suppose, how the product's grown and how you've grown with it. Yeah, absolutely. So I actually started as Segment's first analyst, helping our early team figure out the business model, product direction, and go-to-market strategy through our own data. But as you may know, being a lone data analyst at a data company often means that you play this role of the internal customer. So I was essentially the dog eating the dog food day in and day out, using our product and data every day to make business decisions. And so I remember one of my first projects was actually looking for Segment's aha moment. So the question was essentially, at what point in the
Starting point is 00:02:51 implementation cycle do customers stick around? I think Facebook had famously set their aha moment at seven friends in the first 10 days, right? So once the user achieved that milestone, they were more likely to continue using Facebook going forward. And I was sort of tasked with figuring out what is the aha moment for Segment. And what we realized was that it was about getting to three connections implemented. So you could connect two data sources and one destination, or one data source and sync that into two tools. The exact configuration didn't matter. But what really mattered was the data foundation that you were laying with Segment, and your CDP strategy was starting to pay dividends once you connected that third tool. So it was a really interesting exercise. I think it's a really exciting place to be, and the way to sort of understand the product is by getting really hands-on through the analyst and data engineering teams. And what I learned is that this data was actually so critical in helping us form what ultimately became our CDP product, what ultimately became our go-to-market strategy. And this was also right
Starting point is 00:04:06 around the time when cloud data warehouses were starting to really take shape. But people, including us, were trying to figure out exactly where they fit into our stack alongside existing analytics tools or how it could be used to really drive decision making. And so I helped kick off our product initiative around bringing warehouses to more customers with Peter, our CEO at the time, and a few others. And so have been building out the product and product team ever since.
Starting point is 00:04:40 It's interesting because it's certainly the route that you took with kind of segment warehouse and so on, it was probably, it was in retrospect, it was was actually the PM for the product that was monetizing and making available that behavioral data to their customers. But their approach was to monetize that and to put, for example, a semantic model over it using Looker. But you guys went and just streamed it into the warehouse and kind of made it available as part of the product. So it was interesting, the angle you took on that. And obviously it
Starting point is 00:05:29 was very successful, really. What made you kind of think about the warehouse being such a key part of what you were doing at the time? Yeah. So there's a really interesting backstory to this product. Essentially what we found was that in the early version of Segment, customers were syncing their customer data to an AWS S3 destination. This was one of the most popular destinations on the platform, but we didn't really have any visibility
Starting point is 00:06:00 into what our customers were doing after it landed in the S3 bucket. And so we went out, we talked to our customers, and again and again, we uncovered these one-off scripts that teams were writing, getting data from S3 into a queryable format, largely in Redshift and to some extent BigQuery. And so these scripts were fragile, would often break, and were additional overhead for engineering
Starting point is 00:06:27 teams to maintain. And most of our customers and the market hadn't really figured this out yet. And so it was really about learning from our most advanced customers, what they were actually doing, how they were leveraging this data, and then productizing that and bringing that to, you know, a few clicks in a web app to get data flowing into your data warehouse. Okay, okay. And we'll get on to, I think you were also looking after Personas at one point, we'll get on to that in a second, because it obviously leads into the conversation around CDPs. But Segment is now part of Twilio. So what does that mean really for the product? And what does that mean about what you do and the direction of Segment as a sort of
Starting point is 00:07:13 like a platform really? Yep. So in 2020, Segment was acquired by Twilio. Twilio, as many of our listeners may be familiar with, have historically provided communication APIs, making it really easy to programmatically send an email or text message or make a voice call if you're a developer. But one of the things that Twilio realized was that the way to send more relevant messages is by bringing customer context into that message. And so you
Starting point is 00:07:48 need to know what channel and what message to send and when to send that message to get the most out of the interaction. And so by bringing customer data closer to channels, you not only get better communications, but you also get a more complete picture of your customers because the channels can help you understand how you're interacting across different touch points. And so by bringing these two different products together, we're essentially building towards this vision of a Twilio customer engagement platform where customer data sits at the center, powering all of the communications that brands have with their customers.
Starting point is 00:08:29 Okay. So really the point of this episode is to talk about Segment's, I suppose, strategy and positioning in the CDP market, but also two new features of Segment that have come out recently that you've been involved in, Profiles Sync and Reverse ETL. Okay, so that's really what I want to cover in this conversation. But let's start off by taking a sort of a step back really. And I suppose give me your definition of what a CDP is, a customer data platform. And I suppose Segment's kind of initial positioning with this
Starting point is 00:09:03 and kind of why Segment was a player in this market at the start? Great. Yeah. So there's this really long and exciting history around customer data platforms that I hope we can get into through this conversation. And last week, we launched Segment Unify, which we think of as a really big step forward for CDP. An easy way to think about customer data platforms is if you were running a business before the internet, you'd generally get to know all of your customers personally, right? They'd come into your store, you'd develop a relationship with them, they'd tell you things about their lives, what styles they were into, whether something seemed too
Starting point is 00:09:43 expensive or too cheap, whether they like the pink pants or the green overalls. But as businesses have moved online, that same interaction continues to happen. But now it's in the digital space and it's happening thousands of times per second across mobile apps and websites and CRMs and marketing automation and contact center systems and on and on. Each one holds a little fragment of understanding your customer and their customer journey. On the other side of this, there is this explosion of ways in which you can put customer data to work to understand your customers and drive growth. So you now have advertising, marketing, texting, on-site personalization. The latest is fine-tuning large language models and co-pilots and assistants.
Starting point is 00:10:33 And so the uses for customer data are large and continue to grow. And I think the key insight that Segment had was that collecting this data and making this data useful was this hard engineering challenge that requires lots of advanced infrastructure. It's not like the exciting infrastructure that engineering teams show up eagerly ready to build, right? You're not building this shiny glass building on the city skyline. It's a lot of boring and hard problems to solve. It's like the plumbing behind the scenes, the pipes that connect everything together, that engineers and data folks are generally happy to get off their plate. And so that's really where we started with this product called Connections. You can think of this as a set of APIs and
Starting point is 00:11:24 infrastructure to get data from wherever it's generated into wherever it needs to go. Okay. Okay. And so where did Personas come into this then? Because it was Personas that was my first introduction to CDPs really. And, you know, we had Tejas Manohar on the show many episodes ago saying Personas was an interesting product. So what did Personas bring to things, and in particular, where did its role around identity resolution come in, really? Why is that a problem it was solving, and what did Personas bring to things, really? Yeah, it's a great question. This was really the next evolution of the product. One of the things that we learned along the way is it's not just enough
Starting point is 00:12:05 to get raw data from one place into another. There are a lot of additional hard problems in making that data usable and actionable. And so on top of the raw data, we started to provide identity resolution capabilities to turn it from sort of raw events or dimensions of a user into what we refer to as golden profiles. Golden profiles are essentially this up-to-date trusted digital record of who your users are and where they are in their journey with your business. Profiles are definitely not a unique concept in the data world, but there are a few things that we do differently. First is that we think about golden profiles as really needing to be complete, meaning that it's the collective understanding of a user across touchpoints. So we've recently extended that to include data that sits in the data warehouse with reverse ETL.
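Editor's sketch: the idea of turning raw events into one trusted profile per person can be illustrated with a small union-find structure, where identifiers that appear together on an event are merged into the same profile. This is a minimal illustration under assumptions and is not Segment's implementation. The `IdentityGraph` class, its method names, and all sample identifiers are hypothetical, and the sketch does nothing about identifiers shared between several people or about processing in real time.

```python
from collections import defaultdict


class IdentityGraph:
    """Toy identity graph: identifiers seen on the same event join one profile."""

    def __init__(self):
        # Maps each (kind, value) identifier to its parent in the union-find forest.
        self.parent = {}

    def _find(self, node):
        # Walk up to the representative identifier, shortening the path as we go.
        self.parent.setdefault(node, node)
        while self.parent[node] != node:
            self.parent[node] = self.parent[self.parent[node]]
            node = self.parent[node]
        return node

    def observe(self, identifiers):
        """Record one event carrying identifiers, e.g. {"email": "pat@example.com"}."""
        nodes = list(identifiers.items())
        if not nodes:
            return
        root = self._find(nodes[0])
        for node in nodes[1:]:
            # Merge every other identifier on this event into the first one's profile.
            self.parent[self._find(node)] = root

    def profiles(self):
        """Return one set of (kind, value) identifiers per resolved profile."""
        groups = defaultdict(set)
        for node in list(self.parent):
            groups[self._find(node)].add(node)
        return list(groups.values())


# Hypothetical journey: anonymous visit, purchase with email, support call, sign-up.
graph = IdentityGraph()
graph.observe({"anonymous_id": "anon-1"})
graph.observe({"anonymous_id": "anon-1", "email": "pat@example.com"})
graph.observe({"email": "pat@example.com", "phone": "+15550100"})
graph.observe({"phone": "+15550100", "user_id": "u-42"})
graph.observe({"email": "sam@example.com"})  # an unrelated customer

resolved = graph.profiles()
for profile in resolved:
    print(sorted(profile))
```

In this sketch the order in which events arrive does not change the final grouping, which is one reason a graph-style merge is a common starting point for this kind of problem.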
Starting point is 00:13:05 So the insights that data science and analytics teams are generating can easily hydrate this profile. The second is that these golden profiles are portable, meaning that you can sync and access them in different tools. They're not locked away in some cloud or suite. They're accessible in all of the places you need them with the tools that you're familiar with. And then the third is that they're real time and kept up to date, meaning that these profiles
Starting point is 00:13:33 really represent the latest state of your users. And I think one of these interesting stats that continues to blow my mind: we're resolving 250,000 data points to profiles every second and are executing more than 50,000 complex profile attribute computations within seconds of receiving new digital signals. So this is just one example of the level of scale and performance that's required to deliver on this vision of golden profiles. Okay. So, and I suppose a digression there really, but there's a school of thought that says that trying to get a customer 360 view is an impossible task now because of privacy regulations
Starting point is 00:14:19 and because of sort of browser privacy features and so on. How much of a goal do you think this is? How achievable do you think this is? Or what is the level of, I suppose, complete golden profile that people should expect to get these days from visitors and from customers on their websites, out of interest? It's a great question. And consent and compliance is definitely top of mind for many businesses. And I think the key here is really being transparent with your data strategy with your end users and helping them understand and providing tools to understand how you are using this data and these golden profiles. And so we provide a lot of tools to manage end user privacy and collect consent and manage that responsibly. So your end users are always approving and
Starting point is 00:15:16 supportive of the ways in which you're using their data. Okay. Okay. So another reason I was really keen to get you on the show was, you know, we've had Hightouch on the show. We've had Rudderstack, and obviously there's a conversation in the industry about the ideal architecture for a CDP. Or I suppose the ideal architecture for a customer that's bringing this kind of thing together. So I've always had Segment down as being a really kind of high quality, marketer-focused packaged CDP, as opposed to maybe sort of building it yourself using a warehouse. So is that a fair characterization? And is that the case still?
Starting point is 00:15:58 I suppose what would be the ideal architecture for you for trying to build this? And what led to you adding these warehouse features into, say, sort of like into the Personas packaged application that's been there in the past? Yeah, I think that's a really interesting framing. And, you know, one of the things that I'm really, really excited about in the CDP market is there are a lot of new flavors and approaches that are taking form. And so I think it's really interesting to learn from each other and continue to push the bounds of CDP. I think there are two things that are sort of top of mind here. The first is built for the marketer versus built for the developer, the data engineer. And I think
Starting point is 00:16:42 the reality is that CDP requires a lot of different stakeholders coming together and working together in new ways. And so I think that can often fall short if you're forcing data folks to operate in ways in which they're unfamiliar or use new types of tools, and similarly, vice versa for marketers and salespeople. And so really, our approach is around portability and empowering teams to use the tools that they're most familiar with, while building towards this collective understanding of who your users are and how you can better serve them. So that's sort of point one on who the target customer is, and how we think about these different personas working together. The second, I think, is around
Starting point is 00:17:33 this really interesting conversation of composability and extensibility. And, you know, I think that has always been very fundamental to our product philosophy. So you can connect and sync data from hundreds of data sources via Segment. You can activate data across a really powerful catalog of marketing and analytics and data warehousing tools. You can now sync data out of Snowflake or BigQuery or AWS Redshift with reverse ETL via Segment. And so customers will adopt these piecemeal over time. And we really see that option as providing great flexibility and allowing CDP to meet our customers and the market where they are. But I think at the end of the day, and you highlighted compliance and privacy as being one of the shared things that you need across your data platform.
Starting point is 00:18:31 I think there are others, like identity resolution: you want to have a shared understanding of who your users are, regardless of what tool you're operating out of. Governance: you need to have a shared understanding and canonical definition of your data, where it comes from, how it's defined, so everybody can be leveraging that same understanding. And so as you look across at the horizontal and shared components, you realize that there is a lot of value in bringing together these different components of CDP into a single solution. Okay. So I'd like to get onto one of the new features, Profiles Sync. Okay. So I've built what you might call composable CDPs or warehouse-based CDPs, and identity resolution, doing it well, is hard, really.
Starting point is 00:19:29 So maybe just talk to us about why it's hard to do that. What's involved in doing identity resolution and why it might make sense in this case to do it within Segment and then sync that back to the warehouse. Yeah, absolutely. And I'm sure there are some great stories that you can share about sort of where you've succeeded and where there are hard problems and skeletons
Starting point is 00:19:47 in the closet. I think there are a few hard problems that we had to solve with Segment Unify. The first problem was around identity resolution. And that's really how do you know who your customers are across touch points. So let's walk through an example to understand where some of the difficulty may come from. Let's talk about a customer: they are purchasing in store, and they provide their email address at checkout. Later, they call or text into support to get help installing this product. So now you have a phone number. And once the product is installed, they sign up for their service and are provided with a user ID. And so now imagine that same journey is happening hundreds or thousands of times per second, each journey occurring in a slightly different order
Starting point is 00:20:40 with slightly different identifiers, each of which needs to be managed and mapped to a single profile. This is really the hard problem of identity resolution. And trying to do that in real time at scale is often out of reach for a lot of teams. And so there were particularly hard challenges that we had to solve in building a scalable identity resolution solution here. One is going from anonymous to known. So how do you relate top of funnel anonymous activity with down funnel purchase behavior? There's shared identifier detection. So identifiers can have
Starting point is 00:21:19 varying levels of uniqueness. If you think about a phone number, that can be shared across a household, and devices can be shared among many individuals. And so you need to account for all of these nuances around each identifier that's introduced into the identity graph. Okay. That's interesting. And I suppose the point of Unify and Profiles Sync is that you can do that within Segment, but then sync that to the warehouse, really. So is that basically how it works? That's exactly right. I think the second sort of challenge and architecture conversation that's going on across the market right now is how do you bring real-time streaming architecture alongside data at rest that is living in your data warehouse. And so you hear a lot about customer 360, the single source of truth, the single data source to rule them all. And as you mentioned, I think that is largely a myth for most businesses. In reality, customer data infrastructure is this
Starting point is 00:22:19 tapestry of different databases and Kafka streams and data lakes and SaaS tools, each one holding this subset of understanding about your customers. And so there's this challenge of bringing real-time streaming data and data at rest together. And that's really where Profiles Sync and Reverse ETL come from. Profiles Sync loads this golden record into a queryable format directly in your data warehouse, giving your data teams the same access to this data that your marketing and support and sales folks are operating off of. And then reverse ETL gives data engineers and data analysts really superpowers of bringing the insights and models that they're developing on top of the data warehouse, unlocking that across the organization
Starting point is 00:23:13 and being able to sync those insights directly onto that golden record and into every tool that's required. Okay, so reverse ETL. So how does that relate to, I suppose, the SQL Traits you get within what was Personas before? They both can read from the data warehouse and use that to send data up to Segment, really. But what are the two different, I suppose, use cases and uses of SQL Traits and reverse ETL? How would they be kind of like used together or how do they use some projects? Yeah, that's a great question. So SQL Traits is really exclusively
Starting point is 00:23:51 about hydrating this golden profile. Reverse ETL is that and a little bit more, which is being able to sync really any data model into a downstream destination. It doesn't necessarily need to sync to the golden profile. So an example here could be you have a set of orders that you need to get into Customer.io.
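Editor's sketch: the orders example can be illustrated as a SQL model run against a warehouse, with only new or changed rows turned into payloads for a downstream tool. This is a minimal sketch under assumptions and does not use Segment's or Customer.io's actual APIs. An in-memory SQLite database stands in for the warehouse, and the `orders` table, the `run_sync` function, the "Order Updated" event name, and the payload shape are all hypothetical.

```python
import sqlite3

# Stand-in warehouse: an in-memory SQLite database with a hypothetical orders table.
warehouse = sqlite3.connect(":memory:")
warehouse.executescript(
    """
    CREATE TABLE orders (order_id TEXT PRIMARY KEY, email TEXT, total_usd REAL, status TEXT);
    INSERT INTO orders VALUES
        ('o-1', 'pat@example.com', 120.0, 'shipped'),
        ('o-2', 'sam@example.com', 35.5, 'pending'),
        ('o-3', 'pat@example.com', 15.0, 'cancelled');
    """
)

# The "model": a few lines of SQL that select the rows to sync downstream.
ORDERS_MODEL = """
    SELECT order_id, email, total_usd, status
    FROM orders
    WHERE status != 'cancelled'
    ORDER BY order_id
"""


def run_sync(conn, model_sql, already_synced):
    """Return payloads for rows that are new or changed since the previous run."""
    payloads = []
    for order_id, email, total_usd, status in conn.execute(model_sql):
        row = (email, total_usd, status)
        if already_synced.get(order_id) == row:
            continue  # unchanged since the last sync, so nothing to send
        already_synced[order_id] = row
        payloads.append(
            {
                "event": "Order Updated",  # hypothetical event name
                "email": email,
                "properties": {
                    "order_id": order_id,
                    "total_usd": total_usd,
                    "status": status,
                },
            }
        )
    return payloads


state = {}  # remembers what was sent on earlier runs
first_run = run_sync(warehouse, ORDERS_MODEL, state)
warehouse.execute("UPDATE orders SET status = 'shipped' WHERE order_id = 'o-2'")
second_run = run_sync(warehouse, ORDERS_MODEL, state)
print(len(first_run), len(second_run))
```

Keeping a record of what was already sent is a design choice in this sketch: it means a repeated run only forwards rows that changed, rather than resending the whole table each time.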
Starting point is 00:24:17 You can very easily write a model and sync those orders directly from your data warehouse into Customer.io with a few lines of SQL and a bit of configuration. Okay, so how is this all packaged up? Because obviously Segment, when it first came along, was just, say, Connections, for example. But now you've got this, you've got customer,
Starting point is 00:24:41 you've got journey orchestration, other features in there. So how do these two new features fit in with things, and generally how is Segment packaged up really, I suppose? Great question. So you can really think of this, again, as modular. You can start with Connections. You can add Segment Unify. Reverse ETL is available both on our free tier, as well as our self-service tier. And then Unify is added on top of Connections. So this would be something that you can add on top. So you can get started with, you know, customer data infrastructure, just syncing data from one
Starting point is 00:25:21 place to another, add identity resolution and Profiles Sync on top, as well as being able to use reverse ETL as a standalone offering or in conjunction with golden profiles. Okay. Okay. So, I suppose, last thing to talk about for me really is that you've got, I suppose, Segment now can do what we might call a composable data platform. It can do warehouse-centric. It can do API-based and whatever. What, in your view, would be the canonical? Say you had a classic, I don't know, D2C sort of business that was largely online with some offline activity and so on.
Starting point is 00:25:58 What's the kind of canonical reference architecture now for a CDP from Segment's point of view, really? Do you have any sort of thoughts on that? Yeah, it's a great question. Or how would they go about actually making that decision? Because obviously the answer is it depends, but what's the thought process you'd go through to cut the architecture? It's a great question. I think where I see this world heading, our perspective on this is really thinking about real-time event streaming and warehouse syncs via reverse ETL operating in conjunction, all hydrating some
Starting point is 00:26:34 shared understanding of golden profiles, who your end users are, where they fit in your journey, and then thinking about the application layer with line of business tools for marketing, analytics, and sales and support, all reading directly off of this golden record. All of that needs to be working in concert with your consent management strategy, your data governance strategy, and requires high levels of observability and reliability baked in. And so, you know, I think there are lots of ways in approaching this problem, but I think the end state is pretty clear and requires all of those components working together. Okay, interesting. And how much, I suppose, how much of, because obviously now Segment being part of Twilio, I'm sure there's a kind of overarching
Starting point is 00:27:30 product strategy there. How much of, how much should we think of Segment and Twilio being the solution as such, you know, or are they still kind of separate and atomic? Do you generally think of the typical kind of deployment now as being using both sets of technologies? Yeah. So as I mentioned earlier, we're building towards this broader view of this engagement platform. CDP and customer data really sit at the center of that universe, and that requires getting data in and hydrating these profiles. And then we have great offerings for marketing journeys with Twilio Engage, for Contact Center with Flex. But we also are interoperable and continue to expand our integrations across 400
Starting point is 00:28:21 and growing amazing tools where you can leverage these golden profiles. And so we're really excited about that opportunity in front of us, both in terms of extending into the application layer and putting data to work with really great apps, but also continuing to allow our customers to leverage the tools in their tool set and use highly specialized tools to get the job done. Okay, excellent. And I suppose, again, the last question for me. So you've been involved in all the interesting features of Segment through the years. So what's the next thing for you then? What's the next problem to be solved, really, do you think, in this space? Yeah, that's a great question. You know, I'm really excited about some of the breakthroughs that we're seeing around large language models, co-pilots and
Starting point is 00:29:09 assistants. And I think one of the key questions that many folks are wrestling with is how do you not just get a generic experience out of these AI assistants, but how do you make that really tailored for your business and your use case? And a lot of that comes down to the data that they're trained on and fine-tuned with. And so I think that's one of the big challenges ahead as we think about getting the most out of these AI breakthroughs. Fantastic. That's a really interesting answer. Excellent. So how do people find out about these two new features and how do people get a trial of Segment and find out how it works really? Yeah, absolutely. You can go to our website, segment.com. Happy to chat with any folks who are listening here. Feel free to reach out, knaparko at twilio.com
Starting point is 00:30:02 and we can set up some time. Thank you. That's brilliant. Well, thank you very much, Kevin. Thank you.
