The Data Stack Show - 227: The Art & Science of Marketing Attribution: From UTMs to Machine Learning with Lew Dawson of Momentum Consulting
Episode Date: February 5, 2025Highlights from this week’s conversation include:Welcome Back, Lew (0:14)Recap of Previous Discussion (1:03)Benefits of Hashing Information (2:33) Using Hashes for Data Context (4:24)Hashing and ...Query Parameters (7:24)Static Values for Hashing (11:10)Identity Resolution in Data Attribution (14:36)Methodologies for User Tracking (16:37)Combining Data Sources for Attribution (21:13)Understanding Data Gaps (25:25)Defining Objectives and KPIs (27:50)Identity Resolution Challenges (28:46)User and Session Stitching (32:01)Trusting Ad Platforms (35:23)Defining Attribution (38:09)The Credit Dilemma (40:18)First Touch Attribution Explained (41:47)Linear Attribution Model (43:21)B2C and B2B Attribution Scenarios (45:22)Timeframes in Attribution (47:29)Understanding Lookback Windows (49:34)Google Analytics Changes (51:20)Attribution After Conversion (53:26)Online vs. Offline Attribution (55:49)Discipline in Tracking (58:52)Challenges in Coordination (1:00:12)QR Codes and Data Integration (1:01:55)The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
 Transcript
 Discussion  (0)
    
                                         Hi, I'm Eric Dotz.
                                         
                                         And I'm John Wessel.
                                         
                                         Welcome to the Data Stack Show.
                                         
                                         The Data Stack Show is a podcast where we talk about the technical, business, and human
                                         
                                         challenges involved in data work.
                                         
                                         Join our casual conversations with innovators and data professionals to learn about new
                                         
                                         data technologies and how data teams are run at top companies.
                                         
                                         Lou, welcome back to the Data Stack Show.
                                         
    
                                         We ran out of time last time talking about attribution stuff,
                                         
                                         although we did make a lot of progress.
                                         
                                         But we're going to dive right back in.
                                         
                                         So yeah, thanks for giving us even more of your time.
                                         
                                         Yeah, thanks.
                                         
                                         It's good to see you again.
                                         
                                         And this is the first three-part show for the Datastack show?
                                         
                                         This is the first ever?
                                         
    
                                         Yes, this is the first ever three-part show.
                                         
                                         All right.
                                         
                                         Yeah, the first one we had to,
                                         
                                         we knew it was going to be a big multi-part show,
                                         
                                         which is super exciting.
                                         
                                         I'm excited, yeah.
                                         
                                         Congratulations, Lou.
                                         
                                         It's an honor.
                                         
    
                                         So last time we walked through,
                                         
                                         I was reflecting on this a little bit,
                                         
                                         really an immense amount of work
                                         
                                         to get to the point where, you know,
                                         
                                         we have these various data sources coming in.
                                         
                                         So we talked about data,
                                         
                                         structured data from, you know from advertising platforms that have information about
                                         
                                         your campaigns, your ad groups, your ads. Those all contain UTM parameters. We talked about
                                         
    
                                         behavioral data coming in so that you can see when a user lands in your website or mobile app,
                                         
                                         and then all of the actions they perform, ultimately culminating ideally in
                                         
                                         some sort of conversion event. We also talked about how UTM parameters are what now feels like
                                         
                                         a fairly primitive way of packaging information about your campaigns into a URL so that that
                                         
                                         metadata is observable by other systems. And we talked about a really clever
                                         
                                         methodology for hashing information so that you can overcome some of the limitations of that system.
                                         
                                         So why don't we start there? So we had just started to dip into the world of talking about
                                         
                                         the hash. Just give us a quick refresher on why that is so useful as compared with using the standard, you know, sort of, let's say, traditional taxonomy of the five UTM parameters.
                                         
    
                                         Yeah, absolutely.
                                         
                                         So in short, if you recall, there were a number of challenges that, you know, highlighted with the traditional UTM parameters, like let's say UTM campaign.
                                         
                                         Campaign is a big offender because
                                         
                                         it's freeform so you could have a scenario where your campaign taxonomy has a space or it has some
                                         
                                         sort of utm character or some sort of character in it that browsers at times mangle or browsers
                                         
                                         do differently so spaces like they can be represented as percent 20.
                                         
                                         Sometimes they're represented with pluses,
                                         
                                         different ecosystems do different things.
                                         
    
                                         And so you run into the scenario, first of all,
                                         
                                         where your UTM parameters get mangled, shall we say.
                                         
                                         And so now you have to go to the trouble
                                         
                                         just to have full and proper attribution
                                         
                                         to actually standardize those names.
                                         
                                         So two, three, four, five, 10, 15, 20 variations to have full and proper attribution to actually standardize those names.
                                         
                                         So 2, 3, 4, 5, 10, 15, 20 variations you'll sometimes see on a particular campaign name.
                                         
                                         You have to go through and figure out, how am I going to standardize those? So those actually point to a single identity for that campaign.
                                         
    
                                         So that's a big problem right there.
                                         
                                         So the way that one of the primary ways that's solved is with an ID.
                                         
                                         So you roll up all those distinct values, so UTM source, campaign, term, et cetera,
                                         
                                         and do a single unique identifier that also is not easy to mangle by any sort of platform.
                                         
                                         And that's two benefits.
                                         
                                         One I just described, less likely to be mangled.
                                         
                                         The other one is that's now your join key too.
                                         
                                         So instead of having to do that resolution
                                         
    
                                         and figuring out like the standardization of that
                                         
                                         as you back into your join key,
                                         
                                         now your join key is just coming in as part of your data
                                         
                                         and it's much easier.
                                         
                                         So that's the main reason why you'd want to
                                         
                                         like look into that new set.
                                         
                                         Yep.
                                         
                                         Okay, I have a couple of questions here.
                                         
    
                                         I actually have one point that I realized we did not get to last time,
                                         
                                         which is another major benefit of using a hash.
                                         
                                         Because you can package a bunch of information into the hash,
                                         
                                         and so the concept there would be this is limited.
                                         
                                         Your mileage may vary using a spreadsheet to do this.
                                         
                                         Most companies actually do use a spreadsheet.
                                         
                                         But for the sake of, you know, the example, let's say,
                                         
                                         I guess what I'm saying is there are more stable ways to do this
                                         
    
                                         and a lot of great tools out there that, you know,
                                         
                                         you can build the hashing system with that, you know,
                                         
                                         is a little bit easier to govern than a spreadsheet.
                                         
                                         But let's say you have this in a spreadsheet.
                                         
                                         You can actually add as much information as you want,
                                         
                                         or as we talked about last time,
                                         
                                         as much information as is helpful
                                         
                                         based on what you're trying to discover
                                         
    
                                         as far as attribution.
                                         
                                         And so you're not limited to those five UTM parameters.
                                         
                                         You can actually, I mean, theoretically,
                                         
                                         you could actually just have the hash if you wanted to.
                                         
                                         But irregardless, you can start to use, it creates a context where you can free up one
                                         
                                         or more of the UTM parameters to use for other things.
                                         
                                         One of those is actually that a lot of ad platforms support dynamically pulling in the
                                         
                                         ID of the individual ad itself
                                         
    
                                         when the ad is clicked, which can be super handy.
                                         
                                         You could just append that into another UTM parameter,
                                         
                                         but then it wouldn't be packaged in the hash necessarily.
                                         
                                         What we've seen a lot is you actually will use UTM content, for example,
                                         
                                         to pull in the advertising id using curly
                                         
                                         brackets because you can package the actual content that you want in other columns in the
                                         
                                         spreadsheet that are not represented as values in the utm keys but are in the hash and so that
                                         
                                         actually can speed up ad level reporting downstream because you have the hash and then you have the actual ad ID
                                         
    
                                         in the UTM itself, which is represented on the click
                                         
                                         and all that sort of stuff.
                                         
                                         So I forgot that we didn't talk about that, Lou,
                                         
                                         but that's another really clever thing
                                         
                                         that can speed up some downstream modeling.
                                         
                                         Yeah, absolutely.
                                         
                                         I mean, it's a great call out.
                                         
                                         And even another one, like just iterating through all
                                         
    
                                         those it makes it much easier to have stable um identifiers for particular campaign ad set ad
                                         
                                         because that also allows you to make changes to that ad in your metadata table so like where you
                                         
                                         do that mapping while keeping the same slash stable identifier.
                                         
                                         So that's another huge one too
                                         
                                         that we didn't touch on yet.
                                         
                                         Yep.
                                         
                                         So yeah, there are multiple benefits for sure.
                                         
                                         That's a great call.
                                         
    
                                         Yeah.
                                         
                                         Yeah, the quality control on the input is awesome.
                                         
                                         Like being able to actually control
                                         
                                         a stable campaign name with a sequence number,
                                         
                                         for example, you know, can be really helpful.
                                         
                                         So I have a question. Yeah. So when I think of a sequence number, for example, can be really helpful.
                                         
                                         I have a question. When I think of a hash here, I'm thinking of taking some arbitrary amount of
                                         
                                         information, creating a unique ID that's specifically linked to that information.
                                         
    
                                         In this case, we're doing that in a metadata
                                         
                                         table and then taking that and putting that into a query parameter
                                         
                                         or we're doing that somehow on the front end like through a tool or yeah you want to speak to that lynn yeah sure so
                                         
                                         kind of twofold to answer your question let me know if i don't completely address it but basically
                                         
                                         what you're doing is you're going to pick the parameters up front that you want to hash and
                                         
                                         that's generally going to be like your identifier,
                                         
                                         some sort of unique identifier,
                                         
                                         whether that's you created from scratch
                                         
    
                                         or we can get into solutions later.
                                         
                                         But what you do is you generate that unique identifier.
                                         
                                         So like a SHA-256 or something of,
                                         
                                         or even an MD5 of a set of parameters.
                                         
                                         And then what you do is you take that
                                         
                                         and you put that into, so like in facebook ads you put that
                                         
                                         and for that particular ad as the as a utm pram in that particular ad right and so whenever the user
                                         
                                         will click on that particular ad as part of the query prams those will audit that will automatically
                                         
    
                                         be sent that particular identifier does that answer your your question? Yeah, so we're not also somehow...
                                         
                                         Because I was thinking, how do you get dynamic parameters
                                         
                                         into the B5 hash?
                                         
                                         No, no, no.
                                         
                                         Yeah, the dynamic parameters you would actually get
                                         
                                         from the advertising platform through their mechanism,
                                         
                                         like on click, they can insert dynamic parameters.
                                         
                                         It's just the point there was that it's nice to just,
                                         
    
                                         once you start, you really don't want to, in my opinion,
                                         
                                         for the purposes of attribution,
                                         
                                         Lou, tell me what you think about this.
                                         
                                         We're probably aligned, but if you can stick
                                         
                                         within the five UTM parameters,
                                         
                                         there are a lot of benefits of that.
                                         
                                         You don't really want to go way outside of that
                                         
                                         because then those aren't honored by every single system.
                                         
    
                                         You're adding more complexity.
                                         
                                         And so the way that you can pull in
                                         
                                         dynamic parameters without having to add
                                         
                                         a bunch of additional parameters
                                         
                                         is that you use existing UTM values
                                         
                                         because they're not required anymore
                                         
                                         because you can hash all of the metadata.
                                         
                                         So in value one,
                                         
    
                                         I've got a bunch of stuff crammed into a hash.
                                         
                                         In value two, I may have some dynamic stuff
                                         
                                         as well as three, four, five potentially.
                                         
                                         Yep.
                                         
                                         And it doesn't really, like it does,
                                         
                                         but it does not matter really what you put in the hash
                                         
                                         because as you both pointed out,
                                         
                                         it's static and it's going to be stable
                                         
    
                                         and it's chosen by you
                                         
                                         in your system where you're going to track all these Ashes to campaigns.
                                         
                                         And then again, as you both pointed out, there's the aspect of dynamic parameters, which the
                                         
                                         ad platform will put, you'll configure it in your ad and then they, so like Google ads
                                         
                                         has value track.
                                         
                                         It's a bracket and then whatever parameter, dynamic parameter you want to put associated with that UTM
                                         
                                         param now, the way at the end of the day, all those dynamic prams you'll resolve.
                                         
                                         Those is in addition on your ad, you also have that identifier, right?
                                         
    
                                         So you're going to use that identifiers, your join key, and then all those dynamic
                                         
                                         parameters, which will come in on your tracking pixel,
                                         
                                         like your click screen.
                                         
                                         So like if you use Redistack,
                                         
                                         you'll be able to get those at runtime
                                         
                                         because the ad platform will substitute those in at runtime.
                                         
                                         Then one other question, again,
                                         
                                         this is probably just from a software background.
                                         
    
                                         A lot of times, like if you generate a hash,
                                         
                                         like it changes if you change the other data.
                                         
                                         In this case, you generate one time.
                                         
                                         And if you made some change, you'd probably,
                                         
                                         it'd be better, you'd want to keep it static, right?
                                         
                                         That's exactly what you're spot on.
                                         
                                         What I was referring to earlier is
                                         
                                         that's another benefit of using the hash
                                         
    
                                         is you pick a static set of values
                                         
                                         and then you never change those values.
                                         
                                         And since it's your metadata and the covers,
                                         
                                         like you can keep those static while
                                         
                                         adding all sorts of other metadata that you can change while still having a staple id so exactly
                                         
                                         yep yeah and usually two other quick thoughts on that and then i want to move on to talking
                                         
                                         about identity resolution because it's a lot juicier but there are like these are it sounds
                                         
                                         so simple but i mean the nuances here like they're yeah yeah the url tricks are fascinating to me. Generally, I think it can be a good practice
                                         
    
                                         to use an arbitrary UTM value for the hash.
                                         
                                         Lou, actually, we haven't discussed this specifically.
                                         
                                         In the past, I've used an arbitrary
                                         
                                         URL parameter for the hash itself
                                         
                                         so that there's more flexibility in using the ones
                                         
                                         that most systems honor out of the box.
                                         
                                         Yeah, I think it's at least wise to use wherever possible,
                                         
                                         whenever possible, use a non-standard one
                                         
    
                                         so a platform doesn't step on it.
                                         
                                         Yes.
                                         
                                         If you're putting it in like utm campaign what is
                                         
                                         the platform that they put it on steps on it it's like yeah yeah just lost your attribution right
                                         
                                         exactly so one other nice thing i think this is the last the last point about the hashing in the
                                         
                                         urls one other benefit and this is something that i just I didn't think about a ton until we just dug into this problem a bunch,
                                         
                                         but the URL length can become an issue in certain cases.
                                         
                                         If you have a really long URL, it can often get truncated
                                         
    
                                         or even the way that certain browsers or applications may capture it,
                                         
                                         they may capture a truncated version of it.
                                         
                                         It can create challenges.
                                         
                                         The other thing that hash allows you to do is keep a pretty trim URL
                                         
                                         so that you don't run into string length issues.
                                         
                                         That's not a big deal if you're capturing URLs as a string in a Rutter stack payload,
                                         
                                         for example, but if it's going into other systems or if a website
                                         
                                         or application is doing something
                                         
    
                                         where it's parsing or interacting with it.
                                         
                                         Or another thing that you don't think about a ton is
                                         
                                         if an application, or I didn't think about a ton,
                                         
                                         if you have an application or a website that appends a bunch of additional
                                         
                                         parameters to the URL to actually do things like filtering
                                         
                                         and other things in the application, you can get these really long, gnarly...
                                         
                                         Like e-commerce search.
                                         
                                         Yeah, exactly.
                                         
    
                                         So again, it just sort of gives you
                                         
                                         the ability to have these really nice...
                                         
                                         Basically as long as you need,
                                         
                                         but as short as possible
                                         
                                         to sort of mitigate that.
                                         
                                         Right.
                                         
                                         Okay, I think we have unhashed all of the hash.
                                         
                                         Okay, Luke, let's talk about identity resolution.
                                         
    
                                         And we started to touch on this last time,
                                         
                                         but I want to dig a little bit deeper.
                                         
                                         And so if we think about where we're at in this journey,
                                         
                                         we are in the data store at this point, right?
                                         
                                         So let's say we have all of our data in,
                                         
                                         we're using all the URL tricks
                                         
                                         to sort of enforce quality control,
                                         
                                         have good URLs, have our join keys,
                                         
    
                                         pick up extra bonus information, if you will,
                                         
                                         that can be inserted dynamically from the ad platforms.
                                         
                                         And so we have all this data in our warehouse,
                                         
                                         and really what we arrive at
                                         
                                         is that before we can start really doing attribution,
                                         
                                         and we'll talk about what that means because that means a lot of different things.
                                         
                                         I'm so excited to chat about that and hear your definitions.
                                         
                                         But we have a ton of data and there are actually multiple identity resolution problems
                                         
    
                                         that we have to solve in order to produce let's
                                         
                                         just call or what would you call it like a baseline data set or what how would you describe like
                                         
                                         the sort of let's say the end point of prep and the starting point of like now i can actually
                                         
                                         begin the work of doing some you know insights around attribution Is that a baseline table or data set?
                                         
                                         Yeah, there's kind of, not kind of,
                                         
                                         there are two things effectively you need.
                                         
                                         And sometimes they're packaged into one,
                                         
                                         but you need a, you basically need a,
                                         
    
                                         here's a session.
                                         
                                         So something that tells you,
                                         
                                         here's a session that occurred
                                         
                                         and how the user came in on that session. and then you need a, how did that session convert
                                         
                                         or not convert effectively is the other question you need to answer.
                                         
                                         So that can be in the same dataset.
                                         
                                         So like writer using writer stack, for example, like their e-commerce spec,
                                         
                                         you have a page, right?
                                         
    
                                         So it's your initial page view event.
                                         
                                         And then order completed would be
                                         
                                         our conversion event. Like if it's an e-commerce company who's selling stuff, right? So you need
                                         
                                         those two at a minimum. Now that's not to say you have to have those two. You could have like a
                                         
                                         writer stack page view, and then you could join that with Shopify orders as long as there's a way like some sort of join key
                                         
                                         he joined those two right yep but basically those are at a minimum the two things you effectively
                                         
                                         need to attribute yeah or can I ask it yeah go for it I was I want to ask a question there because
                                         
                                         I think this is and John I'm interested in your opinion too because you've done lots of that
                                         
    
                                         both of you have done an immense amount of that type of joining in the warehouse.
                                         
                                         There are two ways to do this
                                         
                                         and I want to know the best way to do this
                                         
                                         because I have tried multiple ways in the past
                                         
                                         and I don't have as much experience with e-commerce.
                                         
                                         Maybe there are other ways to think about this.
                                         
                                         But you have, let's call it,
                                         
                                         the session-based methodology,
                                         
    
                                         which is where I have some way
                                         
                                         of following the same user across multiple sessions.
                                         
                                         You can persist the attribution data,
                                         
                                         so let's call it the hash and whatever else.
                                         
                                         So I could persist that across sessions, somehow store that.
                                         
                                         I could have some other way to tie
                                         
                                         the user's behavior together.
                                         
                                         RutterSack provides an anonymous ID.
                                         
    
                                         Maybe you want to do both.
                                         
                                         But you essentially follow that user
                                         
                                         and perhaps the actual attribution data itself through the sessions until there's a
                                         
                                         conversion. But there's another way, another methodology, which would be relying on user level
                                         
                                         identity resolution so that it's like, okay, as long as I can get the attribution data on the
                                         
                                         first, on whatever session, I don't necessarily have to persist it through
                                         
                                         if I have a way to tie the user to a Shopify order.
                                         
                                         And so then I'm actually looking for
                                         
    
                                         some instance of attribution data in a session, and then I can see
                                         
                                         there's an order at some point downstream
                                         
                                         with its own timestamp, right?
                                         
                                         And so then I can say, okay, well, that user came in here
                                         
                                         and then they eventually made this order.
                                         
                                         But in that case, I actually would be trying to join
                                         
                                         on tying the attribution data to the user
                                         
                                         from the page view event
                                         
    
                                         or whatever that behavioral event is.
                                         
                                         Then I need to tie the actual user data to the Shopify from the page view event or whatever that behavioral event is. Then I need to tie the actual user data
                                         
                                         to the Shopify orders table,
                                         
                                         which means I'm using email or some trait of the user
                                         
                                         and I'm running the join that way.
                                         
                                         Does that make sense?
                                         
                                         Those two like rad methodologies,
                                         
                                         like I follow a session through
                                         
    
                                         or like I capture the attribution data
                                         
                                         at some point in time,
                                         
                                         but I have a way to know that it's a user
                                         
                                         and then I tie that user to some conversion data
                                         
                                         like an order table downstream.
                                         
                                         Am I thinking about the broad methodologies of doing that right
                                         
                                         or is there another way to?
                                         
                                         I can speak a little bit to Shopify
                                         
    
                                         and I'm sure this changes on a regular basis.
                                         
                                         We were using it fairly early on
                                         
                                         when Shopify was first bringing on large businesses,
                                         
                                         large view count, lots of traffic.
                                         
                                         So over, I don't know, seven years ago, five years ago, something like that.
                                         
                                         Oh, that's right, yeah.
                                         
                                         It was interesting because, and this might be different now,
                                         
                                         but Shopify will attempt to do this for you.
                                         
    
                                         And they will be pulling UTM parameters, give you some session information.
                                         
                                         But it always felt pretty incomplete.
                                         
                                         And Lou, I don't know if that's been your experience too.
                                         
                                         So there's that part of it.
                                         
                                         And there's a part like, well, I can do my own,
                                         
                                         like a writer stack type thing,
                                         
                                         grab an anonymous ID, etc.
                                         
                                         Do some SQL gymnastics
                                         
    
                                         to get it to work.
                                         
                                         So we went more that route
                                         
                                         where, like, we did both, actually. We did both actually for a while.
                                         
                                         It's like, okay, we'll just pull the data out of Shopify.
                                         
                                         How did it attribute it?
                                         
                                         Let's try that.
                                         
                                         And there's gaps and we're not sure how to fill the gaps.
                                         
                                         And then we went in the other way.
                                         
    
                                         It was like, okay, fire anonymous ID,
                                         
                                         collect the email address and writer stack at checkout,
                                         
                                         which associates with anonymous ID.
                                         
                                         And then we did pretty simple attribution models would run first or last usually click usually first and and just
                                         
                                         use that information essentially yep so kind of both yeah yeah honestly just kind of my hypothesis
                                         
                                         but lou yeah yeah so you definitely you don't have to connect user data, right?
                                         
                                         It doesn't have to be user level identity resolution.
                                         
                                         As you pointed out in your first one, it's like, it can be just session and let's say
                                         
    
                                         order, but you are correct.
                                         
                                         There also can be session user order.
                                         
                                         It depends on what kind of metrics you're trying to derive, which can be a topic
                                         
                                         we could talk about later, or I told these separate data stack conversation, right around,
                                         
                                         you know, like your customer feature table. So that's one thing. And then the other thing I'll
                                         
                                         point out, John, you kind of alluded to, like, I don't know, it felt kind of incomplete.
                                         
                                         If you, if one wants to do attribution as well on sessions that don't convert.
                                         
                                         Right.
                                         
    
                                         So let's say come in really at the end of the day, like you want to include direct,
                                         
                                         right?
                                         
                                         Like you want to know how much direct traffic you're getting through
                                         
                                         specifically both for attribution.
                                         
                                         Like you can attribute to direct traffic conversions, but if you want to see how
                                         
                                         much direct traffic is coming through, including traffic, that's not converting
                                         
                                         for like your row as calculations or whatever, you can't get that through Shopify alone.
                                         
                                         Because you can only really tie sessions to conversions versus seeing all your traffic come through.
                                         
    
                                         So that's one area where like Shopify sometimes falls over like that Shopify attribution. I think I'll point out too, is the attribution recently discovered is different depending on where you get the attribution data from.
                                         
                                         So REST landing site.
                                         
                                         So the Shopify's REST API, the landing site entry, like in the order is attribution
                                         
                                         is different than if you looked at the graph QLP API and you look at business.
                                         
                                         Oh, that is interesting.
                                         
                                         So, so nothing to keep an eye on too, is like attribution actually is different
                                         
                                         and it's not clear what models are being used, like what attribution models.
                                         
                                         Like it's not clear what models are being used like what attribution models like it's not
                                         
    
                                         clearly documented so that's the pitfall you run into when you want to start getting more advanced
                                         
                                         is it's unclear how things are being calculated right in different scenarios and lastly like if
                                         
                                         you combine them or if you try and combine them or like compare them they're always going to be
                                         
                                         different which is same the same is going to be true for Google Analytics
                                         
                                         or let's say even Facebook ads.
                                         
                                         The conversion metrics are going to be very different
                                         
                                         in those platforms versus if you were to properly calculate them
                                         
                                         on your own, which is another thing you can chat about.
                                         
    
                                         Yeah, we definitely are going to chat about that.
                                         
                                         So I just thought of something
                                         
                                         I think we skipped
                                         
                                         in our URL stuff,
                                         
                                         but I think you're talking
                                         
                                         about direct traffic.
                                         
                                         We didn't talk at all about
                                         
                                         like ad blockers
                                         
    
                                         or other reasons like
                                         
                                         why we might be missing attribution.
                                         
                                         I mean, you talked about
                                         
                                         mangled URLs, but yeah.
                                         
                                         We did, you're right.
                                         
                                         But I feel like that's one that,
                                         
                                         I mean, the ad blocker thing comes up a lot, especially like if you're right i but i i feel like that's one that i mean the ad blocker thing comes
                                         
                                         up a lot especially like if you're in tech or you know advertising something with very technical
                                         
    
                                         users so i guess let's just apply it to the id resolution any thoughts around that i mean because
                                         
                                         shopify is not going to be immune to that however they attribute nor is most solutions yeah it's a
                                         
                                         really good point and this is this is this actually plays into what
                                         
                                         the comment i just made of it's a little challenging to like use multiple sources of
                                         
                                         data like shopify combined with clickstream because again like they they attribute differently
                                         
                                         but your point which is so devalid shopify will capture will capture more data so sometimes there's absolutely a level of data you're going
                                         
                                         to lose with clickstream to your point ad blockers pixels will get dropped like they just they won't
                                         
                                         fire they don't render they'll fire too late in the page life cycle like as the user's leaving
                                         
    
                                         developers didn't know there's like a you know a pixel api that will fire in the background if you
                                         
                                         use it properly so So things like that.
                                         
                                         So if you truly want like the most accurate picture, yes, you really do need to meld those
                                         
                                         data sets and you will be able to get, you'll see a subset of Shopify orders that do not
                                         
                                         have click streams or completion.
                                         
                                         Yeah, absolutely.
                                         
                                         And you should be able to generally get the attribution data from those because that data
                                         
                                         generally will be in the UTM frames,
                                         
    
                                         which will go to Shopify.
                                         
                                         So that's if the request is going to Shopify,
                                         
                                         so they'll be able to capture those.
                                         
                                         Right.
                                         
                                         But yeah,
                                         
                                         it's a really good point.
                                         
                                         That's a even bigger challenge on top.
                                         
                                         Go ahead.
                                         
    
                                         Yeah.
                                         
                                         Cause that's essentially what we ended up doing was one,
                                         
                                         like resign the fact of like,
                                         
                                         okay,
                                         
                                         we don't know how Shopify is doing this.
                                         
                                         We don't know what model they're using. The other side, too, to your point, we've got
                                         
                                         more data, especially on non-conversions with Rudderstack.
                                         
                                         And I was like, okay, well, if we have gaps that Shopify can fill,
                                         
    
                                         we don't have from Rudderstack, would we rather it be blank or rather it be what Shopify said?
                                         
                                         And that was pretty clear. Even though we don't know the exact model Shopify is using,
                                         
                                         we'd rather know and have Shopify's data than nothing.
                                         
                                         And I think, Lou, I appreciate so much
                                         
                                         how you, throughout this whole conversation,
                                         
                                         have returned us to the just wonderful reminder
                                         
                                         that it kind of depends on what metrics
                                         
                                         you're trying to produce.
                                         
    
                                         And so I'll give two examples here.
                                         
                                         So one would be, let's say you're a business
                                         
                                         that doesn't have a lot of repeat purchasers.
                                         
                                         People tend to come in, they buy one item
                                         
                                         and they don't ever...
                                         
                                         You're not necessarily building a relationship with them
                                         
                                         because it's highly transactional.
                                         
                                         There are businesses out there like that.
                                         
    
                                         Sort of at one end of the spectrum.
                                         
                                         The other end of the spectrum might be a game
                                         
                                         where sessions really matter
                                         
                                         because you want to understand
                                         
                                         what was unique about a session
                                         
                                         in which someone clicked on an ad
                                         
                                         and then eventually made a purchase.
                                         
                                         And so the difference in how necessary
                                         
    
                                         persisting all that stuff throughout the sessions
                                         
                                         and getting really tight on multiple session over session
                                         
                                         is really important.
                                         
                                         That's also a lot heavier duty modeling
                                         
                                         to do user-level session reporting
                                         
                                         that includes attribution data.
                                         
                                         There are considerations on the front end
                                         
                                         around persisting that data across the sessions and that you know, that that sort of gets heavy handed,
                                         
    
                                         right? You don't necessarily have to do that. But there are situations where that can be extremely
                                         
                                         helpful, because it reflects, you know, the insight that you want to actually uncover about
                                         
                                         your particular business, you know, in the context in which someone, you know, click something or
                                         
                                         does a conversion. Yeah, absolutely. Yeah. And, you know, one other does a conversion yeah absolutely yeah and you know
                                         
                                         one other thing which is small but we didn't really talk about we don't need to highlight
                                         
                                         too much but like you might have multiple properties that funnel into this data too
                                         
                                         so like you have a mobile app sure web app yeah multiple web apps right like so there as you said
                                         
                                         there are so many variables so i think think this goes back into the people part.
                                         
    
                                         And again, it's before people do all this,
                                         
                                         my strong urging is to define what you're trying to accomplish
                                         
                                         and define how deep you want to go.
                                         
                                         But most importantly, define the KPIs that you're trying to look at
                                         
                                         and you're trying to measure against.
                                         
                                         And then you can back and do the best solution
                                         
                                         to measure those KPIs.
                                         
                                         Yep.
                                         
    
                                         And last time we talked about this concept of altitude,
                                         
                                         which I think is really helpful, right?
                                         
                                         Like determine your cruising altitude before you, you know,
                                         
                                         before you start barreling down the runway.
                                         
                                         I mean, I think that's the problem with all of this
                                         
                                         is like I could totally picture jumping into this and then somebody getting really deep. you start barreling down the runway. I think that's the problem with all of this.
                                         
                                         I could totally picture jumping into this and then somebody getting really deep on like,
                                         
                                         we're going to solve device stitching between desktop and mobile app.
                                         
    
                                         We're going to solve that and just really myopically focus on that
                                         
                                         and miss an end-to-end solution for just attribution.
                                         
                                         Aside from that.
                                         
                                         Okay. I want to talk, I don't want to dig too deep into identity resolution because that is,
                                         
                                         we could do a three hour show literally just on that, which actually is not a bad idea because
                                         
                                         that is a really fascinating topic in and of itself. And that gets back to the, you know,
                                         
                                         the customer feature table I mentioned, which might be a different segment altogether, right?
                                         
                                         As big part of that is identity resolution.
                                         
    
                                         So that we don't go down a rabbit hole
                                         
                                         because I want to make sure that we dig into,
                                         
                                         we haven't even gotten to attribution models.
                                         
                                         And so we've got to go there.
                                         
                                         We'll get to the baseline data set first,
                                         
                                         but give us just a quick rundown, Lou,
                                         
                                         of how are you, we have all this data in,
                                         
                                         you know, we have all these disparate data sets.
                                         
    
                                         Not only do we need to join them using the join key,
                                         
                                         which at a very high level, again,
                                         
                                         as you have a behavioral event that's tied to a user
                                         
                                         with a hash value, you have the hash value
                                         
                                         in your data from the ad platforms.
                                         
                                         And so you have a join key where you can pull this together.
                                         
                                         But the reason identity resolution is a big deal is actually, I'll say
                                         
                                         this, the most immediately apparent reason it's a big deal is because you have the initial visit
                                         
    
                                         from the user that contains the hash that represents, okay, they clicked on an ad,
                                         
                                         or came from some source. And then often the distinct timestamped behavioral event that
                                         
                                         represents a conversion is separate from that, right? It happened, you know, there's some
                                         
                                         purchase event or add to cart or subscribe or whatever that, you know, downstream event is.
                                         
                                         And so you need to make sure that you can say, like, this is actually the same user in order to associate whatever value the conversion is to, you know, that campaign and that it was actually the same user who performed that to avoid, you know, double counting and all that sort of stuff. but I would classify as a related identity resolution problem
                                         
                                         is that if you are running a campaign
                                         
                                         across multiple different platforms
                                         
                                         and the concept of a campaign transcends,
                                         
    
                                         which is usually the case, right?
                                         
                                         Let's just say Spring sale 2025 is my campaign.
                                         
                                         And I actually want to push that campaign out
                                         
                                         across multiple different channels.
                                         
                                         You have to build an identity for that campaign
                                         
                                         from multiple different data sets.
                                         
                                         Again, that's one of those things where
                                         
                                         if you don't think about that going in,
                                         
    
                                         you think about, okay, I need to tie these user events together.
                                         
                                         But you also, in a lot of cases,
                                         
                                         have to tie disparate data sets for campaigns together
                                         
                                         to create, let's call it a campaign entity
                                         
                                         that includes data from multiple different platforms
                                         
                                         and that kind of has to be normalized.
                                         
                                         Because let's say you want to look at how much,
                                         
                                         what was our return on ad spend across every single platform
                                         
    
                                         for spring 2025 sale?
                                         
                                         And so you have to aggregate that.
                                         
                                         So that's my conception.
                                         
                                         What am I missing?
                                         
                                         And just give us a high level of how do you begin to approach this,
                                         
                                         again, without taking us down another three-episode rabbit hole, if that's possible.
                                         
                                         Yeah, totally.
                                         
                                         It's totally possible.
                                         
    
                                         Great observation.
                                         
                                         There are a couple of things I'll clarify there.
                                         
                                         So for a complex user, yes, you're right.
                                         
                                         That definitely starts becoming a challenge,
                                         
                                         stitching multiple data sets.
                                         
                                         So a more advanced user, like you said,
                                         
                                         is going to want to know effectively
                                         
                                         a campaign across multiple platforms,
                                         
    
                                         possibly retention, acquisition, engagement.
                                         
                                         It could be all of those.
                                         
                                         Yes, they're going to want to have different.
                                         
                                         First come in on web and then purchase later on mobile
                                         
                                         and all those different ways of challenging.
                                         
                                         Yeah, exactly.
                                         
                                         So just to give a concrete example,
                                         
                                         you're going to want to know how many emails did I send in months I VO,
                                         
    
                                         how much ad spending did I have on that campaign campaign etc right so yep at the end of the day
                                         
                                         you're right you don't want to stitch multiple data sets together so that is challenging but i
                                         
                                         would say for the simpler users this is a little bit less of a challenge and this again goes back
                                         
                                         to which we won't beat a dead horse but goes back to what are you trying to accomplish and for
                                         
                                         simpler users i don't think you necessarily need to stitch together all of those channels.
                                         
                                         In most cases, it can be mainly orders, click stream, and possibly, depending again on what exactly you're trying to measure, possibly a couple like ad channels to look at like you're spending.
                                         
                                         Now, one other thing I'll point out too is you're absolutely correct that this problem is one or more identity stitchings.
                                         
                                         And that is, you talked about stitching a user, which in some cases, yes, like you're stitching a user together and a session.
                                         
    
                                         You don't have to always stitch a user together.
                                         
                                         It can just be a session.
                                         
                                         Oh, yeah.
                                         
                                         To your point, again, it still is the identity resolution problem even for session and that
                                         
                                         it's a temporal problem so you're stitching one to end sessions over time so there's your temporal
                                         
                                         part together so you're effectively going what's you know what are all the sessions that point to
                                         
                                         a single version right so that's your node you're pointing all yours. Yes.
                                         
                                         That you're resolving.
                                         
    
                                         Right.
                                         
                                         So it definitely is still an identity resolution problem,
                                         
                                         but it's somewhat of a different identity resolution problem
                                         
                                         depending on how you're looking at,
                                         
                                         how you're looking at, sorry,
                                         
                                         depending on what you're looking at to measure.
                                         
                                         Yep.
                                         
                                         Is what I would say.
                                         
    
                                         Yeah.
                                         
                                         Yeah.
                                         
                                         Go ahead.
                                         
                                         I was just going to say,
                                         
                                         the way you described that is great
                                         
                                         because you have the campaign,
                                         
                                         let's say a campaign platform IDRES problem,
                                         
                                         you have the user IDRES problem,
                                         
    
                                         then you introduce the idea of sessions.
                                         
                                         You could actually just look at sessions or user,
                                         
                                         but then in some cases you may want to look at both,
                                         
                                         and that's when things can get really gnarly
                                         
                                         because then you're looking at tying sessions,
                                         
                                         not only tying sessions to the attribution data
                                         
                                         and to a conversion,
                                         
                                         but then also tying users to sessions themselves.
                                         
    
                                         You're getting into some pretty serious modeling.
                                         
                                         Which I think, to zoom out,
                                         
                                         is why it's easier said than done to just say,
                                         
                                         oh, well, just tell the marketing team
                                         
                                         that they can switch over to use the data that we have in the warehouse, right?
                                         
                                         Because they're doing some like really helpful things under the hood.
                                         
                                         You know, we could argue about the accuracy of that, but the sort of session level, user
                                         
                                         level, campaign level stuff you get out of the box is like, you know, it's very hard
                                         
    
                                         to hand roll.
                                         
                                         Yeah.
                                         
                                         And I think that's part of the reason why people
                                         
                                         a lot of times will fall back to the platform to get conversion,
                                         
                                         which I think is, okay, like for a user who's just starting out,
                                         
                                         they don't, there's a point in time
                                         
                                         and the life cycle of a business for sure, that's fine.
                                         
                                         You just, you broadly care about how much you're spending,
                                         
    
                                         how much you're converting your business, it's super small.
                                         
                                         But there's a point pretty early on where it's like okay i can't trust ad platforms
                                         
                                         anymore because i don't know if facebook is you know attributing over the last year we'll talk
                                         
                                         more about in a second in our attribution models but like treating over the last year if that user
                                         
                                         ever came to my site it's counting as a conversion right like yep yep you just don't know so yeah
                                         
                                         that's a it's very easy pitfall to fall into? Like, you just don't know. So, yeah, that's a,
                                         
                                         it's a very easy pitfall to fall into
                                         
                                         when you're like, oh, this is too challenging now.
                                         
    
                                         We have the data, but it's too challenging.
                                         
                                         Let's just fall back to the platforms.
                                         
                                         Right, yeah.
                                         
                                         So, go ahead.
                                         
                                         Okay.
                                         
                                         Identity resolution is hard.
                                         
                                         We'll do a separate episode on that.
                                         
                                         By the way, amazing job threading the needle
                                         
    
                                         on not, you know, getting us down a 30-minute rabbit hole there.
                                         
                                         I'm so glad we're here. It only took us two hours to get to the point
                                         
                                         where we have, let's call it a baseline
                                         
                                         data set for attribution. We have joined
                                         
                                         campaign data from a platform
                                         
                                         with some user-level data
                                         
                                         and or perhaps some session-level data.
                                         
                                         And we've done the appropriate level of identity resolution
                                         
    
                                         across those different areas that we talked about,
                                         
                                         appropriate to our cruising altitude
                                         
                                         for the metrics that we want to produce.
                                         
                                         Okay, so now we have a table,
                                         
                                         or maybe more accurately, like a couple of tables, you know, that are, that can be joined to produce different metrics and different reporting for attribution. But now I think we have a bunch of decisions to make,, but this question's for both of you.
                                         
                                         Where do you start once you have this data set?
                                         
                                         Of course, where you want to measure,
                                         
                                         but you mentioned first and last touch.
                                         
    
                                         We haven't even really talked about multi-touch.
                                         
                                         There's a machine learning aspect.
                                         
                                         Actually, maybe we start here.
                                         
                                         Lou, can you give us a breakdown
                                         
                                         of what are attribution models?
                                         
                                         I know that may sound silly,
                                         
                                         but especially for the listeners
                                         
                                         who haven't done a lot of research on this
                                         
    
                                         or haven't built a lot of this,
                                         
                                         what are attribution models?
                                         
                                         Take us from very basic to maybe the more extreme end of
                                         
                                         the spectrum in terms of complexity. Absolutely. Yeah. So just to recap real quickly, attribution,
                                         
                                         it's at the end of the day, you're trying to figure out what channel or channels and my
                                         
                                         marketing ecosystem contributes to the conversion.
                                         
                                         So like in the e-commerce, for example,
                                         
                                         what channels contributed to the sale of a product to a user.
                                         
    
                                         So you converted them for a prospect to an actual customer.
                                         
                                         So establishing that.
                                         
                                         Now let's set up a scenario of we have multiple different channels that we have campaigns going on right now.
                                         
                                         So let's say, for example, we have Google search ads.
                                         
                                         Then we also have Facebook ads.
                                         
                                         And then maybe we're using Klaviyo.
                                         
                                         So a user, setting up a scenario, a user searches for my cool company's product and sees a Google ad.
                                         
                                         Google ads are super prominent these days.
                                         
    
                                         They're somewhat hard not to click.
                                         
                                         So you accidentally or you intentionally click on one, right?
                                         
                                         So now you go to that website and you establish that, okay,
                                         
                                         me as this anonymous user, I've come to this website.
                                         
                                         I didn't click on the Google ad.
                                         
                                         And you're like, ah, crap.
                                         
                                         You go back. I didn't mean to click on that then later you're in facebook and you see for my company and adigan
                                         
                                         for the same campaign that they're running on facebook and you actually click on that well
                                         
    
                                         now you've come to the website again but this time instead of coming from google ads you've come from Facebook ads.
                                         
                                         And you're like, okay, actually, maybe this product is cool.
                                         
                                         I'm going to buy it.
                                         
                                         Right.
                                         
                                         And so you actually do go and buy it.
                                         
                                         Well, now who gets the credit is the ultimate issue.
                                         
                                         That's the, that's in that shell behind, you know, like attribution models.
                                         
                                         So like, you know, to your point, so there's been a conversion now,
                                         
    
                                         but there's been two distinct events on two platforms that have contributed to the sale of this product.
                                         
                                         Yep.
                                         
                                         So it gets the credit.
                                         
                                         That's where attribution, the various attribution.
                                         
                                         Yep.
                                         
                                         I just wanted to say, of course, marketing gets the credit.
                                         
                                         Totally.
                                         
                                         We're that simple.
                                         
    
                                         Yeah, that's cute but i mean think about it like it can get really wild
                                         
                                         if you've got like if you have like a sales team involved too and like we're talking not ecom
                                         
                                         anymore but maybe like sass like well the sales talk to them and the marketing did this and like
                                         
                                         i mean you can yeah yeah wild with an attribution model so this goes back to the people problem i
                                         
                                         alluded to yes again people are defensive about their KPIs when they're tied to their budget.
                                         
                                         Everybody wants credit, yeah.
                                         
                                         Right?
                                         
                                         Yeah.
                                         
    
                                         When they're tied to their budget and their bonus.
                                         
                                         So first touch, what are the various basic levels?
                                         
                                         Yeah, first and last touch, which is sort of the most basic.
                                         
                                         Yeah, and I can unpack those, but yeah, go ahead.
                                         
                                         Yeah, so can you unpack those in the context of the scenario that you just
                                         
                                         the example you just gave yeah exactly so uh last touch is the more common of the two it's
                                         
                                         probably one of the most common but basically in the scenario laid out the user first clicked on
                                         
                                         google ads then last second right before the conversion they clicked on facebook ads so in a last touch
                                         
    
                                         paradigm facebook ads would get 100 of the credit for that conversion for that sale because that was
                                         
                                         the last thing that the user clicked conversely if it was first touch google ads was the first
                                         
                                         thing they clicked on yep so that will get 100 of the credit for the conversion because that was the first thing they clipped on. Yep. So that will get 100% of the credit for the conversion because that was the first thing they clipped on.
                                         
                                         And so just to play that out,
                                         
                                         when we're calculating return on ad spend or ROAS,
                                         
                                         in Last Touch, you would basically say,
                                         
                                         okay, Facebook has a really good ROAS,
                                         
                                         but Google doesn't
                                         
    
                                         because we are running a Last Touch model
                                         
                                         and Facebook's getting 100% of the credit.
                                         
                                         Yeah. So in that particular scenario, just like if you were just doing those
                                         
                                         two things for that, that one user. Yeah, exactly. Facebook would have 100%
                                         
                                         and Google ads would have 0%. Yes. Okay. Now multi-touch.
                                         
                                         Here's a funny question that I've never heard any stats on so you know you know
                                         
                                         like back in the day that like almost everybody did the little question how'd you hear about us
                                         
                                         question right so what do you think the stats are if i asked that user saw google saw facebook
                                         
    
                                         clicked on facebook and he said how'd you hear about us google's a choice facebook's a choice
                                         
                                         and maybe you could be fancy and dynamically
                                         
                                         only populate those two choices.
                                         
                                         What do you think the stats
                                         
                                         are on something like this? Do you think most people are going to
                                         
                                         go with, well, Facebook
                                         
                                         where they won't know? Other.
                                         
                                         Other? Well, that's just out of laziness.
                                         
    
                                         I'm saying you dynamically populate
                                         
                                         Facebook or Google.
                                         
                                         Yes, yes, yes.
                                         
                                         This is maybe a product that we've just
                                         
                                         invented here. This is a new product
                                         
                                         that's a really interesting i bet it would no i'm willing to bet money it would not be
                                         
                                         accurate to what actually happened yeah right right yeah people are notoriously yeah and even
                                         
                                         when they're trying to be like inaccurate about that yeah totally multi-touch so yes yes yes yes is linear attribution is probably one of the
                                         
    
                                         more common of the slight less commons and linear attribution is everything that was touched gets
                                         
                                         equal credit so in this case now with linear attribution google ads would receive 50 and
                                         
                                         facebook ads would receive 50 so the thing I'll add to this is that seems,
                                         
                                         that seems like the way to go on the surface. Like, it's like, oh, well, that's way better.
                                         
                                         Right. And actually I believe that was, I had a conversation with Eric a long time ago about this
                                         
                                         and asked him like, which one do you recommend? I think it was you, Eric. And you're like,
                                         
                                         we recommend, we don't recommend
                                         
                                         linear attribution i'll just throw that in up front because that ultimately leads to infighting
                                         
    
                                         among businesses people yes i do remember this conversation yes yeah right and i was like oh
                                         
                                         that's like as in the other ones don't lead to infighting well bold right like they don't exactly
                                         
                                         like they all do but this one in particular because people start like
                                         
                                         people start thinking they don't get the proper credit in certain scenarios more than ever
                                         
                                         and people start fighting over it and sure enough yeah like i have seen that happen before now where
                                         
                                         it's like even though it seemed good on the surface like at the end of the day like it's
                                         
                                         not such a good idea yeah and it's way
                                         
                                         more complex to calculate too which go ahead yeah yeah well i want to get into that but just a couple
                                         
    
                                         examples i i remember in this conversation and let's take a b2c and a b2b example so
                                         
                                         in b2c let's say you have you know a paid search team let's say you have you have a team that is doing paid social, and let's say you have an email team.
                                         
                                         And so you can imagine that the paid search team,
                                         
                                         let's just imagine a sequence where the paid search team
                                         
                                         is getting a bunch of initial clicks following what you said.
                                         
                                         Maybe paid social is actually driving signups for the newsletter
                                         
                                         or signup for a coupon.
                                         
                                         And then the lifecycle team or the email team
                                         
    
                                         is actually sending messages to this user to stay top of mind.
                                         
                                         And they eventually click on a link in an email and they make a purchase.
                                         
                                         And so the challenge is
                                         
                                         the Google team saying,
                                         
                                         they wouldn't have purchased if they didn't know about us and we're creating all this awareness
                                         
                                         and we gave them the first brand experience.
                                         
                                         And the email team's like, well, we're optimizing to the point where they actually convert.
                                         
                                         And if we weren't doing that, they wouldn't actually make a purchase.
                                         
    
                                         And it's like, well, the challenge is both of those things
                                         
                                         are technically true, but if you have different teams
                                         
                                         optimizing towards different KPIs within that framework
                                         
                                         that's hard on the B2B side, it can be tricky,
                                         
                                         especially when you have a sales-supported motion
                                         
                                         where maybe you are serving a bunch of ads,
                                         
                                         maybe you have a free trial in your product experience
                                         
                                         that's driven by the product team,
                                         
    
                                         but then you have an SDR that reaches out
                                         
                                         and actually books the meeting
                                         
                                         with the salesperson who closes it.
                                         
                                         It's the same scenario, the exact same scenario.
                                         
                                         Let's talk about calculating.
                                         
                                         You said it's really hard to calculate.
                                         
                                         So dig into that a little bit for us.
                                         
                                         Yeah.
                                         
    
                                         So it definitely creates a lot more work and it's a lot easier to get wrong and creates a lot more testing to try and do multi-touch because you're no longer just at a high level you're no longer looking for
                                         
                                         i'll blow this down do you're no longer looking for a min timestamp or a max timestamp right
                                         
                                         effectively that's such a good way to describe how it gets more complex yeah exactly So now you're looking for a distinct set.
                                         
                                         Remember, this is a temporal problem.
                                         
                                         So you're looking for a distinct set of attribution traits over time.
                                         
                                         And then you have to aggregate all that together.
                                         
                                         And this is a temporal problem again, so you're doing that over time.
                                         
                                         So it really just is a lot more complicated to calculate.
                                         
    
                                         And you introduce a lot of decisions. So I just, I hear you talk about that and you say,
                                         
                                         you know, you have to have, you have to pull together a sequence of distinct timestamps
                                         
                                         over some period of time. Right? And so the immediate question that
                                         
                                         comes to my mind is, what period of time, right? Is that a day? Is that an hour? Is that a year?
                                         
                                         Right? And I mean, that, so talk through that a little bit, right? Because that's non-trivial,
                                         
                                         both in terms of, you know, the actual reporting that you're going to produce,
                                         
                                         but also if you think about longer time periods, you could have an immense number
                                         
                                         of touch points, which you're talking about large data volumes, all that.
                                         
    
                                         So walk us through those questions.
                                         
                                         And actually, Lou, walk us through those questions in terms of there are some established time periods in the ad platforms themselves, which can be initially helpful, but generally becomes problematic pretty quickly.
                                         
                                         Yeah.
                                         
                                         So the biggest one, which I believe is you were being kind enough to set up and lead to was the look back window for a particular model, right?
                                         
                                         So it's, as Eric was alluding to okay it's a time
                                         
                                         based problem so how far do you look back so i you know on that conversion from the facebook ad
                                         
                                         to the conversion i converted a specific point in time so how far back do i look to attribute
                                         
                                         because let's say for example that google ad i clicked eight days ago right and then
                                         
    
                                         that facebook ad obviously i clicked when i converted so do you include or exclude that
                                         
                                         in linear do you include or exclude that facebook attribution again like it's you have to answer the question of what's my timeframe because you included
                                         
                                         if it's within the timeframe and you exclude it,
                                         
                                         if it's outside the timeframe.
                                         
                                         And I think I said Facebook there, but I meant Google.
                                         
                                         Sorry.
                                         
                                         Oh yeah.
                                         
                                         Yeah.
                                         
    
                                         And the original,
                                         
                                         my apologies.
                                         
                                         Yep.
                                         
                                         So that's the biggest issue is look back.
                                         
                                         Now, if you think about that in terms of,
                                         
                                         you think about that in terms of linear attribution,
                                         
                                         now you have to figure out what are all distinct points of attribution
                                         
                                         within that time window.
                                         
    
                                         And you have to take a snapshot at each conversion.
                                         
                                         You have to look back that many days, right?
                                         
                                         So it's a private conversion.
                                         
                                         You have to look at the time window for that conversion, right?
                                         
                                         So it becomes computationally pretty complex very quickly.
                                         
                                         Yep.
                                         
                                         And so the ad platforms, like you said,
                                         
                                         you can go in and look at conversion data
                                         
    
                                         in the ad platforms themselves.
                                         
                                         And maybe this is a good opportunity to talk through,
                                         
                                         one, there are sort of built-in look-back windows.
                                         
                                         And then two, why do you eventually not want to rely
                                         
                                         on the conversion data in the ad platform?
                                         
                                         Yes, great call.
                                         
                                         Sorry, you mentioned that.
                                         
                                         I didn't touch on that yet.
                                         
    
                                         I packed a bunch of stuff into that one question.
                                         
                                         And you had trouble going and doing attribution on the original first touch question.
                                         
                                         Yes, there are some more common ones.
                                         
                                         So that 7, 14, and 30 days are the more common ones I believe I've seen.
                                         
                                         I think probably 14 or 15 days are usually more common ones I believe I've seen. I think probably 14 or 15 days.
                                         
                                         They're usually the ones I've seen most people settle on.
                                         
                                         So like last few weeks.
                                         
                                         Yep.
                                         
    
                                         There are benefits and pitfalls to each one of those.
                                         
                                         So the further back you go.
                                         
                                         So it's like, and one caveat, one side note real quickly.
                                         
                                         This was one of the reasons why google universal analytics so google three
                                         
                                         google analytics 360 was terrible at computing is by default it was six months right so it's
                                         
                                         basically covering everything yeah by default well yeah they changed i didn't know that ga4
                                         
                                         yeah so it wasn't yeah wow no yeah it isn't six months i'm pretty sure it was pretty long i think
                                         
                                         ga4 went to 30 days of feminine correctly so it's better but basically you could argue and everyone
                                         
    
                                         has a different opinion on this but like there's a certain point in time where like you should not
                                         
                                         be attributing a like three six nine months back visit to a bot.
                                         
                                         So you have to make that decision and calculate that.
                                         
                                         And I would say that's the decision and those are some of the more common ones.
                                         
                                         And then in e-commerce,
                                         
                                         you can have multiple conversions, right?
                                         
                                         So if you're set to first
                                         
                                         and then got a first impression
                                         
    
                                         or first click from Google,
                                         
                                         then they buy like 10 things in six months
                                         
                                         like you're just racking up on that one google you know impression as far as your like return
                                         
                                         on investment right right yeah yeah that's a great point yeah which i mean actually it's an
                                         
                                         interesting point you may want you may actually want to have that view when you think about
                                         
                                         something if we think about and maybe i'm I'm getting a little ahead here, but
                                         
                                         if you think about answering your question,
                                         
                                         which channels bring in more users
                                         
    
                                         who are high lifetime value users over a longer
                                         
                                         period of time, right? So we're not trying to answer what's driving the conversion, we're just saying
                                         
                                         okay, when someone first experiences our brand, which channels are the ones that tend to
                                         
                                         produce high lifetime value users over time, right? You actually do have to look over a long window.
                                         
                                         You know, that can be problematic in the ad platform itself. But again, I'm probably jumping
                                         
                                         the gun on like metrics and reporting. That can be a challenge in the ad platform itself, right?
                                         
                                         If you're trying to look over a longer period or even get the lifetime value data you know you really have to do that in your own data store yeah for sure and this really
                                         
                                         good i'm glad you brought that up john i did i didn't even highlight that one too and that's
                                         
    
                                         actually another decision point right there is you have the option to include or exclude
                                         
                                         attribution once a conversion has occurred right so like that's another that's yet another decision
                                         
                                         points deciding what are the distinct timestamps right right yeah so like if i convert and then
                                         
                                         you know to john's point again like i convert in a day or two or i buy another product in a day or
                                         
                                         two and i technically have nothing new in there like do my old attribution points count if they're
                                         
                                         still within the window like am i still attributing that to, you know,
                                         
                                         the Google ads and the Facebook ads?
                                         
                                         Or is it once I get a conversion,
                                         
    
                                         now that would be direct because there was nothing new in there.
                                         
                                         So you also have to make that decision when writing your model too.
                                         
                                         Now I will say last point real quickly,
                                         
                                         like I've generally seen it where once a user converts like that,
                                         
                                         you don't attribute things in the past again to that but
                                         
                                         you can right it depends on the business but go ahead john yeah i was yeah that's super interesting
                                         
                                         because i was thinking by channel and and i guess i'm just wondering out loud have either of you
                                         
                                         seen any like robust studies on like multi-touch attribution where somebody's actually trying to study like
                                         
    
                                         consumer behavior and understand like you know per channel or per you know time frame like what
                                         
                                         actually you know makes more of a difference versus an aggregate yeah right yeah an aggregate
                                         
                                         yeah not yeah so i just don't know if there's any models out there that claimed of like we
                                         
                                         studied consumer behavior and this model is like, you know,
                                         
                                         more accurate because of that.
                                         
                                         Yeah.
                                         
                                         I don't know.
                                         
                                         I don't know,
                                         
    
                                         but I feel like letter stacks in a pretty good position to study that if
                                         
                                         they can get access, you know,
                                         
                                         like work with enough of their customers to look at that data.
                                         
                                         Like you could, you'd probably can start figuring that out.
                                         
                                         Yeah.
                                         
                                         You know, got 20, 30, 50,
                                         
                                         a hundred customers on board to study that. That'd be interesting. Yeah. You know, got 20, 30, 50, 100 customers on board
                                         
                                         to study that.
                                         
    
                                         That'd be interesting.
                                         
                                         Yeah, that is really interesting.
                                         
                                         We're just coming up
                                         
                                         with product ideas,
                                         
                                         you know,
                                         
                                         all over the place here.
                                         
                                         I will say it does get,
                                         
                                         also get interesting,
                                         
    
                                         you know,
                                         
                                         when, you know,
                                         
                                         generally if it's worth it
                                         
                                         to understand that
                                         
                                         for a company
                                         
                                         on a fairly detailed level,
                                         
                                         they tend to be
                                         
                                         a larger company
                                         
    
                                         and they have a lot of channels
                                         
                                         and then you introduce a lot of channels.
                                         
                                         And then you introduce a host of other challenges around things like television advertising.
                                         
                                         Which you start layering in those components
                                         
                                         and the situation gets even more complex.
                                         
                                         Well then at that point,
                                         
                                         from a consumer behavior standpoint, do you care?
                                         
                                         Or do you just go into ML and AI stuff?
                                         
    
                                         Yes, okay, that's a great segue or actually i mean you know before you go
                                         
                                         off that yeah real quickly i mean now you're getting into which is a good point right so it's
                                         
                                         like online versus offline attribution is would be the official term and you're right like there's
                                         
                                         uh marketing mixed modeling in them and it tries to model for some of that.
                                         
                                         That's a whole other paradigm
                                         
                                         which companies potentially
                                         
                                         try and get into too if they do print
                                         
                                         ad, customer walk-ins
                                         
    
                                         at their physical stores, they try
                                         
                                         to keep track.
                                         
                                         That's a whole different, that has a whole
                                         
                                         another layer of complexity to this
                                         
                                         whole paradigm too.
                                         
                                         We talked about linear multi-touch attribution.
                                         
                                         Let's quickly talk about weighted multi-touch and then,
                                         
                                         and then dig into like machine learning and,
                                         
    
                                         you know,
                                         
                                         more probabilistic components.
                                         
                                         Yeah.
                                         
                                         So,
                                         
                                         so weighted,
                                         
                                         weighted BU generally what I've seen is you would weight the
                                         
                                         more recent ones in terms of percentage with you'd give them a higher percentage so the last click
                                         
                                         wouldn't get 100 but it would also get a larger percentage a higher weighted percentage than you
                                         
    
                                         know so like facebook ads would potentially get a higher weighted percentage than you know so like facebook ads would potentially get
                                         
                                         a higher weighted percentage than google ads in our going back to our example again and in a
                                         
                                         weighted percentage and you know that becomes challenging once again if like you have two three
                                         
                                         four five six different channels or campaigns right like yep well like people will get angry if they were earlier in the cycle
                                         
                                         but got less credit so i think that's again just highlight like one of the challenges of some of
                                         
                                         these more exotic shall we say calculations in addition to the fact that like that's yet again
                                         
                                         that's even more complex to calculate because now like how do you choose percentages for each
                                         
                                         point in time right like it's you have
                                         
    
                                         to come up with some sort of mathematical model or buy one yep okay one brief side note i did
                                         
                                         think of another url tip which is actually i mean i guess tip isn't necessarily we talked
                                         
                                         because i think it's the most you you know, it's on the surface
                                         
                                         just the most straightforward use case
                                         
                                         because you have to put a URL
                                         
                                         and the parameters into the ad platform
                                         
                                         so that you can track that when someone clicks an ad.
                                         
                                         But there's also a huge benefit
                                         
    
                                         to being disciplined about doing that
                                         
                                         on all of your own channels, right?
                                         
                                         So the two main ones are email or SMS,
                                         
                                         where you're sending a message to a user through your own platform.
                                         
                                         Now, a lot of those tools have some level of attribution with it,
                                         
                                         but if you want to do multi-touch attribution
                                         
                                         or explore machine learning,
                                         
                                         having the same join key makes things way easier.
                                         
    
                                         And then another big one that is so easy to miss
                                         
                                         is things that are in-app type things as well,
                                         
                                         where you may consider that an experiment
                                         
                                         or a touchpoint or something like that you can include as well.
                                         
                                         That's another thing.
                                         
                                         Like a push notification.
                                         
                                         Sure, yeah, something that's going out from the app itself
                                         
                                         or maybe it's some section of the app that's promotional
                                         
    
                                         or whatever that is.
                                         
                                         Ubiquitous tagging, I guess, would be the concept there.
                                         
                                         Yeah, that's actually a really good point.
                                         
                                         I didn't touch on that at all.
                                         
                                         Fantastic point.
                                         
                                         When I was talking earlier, the IMR mentioned doing it unique to like ad campaign, ad set level.
                                         
                                         I guess I briefly touched on it with the campaign.
                                         
                                         That would be the campaign level, right?
                                         
    
                                         Pretty much.
                                         
                                         Yes.
                                         
                                         Okay.
                                         
                                         I have a unified campaign, like new product X that I want to advertise across both retention.
                                         
                                         So email, SMS, et cetera, and new customer acquisition.
                                         
                                         So like to prospects.
                                         
                                         Yeah.
                                         
                                         You might want to, you're right.
                                         
    
                                         You might want to attract that as a single identifier across multiple channels exactly and then join that later um yep affiliate as well
                                         
                                         can be helpful too right because again like it kind of goes back if you think about a campaign
                                         
                                         as abstracted across it as agnostic to channel having the hash join key is really helpful
                                         
                                         but it's easy to forget
                                         
                                         and it takes a lot of discipline
                                         
                                         but if you are disciplined about it
                                         
                                         it can be really helpful.
                                         
                                         And the challenge too
                                         
    
                                         is in a smaller
                                         
                                         scenario like Lou was saying
                                         
                                         you probably just start with the platforms
                                         
                                         but then you get to a larger scenario
                                         
                                         then you have more teams
                                         
                                         so now you're trying to coordinate the stuff across teams.
                                         
                                         You're not just standardizing one team
                                         
                                         like your team that's working on email.
                                         
    
                                         You're standardizing a bunch of teams
                                         
                                         to all do it the same way.
                                         
                                         That in and of itself is a challenge.
                                         
                                         I'll tell you one thing that I've done in the past
                                         
                                         that's really, I mean,
                                         
                                         and this is probably a good insight
                                         
                                         into me as a person
                                         
                                         and probably actually both of you as well
                                         
    
                                         because I know both of you pretty well.
                                         
                                         But events are actually pretty tricky
                                         
                                         because it is actually something that happens
                                         
                                         at a distinct point in time
                                         
                                         but is very manual.
                                         
                                         It's essentially manual data
                                         
                                         even if you digitally scan someone's badge
                                         
                                         or whatever it is.
                                         
    
                                         I mean, they put their name in an iPad or whatever.
                                         
                                         But I've actually generated synthetic events
                                         
                                         to send into the data store
                                         
                                         that has a tagged link with a hash
                                         
                                         because it's so much easier
                                         
                                         to represent that as a timestamped event, right?
                                         
                                         Because if you think about what we just talked about with multi-touch attribution, it could be they click on an ad, maybe they get an email, maybe they come to an event.
                                         
                                         And so synthetic events can actually be really useful for representing
                                         
    
                                         things that are really hard to timestamp or offline data
                                         
                                         that doesn't come in a format that is easy to timestamp.
                                         
                                         And so, yeah, that's another.
                                         
                                         That's the beauty of QR codes that everybody discovered in 2020, right?
                                         
                                         That is true. And it's so funny. Yeah, QR codes. QR codes can also have, you know, hashes and URL parameters added to them.
                                         
                                         Okay, that concludes part two of our deep dive on attribution with Lou Dawson of Momentum Consulting.
                                         
                                         Tune in next week for the third and final installment where we go deeper into multi-touch attribution, talk about reporting and measurement, and of course, discuss AI's impact on attribution.
                                         
                                         The Data Stack Show is brought to you by Rudderstack, the warehouse-native
                                         
    
                                         customer data platform. Rudderstack is purpose-built to help data teams turn
                                         
                                         customer data into competitive advantage. Learn more at rudderstack.com.
                                         
