The Data Stack Show - 05: The Convergence of Data Engineering and Marketing with Nic Discepoli of Ruby

Episode Date: September 9, 2020

On this week's episode of The Data Stack Show, Kostas Pardalis and Eric Dodds are joined by Nic Discepoli for the first of two conversations about Ruby, a startup where he is the customer analytics lead. This Nashville-based company helps families navigate their finances in some of life's most challenging moments, often corresponding with a medical event. Their conversation included topics like:

- Launching Ruby and Nic's involvement in the company (3:35)
- Realizing that tracking data manually in spreadsheets was no longer sustainable (7:07)
- Rundown of Ruby's toolset (9:50)
- Challenges with data quality (14:27)
- Using unique IDs and following UTM parameters through the stack (21:04)
- Recalculating customer acquisition cost with data (33:05)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack, visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome back to the Data Stack Show. Today we have a really interesting guest lined up for you. We usually talk with people who are on the development side of the data stack, actually doing the engineering work. But today we're going to talk with Nic Discepoli, who is at Ruby. They build financial products. We will ask Nic all about what they're building. But they're an early-stage startup and really had a strong need for data in multiple parts of the organization. But as we'll hear, a lot of the engineering needed
Starting point is 00:00:41 to be focused on building the core product. And that presented kind of a problem in terms of building out the stack. And Nic really picked that up and did some amazing work as a non-developer to build out those pipelines. Kostas, as an engineer, I'm interested in what you want to hear from Nic, as someone who's built out a data pipeline and some infrastructure around it, even though he's not a data engineer. So what are you interested in from your perspective
Starting point is 00:01:08 as a developer? Yeah, Eric, just as you said, I think it's very interesting to see the other side of data and how it's used in the company today. So far, in all the episodes that we've had, we've discussed mainly the engineering side of things. So it will be super interesting to see how the rest of the company, other functions like marketing in this case,
Starting point is 00:01:32 can utilize this data and actually try to extract value out of it. So that's one thing that I think is going to be very interesting. And it's going to become even more interesting because we're going to have a second part of this podcast, where we also interview the technical side of the company. Another thing that I think is very important: traditionally we're talking with data engineers, and data engineering as a function exists in bigger companies, right? It's one thing to go and talk to the Squares of the world, where they have a big army of data engineers
Starting point is 00:02:11 supporting the rest of the company in their data needs. But it's super interesting to see how this can happen in a startup, because as time passes, I think utilizing your data early on is becoming more and more important for the success and the survival of the company.
Starting point is 00:02:31 So I think these are the two main, let's say, tracks of questions that I would like to explore, and that's what makes me really excited about the chat we're going to have today. Great. Well, let's dive in and talk with Nic. Nic, welcome to the Data Stack Show. Yeah, thank you so much for having me. This is part one of a two-part series,
Starting point is 00:02:53 so I just have to say, so that we don't get in trouble, kind of be careful what you say, because when we talk with the CTO, you're going to be on the record, and he'll be able to call you on anything that doesn't line up with reality. Yeah, I mean, I'll take that into consideration. You know me, I'm usually lying about our data tech stack in a public way. Well, tell us first, what is Ruby? What are you building? Where are you at as a company,
Starting point is 00:03:23 stage-wise? We know you're early stage, but just a little more detail there. And then we'd love just a quick summary of what you started doing there, and how you ended up building data infrastructure. Yeah, I'll just start from the beginning, because how I got to where I am and the origins of the company are intertwined. I actually started working on this project with our CEO, Troy Woolley, before it was even an idea. I was looking for an internship in college, actually, and found this idea and product consulting firm called Flow Thinkery here in Nashville, Tennessee. And they were approached by First Horizon Bank to help them figure out what the
Starting point is 00:04:07 future of digital innovation looked like for them. At the time, there was a lot of money going into financial technology apps, and banks were trying to figure out how to compete. So we came up with, I think, 50 different ideas and slowly whittled them down until they became Ruby. Starting out, I was just a research analyst. I was looking at industry trends and conducting preliminary user research about what problem is out there that we felt we could do a good job of solving. And we landed on this idea of helping people through some of their most challenging financial moments. A lot of times that tends to be around end of life, or when you're older and your parents are getting older as well.
Starting point is 00:04:56 They might need some help dealing with financial matters, this whole passing of the torch. And we wanted to step in and help with that with a digital solution that could give people guidance and help them get their heads around the situation. From there, we found that there's a real need in the medical space. Most of the time when people are dealing with these issues, they're dealing with some sort of medical event that happened. So between observing that and the COVID-19 pandemic hitting, we just really felt that we could best serve our audience by building a medical product that helps them reduce their medical bills by teaching them about negotiation, looking up the right codes, and checking all of that kind of stuff. So that's where we are right now. We are early stage.
Starting point is 00:05:50 We're working on our first mobile app for that, but we have released a series of web apps in the past. And my role has been identifying tech integrations on the marketing and operations side that will work for the entire business, making sure that those integrate with our stack, writing event code to track users, and then analyzing that on the back end and reporting it to stakeholders. Got it. So I want to get a rundown of the stack, but first I would love to hear about the experience. So, you know,
Starting point is 00:06:27 the normal story with all startups is you start doing some things, you start building some things, and you're basically doing all your reporting in spreadsheets because you're so early. Talk about the experience of realizing that what you were doing wasn't sustainable and that you needed to actually build some tooling to automate some of the tracking. What were the needs there? Were you trying to understand users, or the marketing funnel? What were the real drivers that made you step back and say, we're going to need to actually build out some infrastructure? Right, yeah. So one of the things that we had in the back of our minds from working with the bank was this problem they had talked about of having a 360-degree view of the customer, which is really hard when you run a
Starting point is 00:07:20 regional bank, because you have not only all of the digital channels to keep up with people on, but you have people coming into your bank branch and dealing with you that way. So one of the things that they'd always struggled with was, when somebody walks in, they want to be able to know: has this person been a customer for 10 years, or is this their first day? How many accounts do they have? That kind of thing. But that's really hard to do with legacy banking tech,
Starting point is 00:07:49 which is just sort of the nature of banking systems. So we wanted from the start to be sort of an example where we could build from the ground up and have this 360-degree view always in mind. And the thought was that if we took the time to set that up on the front end, it would be much easier than trying to retrofit something onto a stack that wasn't designed to work that way. So I think the inciting moment, where we really started to kick it into high gear, was when we had a little bit of marketing material out there. We were promoting some posts and running a few campaigns
Starting point is 00:08:30 and our conversions were not matching what we were seeing in our email database. So we were like, what's the deal? Why are we getting double and triple counts of conversions in Facebook or Google when we have, you know, half as many of those in our email database? And I think that's when we actually reached out to Yield and started talking with you, Eric, about how we can start making sure that we have quality data through every part
Starting point is 00:09:01 of the stack. Yeah, that's actually a good disclosure to make. So prior to joining RudderStack, I had a hand in helping Nic build some of this stuff out at a consultancy called Yield, which is actually where we found RudderStack as well. Unfortunately, we did not find RudderStack before we started working with Ruby, but we built some great stuff as well. Give us a quick rundown of your tool set. We think of it in terms of collection, validation,
Starting point is 00:09:40 transformation, routing, warehouse, and then all the cloud tools that you're using. How about collection? Where are you ingesting data, and what SDKs are you using? Yeah, so primarily for our web apps it's analytics.js, managed through Segment. That's for all of the front-end code, and it sits on top of our marketing websites. We use Google Tag Manager, actually. It has a really robust triggering system, so I take advantage of that to trigger different events,
Starting point is 00:10:14 and then I just write the code that fires them off so that Segment can read them. And then our developers work on our back-end events. They use the .NET SDK. And then we will be implementing with Xamarin; that's our mobile tool of choice. So we're working on that integration right now. Got it.
Starting point is 00:10:37 And just a question on Google Tag Manager. Google Tag Manager is one of those tools that is loved and hated in the market, for many reasons, and there are lots of blog posts out there. But I would just love your perspective. I mean, being in the same position as you, I don't come from the background of being a developer, so Google Tag Manager allows me to do a lot of things that I just wouldn't be able to do without engineering resources. I'd love to know: what are the biggest benefits of it for you, and what are the things that have been most challenging? What do you love and hate about Google Tag Manager? Well, I think what it enables you to do is
Starting point is 00:11:26 insert code when you don't necessarily have access to the underlying HTML. What I mean by that is, we use WordPress and Unbounce for our marketing website and landing pages, and I don't want to dive into the HTML of our WordPress site every time I want to fire a new event. It also saves us from having to deal with custom JavaScript triggers. So essentially, we implement this one tag in the header of our site, and then I can take advantage of Google Tag Manager's really robust triggering system.
Starting point is 00:12:11 I mean, it's got a few simple triggers that it uses, but by filtering things, you can get really, really dynamic with how you're firing events. One of the other power features is variables. I didn't know about this for a long time; I was actually hard-coding a lot of values. But you can use custom JavaScript to grab things off the page dynamically and insert those variables into your scripts so you don't have to rewrite things every time. So, you know, the classic use case for me is just pulling the UTM parameters from the URL string, which contains all of our attribution information,
Starting point is 00:12:48 and then putting that into whatever event we want to fire. So if it's a signup event, that's super important. And what's nice about Google Tag Manager is that when you're using the data layer effectively, it's really reliable and it doesn't break as much. And the debugger is really good for testing new code out. So that helps me as a non-developer write code that saves developers time, because they're not having to work on it, but it also lets me dynamically create new funnels and tracking systems pretty much as fast as I can write the code.
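To illustrate the pattern Nic describes, here is a minimal sketch with hypothetical variable and event names, not Ruby's actual setup. First, a GTM Custom JavaScript variable that reads a UTM parameter off the URL (GTM evaluates the anonymous function and uses its return value):

```js
// Hypothetical GTM Custom JavaScript variable: returns the utm_campaign
// query parameter from the current URL, or undefined if it is absent.
function() {
  var match = window.location.search.match(/[?&]utm_campaign=([^&]+)/);
  return match ? decodeURIComponent(match[1]) : undefined;
}
```

A Custom HTML tag (inside a script element) can then forward those variables into a Segment track call. The {{...}} tokens are GTM variable references, quoted because GTM substitutes them as raw text:

```js
// Hypothetical tag body: forwards UTM values into an analytics.js event.
analytics.track('signup_completed', {
  utm_source: '{{utm_source}}',
  utm_medium: '{{utm_medium}}',
  utm_campaign: '{{utm_campaign}}'
});
```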
Starting point is 00:13:21 Nic, I have a question. You're describing a pretty robust and quite standard setup, let's say, for how you can build and maintain infrastructure for tracking and analyzing user events. Can you share some more information with us about the quality of the data?
Starting point is 00:13:55 How do you think this is affected by the processes and the tools that you have? How do they help you, as the person who is actually going to consume the data? The quality of this data is even more important for you. I know that's a big thing for engineers too, but engineers have to care mainly because people like you want to make sure that whatever results you get from the data are accurate and on time. So what's your experience so far with that? What are some common issues that you see, some common problems with quality,
Starting point is 00:14:26 and how you deal with them? Yeah, great question. As someone who's implementing the code that runs a lot of the front-end events and someone who's using them, I've definitely been there where it's like, I don't know whether I'm seeing this number right now because I wrote the query wrong
Starting point is 00:14:43 or because I wrote the original code wrong. Sometimes it's a little bit of both. But this is actually something that, again, Eric really helped me with, which was just making sure that the naming convention was really consistent, and once you picked a certain event to represent something, just sticking with it. For a while I would just make up event names for each unique thing that I was doing, which made it hard to find them, because the way that our database works is each event gets its own table. So that just means that you're joining more tables together, and there's more cause for error there. So I think a lot of it is
Starting point is 00:15:30 is sitting down with your team and outlining. These are the events we're going to attract. This is what they mean. Here are the properties underneath those. This is what they're going to be called. And they're not going to, you know, it's not,
Starting point is 00:15:42 it's going to be first underscore name, not first name. And making sure that that kind of naming convention is strict across every piece of technology that you integrate with. So what I'm talking about, like, yes, it applies to our warehouse, but, you know, the events going to our mixed panel instance are structured the same way.
Starting point is 00:16:02 So that way, you know, our product team uses mixed panel to analyze events there. instance, are structured the same way. So that way, you know, our product team uses Mixpanel to analyze events there. They don't, I can give them the same implementation sheet, and they can know exactly what to query on without, you know, having to say, okay, well, it's this here, but it's, you know, it's something else in another location. So I think it's really about self-discipline, at least on a small team, you know, and like I said, you want to map out that, that plan beforehand so that you know what you're doing going, going forward. And then anytime you make, you know, a change, noting that on your implementation guide or your
Starting point is 00:16:43 implementation plan sheet. Because obviously things are going to change once you get in there and you realize you can't fire something the way you wanted to, or whatever. Does that answer your question? Yeah. So if I understand correctly, you use some kind of Google Sheet where you track the schema of the events, and you use this as a tool to communicate with the rest of the stakeholders inside the company, right? Yeah, we did use a Google Sheet for a long time. Now I actually use a Notion database. We're big fans of Notion at our company, and it's something everybody uses, so they all have access to it. What's nice about that is you can relate different tables to each other. So I have a properties table and an
Starting point is 00:17:26 events table, and I can just relate the properties to the events, which keeps things consistent from a property standpoint, since there's only one unique property in the table. Am I getting too in the weeds here? Oh, it's okay, it's quite interesting. Yeah, so that's just another tip, because in the sheet we had a property value under each event, and so again there was more opportunity for error in misnaming or misdescribing things. So having this Notion database has helped with that.
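One lightweight way to enforce that discipline in code, sketched here with illustrative names rather than Ruby's actual tracking plan, is to keep event and property names in a single shared module so that first_name can never drift into firstName in one corner of the codebase:

```js
// events.js: single source of truth for event and property names,
// mirroring the tracking plan kept in the Notion database.
const EVENTS = {
  SIGNUP_COMPLETED: 'signup_completed',
  FORM_SUBMITTED: 'form_submitted',
};

const PROPERTIES = {
  FIRST_NAME: 'first_name', // never firstName or "first name"
  UTM_ID: 'utm_id',
};

module.exports = { EVENTS, PROPERTIES };
```

Every call site then imports these constants instead of retyping strings, for example analytics.track(EVENTS.SIGNUP_COMPLETED, { [PROPERTIES.FIRST_NAME]: name }), so a typo fails loudly instead of silently creating a new warehouse table.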
Starting point is 00:18:14 The other thing that I use pretty frequently, especially for non-technical users, is a diagram I built. It's like a schema: it says data starts here, then goes here, then it gets ETL'd here, and then it shows up in our warehouse. And it's just super colorful. So my CEO will use that all the time to talk to investors and say, this is what we've built.
Starting point is 00:18:36 That way, they're not just looking at a data schema, which is information overload. So that was helpful for me to get my head around how data moves through our stack, but it's also proven to be very helpful for anybody who isn't familiar with it to get a high-level view of what's going on. Yeah, that's great. Actually, it feels like I'm talking with a data architect, to be honest, and not someone who's coming from marketing, which is very, very interesting. Are there any common issues with the
Starting point is 00:19:05 quality of the data that you have encountered so far? The biggest thing I've had issues with is our UTM parameters. We use a sheet that dynamically creates new UTM values and a unique ID for each link. Sometimes that can get messed up in the process. And I think it's a break in the chain, because at some point we just have to copy and paste those links into the ad platforms. So sometimes people copy and paste the wrong link, and we'll have a campaign that's misattributed. But we've tried to be really strict about that. We have an outside consultant who works with us to launch all of our campaigns.
Starting point is 00:19:57 So we've had to develop a system where I can give him the links in a super organized way. And he's actually started using scripts to dynamically update our ads, so that's helped a lot. We've gotten down to about, I think, 1% of all of our paid page views having some sort of attribution error.
Starting point is 00:20:17 But that was a big issue in the past. And yeah, it's just about being diligent and really making sure you're checking every box and dotting your I's and crossing your T's twice. Nic, I'd be interested, in part just because I have some inside knowledge about how this works, but I think it would be interesting on the attribution question: explain why that has been so important to you as a company. And then,
Starting point is 00:20:47 just walk us through the flow of you tag a link with UTM parameters that goes into Facebook. That's very, you know, everyone sees that, but then just like follow those UTM parameters through the stack. And then how do you use them, you know, in like any analytics tools, it'd be cool to just get a walkthrough of that. Yeah, sure. So like you said, we like any analytics tools, it'd be cool to just get a walkthrough of that. Yeah, sure. So like you said, we create the link, and then it has a unique ID in there. So one issue we had was we were trying to stuff all of the every bit of information that we wanted to know about a particular campaign or a particular ad into the link, which created a problem because
Starting point is 00:21:24 we had these huge, long strings with all of these codes that nobody really understood. We had keys to read them, but you often had to go back and look at the key and say, okay, ASG32, that's this campaign, this audience group. Instead, we have some minimal information in there, like source and medium.
Starting point is 00:21:46 But this unique ID is stored in a Google spreadsheet, actually. So this is what allows us to dynamically create new links. But on the back end, we can later join events based on that unique ID and pull in all of the metadata from our Google Sheet using the BigQuery Google Sheet integration. But anyway, to go back to step-by-step, so we tag the link, somebody clicks on it. Usually that fires off a page view event.
Starting point is 00:22:21 Well, it always fires off a page view event on our landing page, and that page view event has the attribution information in there. And then usually we're trying to get them to click a button or fill out a form. Whichever it is, the conversion event will also have the attribution information in it. So we can see how many people land on the page and how many people converted. And then, depending on whether or not it's just a marketing campaign for a new product, it might end there: they'll get an email, which our events can also fire off. The tool we use for email marketing campaigns is Autopilot. But if it's a signup campaign,
Starting point is 00:23:03 then they'll go to our app and they'll be prompted to put in all of their information there. And then we'll see the attribution information forwarded from the landing page on the signup event as well. So everything pre-conversion should have attribution information in it. And it gets routed through Segment
Starting point is 00:23:24 out to our data warehouse, which is BigQuery. And like I said, it gets slotted into various tables there, and I can pick up the events, query the tables, and join them together to build a funnel analysis. We also have funnel analysis in Mixpanel, but that's mainly for product stuff.
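For a sense of what that warehouse-side funnel join might look like, here is a minimal sketch assuming a Segment-style layout where each event gets its own table; the table and column names are hypothetical, and running it requires Google Cloud credentials:

```js
// Funnel rollup: join landing-page views to the signup event table on
// anonymous_id, grouped by campaign.
const { BigQuery } = require('@google-cloud/bigquery');

const query = `
  SELECT
    p.context_campaign_name        AS campaign,
    COUNT(DISTINCT p.anonymous_id) AS visitors,
    COUNT(DISTINCT s.anonymous_id) AS signups
  FROM \`ruby.pages\` AS p
  LEFT JOIN \`ruby.signup_completed\` AS s
    ON s.anonymous_id = p.anonymous_id
  GROUP BY campaign
  ORDER BY visitors DESC
`;

new BigQuery().query({ query }).then(([rows]) => console.table(rows));
```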
Starting point is 00:23:52 to BigQuery. So you're actually joining like the tag links spreadsheet with other, like with actual event data that includes the parameters from those events, like the page views and other things? Right. Since we store the unique ID as its own UTM parameter, it comes through as a field in our warehouse on the event level.
Starting point is 00:24:14 And then because that maps to a unique value in the sheet, we can join on that value and bring in all of the metadata from the other sheet. That's everything from the audience to the ad image size: any sort of creative or strategic marketing information that you would want to know. And that's been super helpful. My CEO is famous for asking me, okay, here's a trend, but how does that change when we only look at this one audience, or we only look at this one source?
Starting point is 00:24:53 So that makes it really easy to filter on any of those values, because all I have to do is add a new column to the data set and then run a filter on it or group by that field, and we're good.
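A sketch of that join, again with hypothetical names: each event row carries the minted utm_id, and a Sheet-backed external table (called marketing.link_metadata here) holds the creative and audience metadata keyed on the same ID:

```js
// Group page views by any metadata column pulled in from the Sheet.
const { BigQuery } = require('@google-cloud/bigquery');

const query = `
  SELECT
    m.audience,
    m.ad_image_size,
    COUNT(*) AS page_views
  FROM \`ruby.pages\` AS p
  JOIN \`marketing.link_metadata\` AS m
    ON m.utm_id = p.utm_id
  GROUP BY m.audience, m.ad_image_size
  ORDER BY page_views DESC
`;

new BigQuery().query({ query }).then(([rows]) => console.table(rows));
```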
Starting point is 00:25:15 That's amazing. Are you looking at this in spreadsheets, or where are you actually providing that analysis? Yeah, so typically I'm building out data sets in BigQuery. It sort of has the most power, and especially if we're looking at all anonymous page views, that's really the only way you're going to effectively look at 70,000 or 100,000-plus rows of data. But a lot of times what I do is leave it somewhat unaggregated and then pull it into Data Studio,
Starting point is 00:25:41 which is Google's free visualization software, so that I can run other analysis on it and give people charts to work with and filters and dynamic date controls. Like I said, our CEO, he's very analytical. He always wants to dive in and look at the data himself, but obviously, he doesn't have the time or the interest in going into SQL and building out these data sets.
Starting point is 00:26:06 So this has been a good way for me to give him the flexibility he needs to ask his own questions of the data, as well as anybody else in the company who can click a dropdown. So that's been great. We also send it straight back to Google Sheets. BigQuery has an integration where it can run a query basically any time you click a button, and it'll refresh a sheet.
Starting point is 00:26:35 So if someone is more comfortable working in a spreadsheet format, they can use that as well. There are some issues with that, though, since I think there's a 10,000-row limit. So to get some of those queries to work, I have to aggregate them in a way that keeps the result under 10,000 rows.
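For example, a rollup like this sketch (hypothetical names again) trades one row per page view for one row per day and source, which keeps the export comfortably under that ceiling:

```js
// Daily rollup of page views by source for the Sheets-bound export.
const { BigQuery } = require('@google-cloud/bigquery');

const query = `
  SELECT
    DATE(timestamp)         AS day,
    context_campaign_source AS source,
    COUNT(*)                AS page_views
  FROM \`ruby.pages\`
  GROUP BY day, source
  ORDER BY day
`;

new BigQuery().query({ query }).then(([rows]) => console.table(rows));
```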
Starting point is 00:27:11 I have to say, not bad for going from being a research analyst to building out pretty stinking advanced infrastructure that can push data to all these different places. Yeah, it took a little bit of time and I had a lot of help, but it's been really fun to learn. I always tell people it's a very interesting puzzle to solve. I have a lot of questions, but I'm interested in one challenge that I know a lot of companies face, especially in the early stage, which is usually a technical issue related to doing a lot of acquisition work and then needing to have people sign up for some early version of the product. There's usually some sort of challenge in connecting the dots between, you know, I have a landing page where people are signing up and putting in their email, and then they go into some sort of login thing, sometimes third party, and then they're experiencing some early version of the product. Did you face any challenges around putting those pieces together at Ruby, especially
Starting point is 00:28:07 in the early phases when you were sort of in MVP mode with the product? Oh, yeah, for sure. Yeah, I mean, we ran into the same traps as everyone. I mean, we would have a Zapier connection that took form data from one landing page to a different page. I mean, at one point, we actually did not have a self-service sign-up, so we had to capture form information, use a Zapier connection, send it to Salesforce so that our customer support team could create an account manually for them.
Starting point is 00:28:41 But thankfully, that was short-lived. That's true startup style. Yeah. So one of the things that has been helpful is, you know, Segment is used for all of these analytics events, but I also use them to trigger other actions, as I mentioned earlier. Our signup event is something that we measure, but it's also used to put people into the signup customer segment in Autopilot, which triggers a series of onboarding emails. So that kind of stuff has helped a lot. We still use some of that stuff from time to time, like if there's a time crunch. But for the most part,
Starting point is 00:29:25 I try to figure out a way to write a bit of analytics.js code that can handle what we want to do, because it is the most reliable way to do it, and it's also the most portable. Since we're routing all of our information through Segment right now, we can
Starting point is 00:29:48 And we've tried to build our stack in a way that we're not leveraging too many tools that don't integrate with either segment or some other tool that we're using. It happens from time to time, but that's like, that's one of the ways that I evaluate tools is, is this going to integrate with, with things that we already have?
Starting point is 00:30:06 And if not, are we willing to deal with the pain of manual uploads and downloads of CSVs and that kind of thing? I mean, your product works in a very interesting industry. You are dealing with financial data, and also the healthcare industry. These are traditionally two industries that are very, very serious about privacy and how you manage data and all that stuff. So this is a big discussion; we all know about GDPR and all these initiatives in general.
Starting point is 00:30:44 But what's your experience with that so far? And how does this affect the way that you design your data stack? Yeah, this will probably be a better question for Sam to answer; he's specifically spent a lot of time on this. I will just say that it is something that is paramount to us and something that we don't take lightly. So all of our marketing data is typically attribution information, and did they sign up for this or did they not?
Starting point is 00:31:16 And we're always analyzing things based on either their anonymous ID or their user ID. So I make sure not to expose any sort of email information unless obviously it's going to the email database. But typically, we try to keep that as anonymous as possible. Yeah, I think a big part of this is also the stack that you have. And I think that the approach that you have of keeping it simple and having the minimum possible number of tools and actually choosing the
Starting point is 00:31:45 right tools for that job is also what is quite important. So based on all the tools that you have mentioned so far, they're all privacy-conscious tools, like Segment, for example, when you're working on BigQuery. So yeah, I'm pretty sure we'll have a lot to discuss with your CTO about that. But yeah, it's very interesting. Nick, I'm interested in just from the sense you're a consumer of the data as well. Has there been like a report or a set of reports
Starting point is 00:32:18 that you really feel moved the needle for the company in a big way? Because, just having done a lot of this work over the years, Kostas and I both know one of the challenges: you sort of work on the data, massage the data, work on the data, right? And then you start working on reporting, and it's the same thing. You have to keep working on the reporting to get it to a point
Starting point is 00:32:43 where it's manageable and understandable, and you know the questions you want to answer. So, being an early-stage startup, and being someone who's both built the data flows and is consuming them: is there a report that you and other people in the company produced that was really helpful, or game-changing, or significant? There is.
Starting point is 00:33:06 I think I do sit in an interesting spot, in that I'm both consuming the data and writing the code to track events. And that has taught me a lot about how to structure my events on the front end, and then how to structure my data sets in a way that allows people to ask multiple questions of them. It's very easy to write some SQL that gives you one number that's just a roll-up of whatever it is,
Starting point is 00:33:35 but that's not really that useful to anybody if they can't also ask questions from it. So that's changed how I build my data sets in general. But the one really powerful data set that we've been using is actually around the idea of trying to understand each anonymous user's individual customer acquisition cost. Typically, when you think of customer acquisition cost, you're taking the sum of the amount of money you spent on a campaign and dividing it by the number of conversions that you got, right? So it's an aggregate function. But my CEO said, well,
Starting point is 00:34:18 you know, if we think about it, we can assign costs to each person, because we have all of the data on spend and all of the data on conversions. So the way that we've done that is we sum all of the spend and conversions in a given day by every unique ad ID that comes through our system. We then essentially assign that cost back to a unique table with user anonymous IDs on it for that day. And then if they clicked on any other ads, it'll also add that additional cost to their row in the table. And because we use that dynamic ID that I talked about earlier, in the Google Sheet, we can also pull in, okay,
Starting point is 00:35:21 what was their first touch and their last touch, and all of the attribution data about those. So I can see everybody's first touch, what campaign it was, what audience they were in, and then their last touch, the same thing. A lot of times those are the same, but sometimes people interact with multiple campaigns, and those are different. And we can see the timestamp difference between those values, so we can see how long it took for them to convert and how many page views it took for them to convert. And that data set has been particularly useful and particularly flexible, like I said, because of the way I built it.
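A compressed sketch of that data set, on one reading of the calculation Nic describes and with hypothetical table names; the real version also layers in the campaign and audience metadata from the Sheet:

```js
// Per-user acquisition cost: spend is summed per ad per day, turned into
// a per-click cost, and joined back onto each anonymous user's ad clicks.
// First and last touch fall out of the same table via MIN/MAX.
const { BigQuery } = require('@google-cloud/bigquery');

const query = `
  WITH daily_ad_cost AS (
    SELECT
      day,
      ad_id,
      SUM(spend) / NULLIF(SUM(clicks), 0) AS cost_per_click
    FROM \`marketing.ad_spend\`
    GROUP BY day, ad_id
  )
  SELECT
    c.anonymous_id,
    SUM(d.cost_per_click) AS acquisition_cost,
    MIN(c.timestamp)      AS first_touch,
    MAX(c.timestamp)      AS last_touch
  FROM \`ruby.ad_clicks\` AS c
  JOIN daily_ad_cost AS d
    ON d.ad_id = c.ad_id
   AND d.day  = DATE(c.timestamp)
  GROUP BY c.anonymous_id
`;

new BigQuery().query({ query }).then(([rows]) => console.table(rows));
```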
Starting point is 00:36:07 One example of that: I had been using this to identify which audiences were the most efficient for us to acquire new leads and new users in. But there came a time when we wanted to start testing landing pages against each other, and I was able to very quickly layer in the landing page name and variation so that we could test different variations against each other. That's awesome. That's crazy. So I guess that produces a more accurate view of the total cost per acquisition, because if someone interacts with multiple campaigns, you're actually calculating the cost of that, even though they didn't convert on earlier campaigns that they may have interacted with.
Starting point is 00:36:56 Yeah, exactly. We could layer in each successive event and all of the metadata associated with it, but it gets to be a pretty unwieldy table at that point. Most people are only coming in with one or two events, so first touch and last touch is sufficient; anything in between is just sort of extra detail. Sure, sure. All right, well, last couple questions for you. What current projects are you working on related to the data stack, and then what kind of plans do you have for the future as you look at your growth and the direction that you're taking as a company? Yeah, so the current push is for the mobile tool that we're building, our
Starting point is 00:37:47 mobile app. This is the first app that our company has built, so the entire company is learning how to do that and how to market it most efficiently. And I'm currently working on layering that information into the stack that we already have built and all of the data sets that we've already built. We're also building out a new website in Webflow. Like I said, previously we were using WordPress and Unbounce, so I'll have to basically switch all of our information over from those services to this one. As for future plans, one of the things I've wanted to do for a while is build a unique user lookup,
Starting point is 00:38:38 not because i want to look at individual uh people but um because I want to essentially use that as a guide to bring together all of our data into one unique view where we can type in an anonymous ID and see all of the campaigns that someone's come through and their individual customer acquisition costs. I think that would be a great way to bring all of our data sets together. And again, use that for further analysis. The thing I found is like, when you start building some of these data sets, if you do it in a way that's flexible, it allows much quicker analysis. Like you don't have to go back to the drawing board to analyze a new funnel if you've connected
Starting point is 00:39:27 everything in a way, and documented all of those connections in a way, that is flexible. Awesome. Kostas, any other questions for Nic before we jump off? Not really. I think it was very interesting to go through how things are consumed on the marketing side. I think there are some things that we can also discuss from the data engineering aspect
Starting point is 00:39:59 and get a little bit more of the technical detail on how things are implemented there. But I think it was a great story and a great journey of how whatever data engineers do, and whatever data has been collected, can actually be used, and what some really great,
Starting point is 00:40:20 actual best practices around doing that. Nick, thank you for joining us on the show today. And thank you for telling your story. Again, pretty incredible what you've accomplished there. And we wish you and Ruby great success. Yeah, thank you again for having me. It was a lot of fun. That was a great discussion.
Starting point is 00:40:42 It was our first discussion with someone who is actually a consumer of the data inside the company. I think, Eric, it was also quite amazing because with Nic we had a person who plays two different roles in one. I mean, you could hear a marketer, but at the same time someone speaking like a data engineer or a data architect. That was quite amazing. And I think this has a lot to do with being part of a startup, where you have to be scrappy and you have to play many different roles.
Starting point is 00:41:15 But I think one of the things that I found extremely interesting is how important it is for all the stakeholders inside the company to be aligned with how to work with the data and how this affects a lot of the quality. And of course it's like something that's probably easier to be done in a smaller company where you have no smaller teams, people having multiple roles. But I think that the same, the same principles apply also to bigger companies.
Starting point is 00:41:47 And I'm pretty sure that as we continue and talk with more people, and also when we talk with the CTO of Ruby, we will see how important this is in terms of delivering the right data, and also high-quality data, to drive your decisions. What do you think? Yeah, I think it was really interesting. I mean, I've had a chance to work with Nic on some of that stuff, and I can say that he's a pretty amazing individual in that he can play both roles really well. He writes SQL, he writes JavaScript, and that's pretty rare for someone who's consuming the data in a fairly early-stage startup. I will say, though,
Starting point is 00:42:32 I think that people with that skill set are going to become more and more common, just because they're in such high demand. Like you said, at a company with an actual data engineering function, you have people dedicated to the role. But in the early stage of a company, the product team is just focused on getting customer feedback and building features that are actually going to drive adoption. And as important as data is, the reality is it's really hard, when you're trying to achieve product-market fit, to slow down and build a comprehensive data pipeline. So I think people like Nic, who can understand the data needs of the organization and then do a lot of things on the front end to collect and route the data, are going to be more common. I also think it's interesting, the need for alignment that you mentioned around data governance and data quality: tools like Avo and Iteratively are popping up to help solve that problem,
Starting point is 00:43:31 because I think most companies are doing that in a Google Sheet or some other sort of shared resource. So, just a lot of interesting things to consider for anyone working on data in an organization, and it makes me really interested to hear the perspective of the CTO in part two next time. Absolutely. I'm also looking forward to hearing the perspective of the CTO. So let's see what happens on the next episode. Great. Thanks again for joining us.
