The Data Stack Show - 45: Open Source and Attribution with Ophir Prusak of Codesmith

Episode Date: July 21, 2021

Highlights from today's conversation include:
Ophir's decision to switch from software engineering to marketing and riding the startup train (2:39)
Open sourcing in the world of software (5:55)
How open source has changed Ophir's life as a marketeer working at startups (10:28)
Chartio's sunsetting drove Ophir to search for a data tooling replacement (27:27)
Discussing trends in adoption of tools for small scale and large scale companies (35:01)
Data challenges related to attribution--how wrong do you want to be? (44:07)

The Data Stack Show is a weekly podcast powered by RudderStack. Each week we’ll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.

RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.

Transcript
Starting point is 00:00:00 Welcome to the Data Stack Show. Each week we explore the world of data by talking to the people shaping its future. You'll learn about new data technology and trends and how data teams and processes are run at top companies. The Data Stack Show is brought to you by RudderStack, the CDP for developers. You can learn more at rudderstack.com. Welcome back to the show. We have another interesting guest this week, Ophir Prusak.
Starting point is 00:00:33 And he started his career as an engineer, but has worked in scrappy marketing roles doing data-driven growth at early stage startups. And I love that role because I have played it myself and I know how much you have to really have a hybrid approach, both in terms of actually doing the tactics of marketing, but getting very technical with the data. So I think he'll have a really great perspective. I think the thing that I really want to talk to Ophir about is he has a lot of experience with attribution. So if we have time in the show, I would like him to give us just a little primer on marketing
Starting point is 00:01:13 attribution because I think about working with engineering and just the discussions around attribution and so many things that I know now that I wish I could have articulated to the engineering teams I was collaborating with. So that is my burning question. Kostas, what do you want to talk to Ophir about? I know that Ophir has done a lot of research around data-related tools, and it would be great to hear about this experience and the reason that he did that, especially because he's a marketer, right? Like, his background is in marketing. Of course, he started as an engineer, but I really want to see, like, the perspective of someone who is not, let's say, involved in maintaining or having to set up and
Starting point is 00:02:01 all that stuff, like, the actual software, but he's the main user, and see, like, what's his experience around that and how he sees the landscape today and what changes he has seen happening these past couple of years. Great. Well, let's dive in and talk to Ophir. Let's do it. Ophir, thank you so much for joining us on the show. In prep, we had way too many topics to cover that were interesting. So why don't we just start at the beginning where we always start. We'd love a brief background on you, your history working with data, and then what you're up to today in your day job. Sure. Thanks a lot, Eric. So I got into computers at a relatively young age. I got a computer when I was 13, studied computer science in college and started my career as a software engineer. I
Starting point is 00:02:52 worked for a few years and then I made the interesting switch to marketing back in 2005. I simply wanted something more creative. So I've always considered myself kind of more of a technical oriented marketer and looked at marketing problems from more of an analytical perspective. And then around nine years ago, I joined a small startup as head of marketing and really have been riding the startup train ever since. So for the past, I want to say nine years, really been working as kind of head of marketing or initial marketing hire at a lot of startups. And as part of that process, it's always been about, well, how do we make sure that we have in place the infrastructure to be able to track what's going on, measure what's going on, and ultimately become data-driven? Especially for me as a computer science person, it's always been important to be data-driven. And along the way, just have had to solve so many problems about, well, how do I make this organization data-driven? It's obviously
Starting point is 00:03:54 different for different types of companies and learn the different options in terms of tools out there. Currently, I am head of growth at Codesmith, which is a software engineering boot camp. And what's led me more recently to really get more into this is that we have been using a tool called Chartio, which is going to be sunset in a few months. So while looking for a replacement, I simply found there are just so many new tools and options out there. And it's a little overwhelming. So I created the website datatoolreview.com. And it's just been kind of a fun ride. Very, very cool.
Starting point is 00:04:33 And so much to talk about. And I cannot wait to talk about your search for data tooling, just because it is a crazy world. And I think hearing your perspective as someone who's recently gone through an evaluation process will be really cool. One thing I want to hit before that, though, is something we chatted about before we hit record. And that is a trend that you've seen around the progression of software being open sourced in different parts of the tool set. And I think your perspective is really interesting as a marketer doing growth at early
Starting point is 00:05:15 stage startups, especially in technical marketing, or some people might call it growth hacking, et cetera. It's all about being scrappy and using tools to get the data in and have visibility into what your experiments are doing. And you sort of have to play the role almost of data engineer and marketing ops and the one leading marketing because you have to be small and scrappy. And I just think that's such a wonderful experience. But getting back to the topic, open sourcing in the world of software, tell us about the trend. I think this is such an interesting conversation.
Starting point is 00:05:56 Sure. So when I am thinking back to 2005, actually, when I got into marketing, and at the time, the only open source tools that I remember were definitely kind of for developers. And it's always been, I feel, the case where developers help each other. And it's really the more people using the same tool, the better it is for everyone. But on the marketing side, and even more so on the sales side, I want to say it's somewhat of a zero-sum game, I like to call it, where if one person is getting a sale, that means another company isn't making the sale. So there's always going to feel more competition and a lot less sharing of, well, we should all
Starting point is 00:06:36 use the same tool, but there has been more of a looking for the differentiators and what's different and what I can do to get ahead opposed to the other person. But what we've seen, or what I think I've seen at least over the years is more and more tools, which ultimately are becoming open source. And I think the natural progression of purely developer tools to kind of analyst tools or data stack tools makes a lot of sense for two reasons. First of all, I think more people want to help each other in terms of having great options. And I think looking back, I remember Redash and Metabase being two of the first tools that I remember
Starting point is 00:07:17 being open source, which were actually really good in terms of solutions. And at one company, I even used Redash for a bit. It's great that it's open source and anybody can use it. And what I think we've seen is that over time, there's kind of a shift more and more into the world of, let's say, whether that be marketing or I don't know any sales tools that are right now open source, but really there's a shift more of let's try to help each other. And the other aspect I think also is in terms of purely a distribution model from a sales perspective. If I'm going up against some established players in terms of whatever software is I'm trying to sell, having an open source solution allows, I think, a way to get into
Starting point is 00:08:01 the market much easier than just being another competitor. So in some ways, if there's, let's say, maybe Looker as an example, which I think is an example of a very established closed source tool, and there is a quote unquote Looker open source alternative, I know that came out recently, again, in order to kind of gain more momentum. And it just makes so much sense from the people trying to create these new tools, both from a, it's a win-win for the developers or for the analysts, and also for companies trying to kind of get into the market. It's just a great way for people to, they can start small. Usually there's a cloud version, a service which you pay a little for.
Starting point is 00:08:40 Once you get a little bigger, you can use the open source version. And once you get really big, you'll usually use the enterprise version of it. So it just makes sense to me that there's this kind of movement. And I think over time, we're going to see more and more tools having the open source model because it's kind of one of those win-win situations. It just makes so much more sense. Ophir started the conversation and he mentioned Chartio and that the tool is like right now in like a period where it's going to sunset
Starting point is 00:09:07 pretty soon. So I think that another benefit of having something open source out there is also what happens when the team behind or the company behind the product decides to take a different course, right? Companies have the opportunity to use the open source, self-host it, and even keep maintaining it. There are a couple of cases like this, especially like in the database space, because databases especially are like pretty hard pieces of software to build. But a very good example of this is like RethinkDB, for example.
Starting point is 00:09:40 That's exactly what happened. The company decided that they cannot move forward. They open sourced the code, and the community decided to continue supporting it. CouchDB, the same thing. So that's another thing. And I think, Eric, you have also seen, especially large enterprises, which sounds a little bit controversial, to actually be interested in a company that has an open source project because they know that they can pay for the product and there will always be continuity in the product if something goes wrong with the company. So I think that's another important reason why open source is important. But Ophir, you as a marketeer, right? And you've been like in this space for quite a while now. How do you feel that your life as a marketer has changed because
Starting point is 00:10:27 of open source? That's an excellent question. So I want to say as a marketeer, or I should say as a marketeer who has been working at startups. So for me, it's not just a question of being a marketeer, but also having to be scrappy and find solutions where either me or me and kind of one other developer can really lift this off the ground. I want to say a lot of the more advanced tools really have two options. You can either pay for an enterprise tool. And I've seen a lot of tools which do some really cool things, but the entry level for paying for them is, you know, it's annual contracts only starting at 10K a year and above, which is just out of the budget for a lot of the smaller startups. So your option is either to go with a very simplistic tool, there's a lot of things which might cost you, let's say, $99 a month or something, but again, you're limited. And so in many ways, going down the open source route is kind of your
Starting point is 00:11:29 only solution of getting a more flexible or a more mature product without having to pay an arm and a leg. And because it's also open source, having a model where there is a cloud version, which is relatively inexpensive, makes a lot of sense. They're not going to charge a ton of money if there's an open source version because simply they can't. So I do think it's definitely, and I think Redash and Metabase are two tools that I've used in the past, which were good examples of this. I know even though Redash was acquired, I think, by Databricks, it's still, it's a great solution where you can kind of start easy, you can play around with it yourself. And also, as you had mentioned, Kostas, there's none of that worry about, well, what happens if they raise their prices or something? I can always use the open source version. So I definitely think
Starting point is 00:12:13 it's helped me a lot in terms of seeing the different things out there. And definitely for me, I do see more and more over time seeing open source, especially I want to say open source projects, which have a cloud version. I think another good example is Preset and Apache Superset. Superset, you know, that's an open source project. Preset, which is a relatively new service, is basically Superset as a service, but it gives you that, I want to say, upgrade path of, you can play around with it, start relatively easy to see if it works for you. And then if you do grow, you can always have the choice either to pay a little more or to host it yourself.
Starting point is 00:12:50 So I think that's another thing in terms of new tools being out there that you always kind of have the option. You simply have a lot more options, I think, when you go down the open source route. Yeah, makes sense. I think you put it very well. I think one of the main values
Starting point is 00:13:04 of open source in general is choice, either as you described it, or as the choice as a developer from the other side, to have access to the source code and even extend it if you want. What's a big part of the value that open source brings is this kind of flexibility and the choice and options. Eric, you as a marketeer also, do you remember what was the first open source project that you used? That's a great question. Let me see here.
Starting point is 00:13:39 Any sort of technology would probably be WordPress, but that's one of the most pervasive open source projects in the world and drives a huge amount of the internet. In the data space, actually, it had to have been Analytics.js from Segment way back in the day when they open sourced it and were doing data integrations. And they obviously built a very large, successful company in a really short amount of time. But there was certainly a period there where Analytics.js was a very interesting, useful, technical tool for data integrations back when they were segment.io.
Starting point is 00:14:22 That's interesting. So it was a data-related product, which I think makes sense. I think in general marketing is, let's say, somehow the function that drives a lot of innovation, or it's like, let's say, the first function to adopt and try things, right? Like even with new things like reverse ETL, I would say that the most common use case around it is marketing, and then it might be like sales ops, right? So that's something very beneficial that marketing is bringing to the industry. And I don't think we recognize it enough. If I may, Kostas, I do want to add a little point to what you're saying.
Starting point is 00:15:02 Going back to what I was saying beforehand, I think marketing has to try new things. I think the whole point of marketing, it's so crowded today that if you're just doing what everybody else is doing, you're not going to get awesome results. And I think the only way is to try something new. And very often to try something new, you do need those new technologies, which is the opposite, I want to say, for developers, where the more people using a technology, the more stable it is, the less risk it is. So definitely marketing is always going to be at the forefront of let's try something new because it's different. And definitely whatever new tools are out there, we're going to try. Yeah, absolutely. But Ophir, I have a question.
Starting point is 00:15:43 We are talking about open source and we are talking about marketing being like in the, let's say, the forefront of like trying new things and all that stuff. Are there any marketing platforms that are open source right now? Like something, the open source alternative of MailChimp or the open source alternative of Pardot. Does this thing even exist? The short answer is yes, it does exist. But the fact that I don't remember the name of the platform just goes to show you how, unfortunately, I think for an open source project to really succeed, you need a lot of people whose day-to-day job depends on that technology who are themselves highly technical. So if you're a
Starting point is 00:16:28 developer, then yeah, it's fine if I do some of my development on an open source project. If I am a data analyst, but I know Python or I know software development, then sure, if I work on that project, it's fine. If I'm a marketer, then I'm not going to be contributing to that project. So I think the big problem with that, and I forget the name of it, because I looked into it and it just was not very mature and it didn't seem to really be taking off, it's because of exactly that problem.
Starting point is 00:16:56 I don't believe we'll ever see a truly open source solution, which is going to replace a HubSpot or a Salesforce simply because there are just not enough people whose day-to-day job is working on that. Yeah, I think Pimcore is one of the big ones that comes to mind. And I know that off the top of my head just because I recently did some research on open source, like marketing and sales type SaaS tools. And there are a handful of other ones out there,
Starting point is 00:17:25 but I think you're right, Ophir. If you think about one way that you frame this, and I really liked the way that you described the arc of what's happened is a lot of open source stuff was initially tooling for developers, right? So you think about things like Git and the whole ecosystem around that. And it was really developers working together to figure out, okay, we're all doing very
Starting point is 00:17:50 similar things here. How can we create some sort of standard and frameworks and then tools that result from that, that benefit everyone and make everyone's job easier so that we can focus on stuff that matters more. And then you start to see that happening in the data space. And I think it makes a lot of sense in the data space because by nature, data requires a lot of integrations. There's a big ecosystem around data, which I think lends itself to open source. And then also, especially when it comes to analytics, everyone's reporting is a little bit different depending on
Starting point is 00:18:22 the business, but people are trying to build the same fundamental reports to understand how their business is operating. And I think those conditions create a healthy environment for open source. Whereas if you think about an email marketing tool, it just doesn't seem to me that there's ever going to be an open source tool that achieves the level of commercial success of a Salesforce or Marketo or other sort of traditional marketing and sales SaaS tools. But what do you think, Ophir? I mean, you seem to think that that's never going to happen. Yeah.
Starting point is 00:18:57 I think it kind of goes back to what I was saying, where in certain fields, differentiation is a competitive advantage, which I think sales and marketing are those fields, while in something like software development or even your data stack, I don't believe that's a competitive advantage. The way the developers working on it actually see it, from their perspective, is the more people working on it, the better. So I think that's why we're not going to see something like sales. Well, let me rephrase that. There is SugarCRM and there have been people who have tried this, but I don't think it'll ever be able to compete with kind of the elephants of the industry
Starting point is 00:19:38 the way that other tools have been able to compete. Yeah, it's really interesting. And I think your point about marketers and salespeople not being able to contribute back to the core product, I think, is a really defining characteristic there. Okay, one more question on open source. And I'm going to direct this both, Ophir, towards you and Kostas, because Kostas, I think you have some strong opinions here about some patterns that we've seen. But when you have an open source tool that also commercializes at scale, it can kind of become controversial. So the one or a couple examples that come to mind of late, you have Elasticsearch and then MongoDB changed their licensing, both with huge adoption of their products, but they also experienced some turbulent times trying to navigate being a large commercial
Starting point is 00:20:33 entity or connected to a large commercial entity and also open source. So it makes total sense at the small end, like you said, especially in the startup world, but it's not always an easy path to navigate when you're at scale. So what say you, Ophir and Kostas, on the challenges of being open source as a huge commercial enterprise? Yeah, that's a great question. And coming more from just the startup world, I want to say of marketing and sales and product development in general, I personally do think having, it depends, I want to say to your point, the whole MongoDB and changing of different models.
Starting point is 00:21:10 I do believe having a, what I want to call a core product, which is free and kind of will always be free, but differentiating, I want to say on the functionality side. I mean, I think that's totally legit. I mean, I think it makes sense. Will there be problems sometimes? Sure. But if the core product is always free and you can do whatever you want with it, and then adding on top of it, and this is what I almost always see the case is things like single sign-on or granular access levels to who can do what, as well as scalability to some extent. That makes sense. Ultimately, the people who are working on the project, they do want to be able to
Starting point is 00:21:51 make money. And I think it's fine as long as the core product is still open source. And I do feel it's kind of the best of both worlds. Nobody's forcing you to use the kind of enterprise version, you could figure things out yourself. And I've even seen some open source projects which actually try to replicate some of the stuff that kind of the enterprises are doing or kind of the enterprise version. So I still think it's a win-win to have open source, even for enterprises and for companies
Starting point is 00:22:20 to ultimately take funding and try to contribute to the product. Yeah, I agree. Eric, it's a bit of a complex situation, to be honest. It heavily depends. Yeah, of course. That's why I asked you and Ophir. Yeah.
Starting point is 00:22:38 I mean, it heavily depends on the product itself and also on the monetization path that the company wants to have. If you think, for example, about both the cases that you mentioned, Elastic and MongoDB, the problem these two companies had, it's not that you or me, we would start a company and we use MongoDB as the backend of our system and use the free one.
Starting point is 00:23:02 That wasn't the problem that they had. The problem that they had was that Amazon could come and be like, okay, now I'm giving Elasticsearch as a service, right? And that creates a conflict with the business model that Elastic has. So it's a bit of, like, let's say, a game and a battle between, like, the big companies and the bigger companies in a way. I don't think that, like, any startup right now starting and open sourcing something is going to have to face that problem. On the other hand, we have cases of companies like Databricks, for example, or Confluent, that both of them have open source projects and the core project is open source. They are offered as a service from big cloud providers,
Starting point is 00:23:50 but at the same time, they also manage to be successful, right? Databricks is super successful. Okay, they haven't gone public yet, but they are on a track to do that pretty soon. And they're one of the main rivals of companies like Snowflake. Confluent just IPO'd. So it depends. I mean, I don't think at the end that these kinds of behaviors
Starting point is 00:24:16 that the big cloud providers might have are going to hurt the companies that much. Maybe they have to change their business models a little bit or their licenses, which is fair. I mean, it's not going to hurt you that you are going to use the product for your backend at the end. So yeah, I think that things are a little bit better than we tend to think about that.
Starting point is 00:24:38 And don't forget that open source is literally like the core of the internet and all this digital revolution, right? Like Linux is open source. Without that, we wouldn't have like servers, right? Yeah, nothing. I just remembered. Okay, that's a little bit irrelevant, but I find it funny.
Starting point is 00:25:00 So Linus Torvalds, the guy who started Linux, right? He's famous for being very aggressive and almost, let's say, a little bit abusive towards developers. He's very opinionated and very protective of his child. And I was reading lately that at some point he decided to go and do therapy for that reason. And now he has changed his mind completely and tries to have more empathy. Well, that's great.
Starting point is 00:25:33 By the way, the guy's the inventor of Linux and the main maintainer of the kernel of Linux and also of Git, right? I mean, we're talking about a person who has contributed a lot using open source. That is unbelievable. I want to add one other thing. Kostas, you raise an excellent example of kind of different licensing in a lot of different worlds, and even in the software world, to some extent,
Starting point is 00:25:58 when I was working at different SaaS companies, we had a different pricing model for if you use a product in-house or you're an agency and you were basically reselling it in some way. The same thing is applicable for media rights. If you're in the media world, if you're selling a picture, it's different pricing whether you use it yourself or you're reselling it. So I think it's totally fair to say, hey, just because it's open source doesn't mean you can do anything you want with it. And all companies are equal in that sense. So I think it's fair to say if I'm creating technology and you use it in-house, then rules A apply to you. But if you want to actually resell it, then it's different rules. And I think that's totally legit.
Starting point is 00:26:39 I was going to say, Kostas, that was a very, I think, thoughtful and balanced response to a complex question, maybe a little bit more so than the most impassioned commenters on Hacker News when some sort of open source news like Elastic or Mongo hits the press. Okay. And one thing I did want to mention, there's a really interesting site out there called opensource.builders. So opensource.builders. And you can go see alternatives to tons of different types of tools from CRMs and email tools and analytics and you name it.
Starting point is 00:27:22 That came to mind, Ophir, when you were talking about open source email marketing tools. Let's switch gears a little bit. Ophir, you recently, with the announcement about Chartio sunsetting, went on a search for data tooling to sort of rebuild your go-to stack and ultimately concluded that it's kind of complex, and you ended up putting a website together that collected a lot of your findings. Can you tell us what were the requirements around your search? And then you have a very fresh set of eyes looking at all sorts of components of the data stack from pipelining type solutions all the way through to BI solutions. And we'd just love to know what did you learn in that process? Yeah, no, thanks a lot, Eric. So yeah, I've been a user of Chartio since
Starting point is 00:28:11 literally they launched and have just been a huge fan of the product, not to mention seeing how it grew over the years. And I think for me, it really solved quite a few different parts of the puzzle that I needed. Its ability to pull data from different places and blend the data, or federated queries, as it's called, and also being able, very easily within kind of a nice GUI interface, to really kind of do the key part of ETL, not through queries per se, but ultimately give you a really nice solution. And it's funny because I'm a SQL guy and I love writing SQL, but what I found was actually having kind of a GUI front end to doing the SQL manipulation just gave me the ability to make modifications really quickly and easily.
Starting point is 00:29:05 And even though Chartio is a closed tool, I still was able to do probably 80% of what I needed to do, like as long as the data was in some SQL database. So for me, it was in terms of stuff I needed to do was, first of all, make sure the data is in a SQL database. Chartio does also pull from Google Analytics, one of the very few non-SQL-based sources, as well as CSV files and Google Sheets. And so the only other part I really needed to take care of was to get data into a SQL database. We use HubSpot. So for us, we were using Segment too. And I played around. There's a lot of tools which do it. I was using Segment though, just to pull the data into a SQL database. But really Chartio, I want to say,
Starting point is 00:29:49 gave me the ability to do most of what I needed to do. And I do want to bring up one other thing that I found is that within the realm of kind of BI tools, and Chartio definitely is in that realm, I find there's two different types of problems that people are looking to solve. One is I want to say the day-to-day, week-to-week reporting. And that is, I just need to see, okay, how many people have signed up to my service? Where are they in the pipeline? What's happening with all the internal database I have? And it's really just to get an idea of what's going on and still be able to slice and dice the data to some extent. I want to do certain segments or certain timeframes, but it is about that kind of ongoing reporting. And then there's the discovery slash ad hoc. Well, I have a
Starting point is 00:30:35 question that nobody's asked before, or I've never asked analytical people before, and I want to answer that question. So Chartio is definitely in the first group where it's great for reporting, but I do agree very often people have come to me and say, well, I have a question. And I'd kind of just do a one-off report for that simply to answer it because it doesn't give the analysts kind of truly an easy way, I want to say, to just go in and ask any questions. So just a little more about the kind of tool and what I was looking for. I was definitely looking for a solution for more of that reporting side. And what I found when I started looking for other
Starting point is 00:31:08 tools is, first of all, I didn't find any other tool. I want to say that was kind of a one-to-one solution that I could easily do. And I really probably would need to now put together a few different things. I did see some specific tools. I even talked to Holistics, I remember, in one of your other episodes, which actually looked very close to what Chart.io does. But I was a little concerned also to go down the, well, this is another tool, which is not open source. So I was definitely looking more towards, can we solve this with just open source tools? For what it's worth, I'm still looking for a perfect solution. And I haven't decided yet. But definitely, to your point, it's just really taken off, I want to say the past couple years, I haven't really looked at new tools. So recently, because I've been using
Starting point is 00:31:54 Chart.io. There's this whole explosion of, like, reverse ETL tools. And just, I want to say, just a lot of tools, which each do a really, really good job of solving one specific part of the problem. I mean, there's like some tools that just pull data from whatever data source you want and put it into a flat file. There's some tools which just take care of pulling the data from whatever sources you want, like a Fivetran or whatever it be, and push it to whatever other solution you want. And then there's like CDPs like RudderStack and reverse ETL. I was saying, I feel like there's so many tools which are really trying to solve one specific part of the problem, which is great when you have a slightly bigger team.
Starting point is 00:32:36 But as a team of one, for me specifically, it's been a little more challenging because I've realized I need now to, okay, maybe I need now to actually look at a few different tools and see how we can put them all together. So I think we mentioned this also before the recording, it's a little more challenging when you need to kind of do everything open source because it is a little more complex and it does require a little more work to kind of get things up and running. So I would say you have a lot more choices than before
Starting point is 00:33:03 and a lot more specialization in specific tools. But in some ways, we don't have that kind of one thing does everything solution that a lot of smaller companies need in order to get started. So that's kind of where we are today. No, that makes total sense. And it is really interesting to think about the data space and Costas would love your thoughts on this as well. As tools have progressed, it makes a lot of sense that there has been some specialization, right? Where data pipelines are really hard. Pulling data in is a non-trivial problem and there's always new sources and everything to maintain. And then doing analytics really well is also hard in its own right. And so it makes sense that there's specialization, but to your point, when you're a really small data and analytics organization,
Starting point is 00:33:57 having tools that can accomplish multiple things in one system is way more convenient potentially because you're not dealing with multiple vendors, multiple processes, etc., etc. But Kostas, what do you think about that? I mean, do you see any trends? I mean, of course, specialization with different pipelines, etc., but would love your thoughts on that as well. Yeah, that's a very good question. First of all, I have to say to Ophir that he made me really happy with what he said about Chart.io. I'm a very good friend with Dave, the CEO of the company, and I'm pretty sure that he's going to be very happy to hear what you said about the product. He's one of the most obsessed, in a good way, product-driven person that I have met. And he put a lot of energy to build this product.
Starting point is 00:34:47 And it's good to hear that what his vision was, at least for the experience outside of the company, he managed to deliver it at the end. So that was great. And I'm sure you will make Dave really happy if he listens to the episode. Now, going back to your question, Eric, you know, they say about software that a common pattern to build a new product in the new company is go to start with the small
Starting point is 00:35:17 enterprises or medium enterprises, iterate the product on them and then go to enterprise, right? That's like a very common pattern of innovating on an existing problem: creating something that is better as an experience, using the smaller companies as a vessel, let's say, to figure out what's the right way of solving the problem today, and then at some point go and sell it to the enterprise and increase your margins and all that stuff. Now, that might be true for the SaaS space, right? Where we are building a CRM or a marketing platform. Now, in the data space, I think that what is going to happen is the opposite.
Starting point is 00:36:03 And there is a reason behind that. And the reason is that building technology around data is really, really hard. You have to scale from day one. And it's very, very expensive. Going out there and building a new database system, for example, it's crazy hard. And the big companies have both the scale and the money to fund this product. So my feeling is that in data, we are going to see the opposite, actually. We are going to see products that are going to be built primarily for the enterprises, and then they are going to scale down in a way to the smaller companies. And I think we see that happening in a way, especially with companies like Databricks and Confluent.
Starting point is 00:36:52 They started, first of all, with an on-prem solution, the traditional enterprise sales going there. These things that a small company would never pick up the phone and call them for a quote for the price, right? And then you see them going down market instead of going up market. And they open something like a product as a SaaS solution. And then you have tiers that are consumption-based. And it's very easy for someone, even as a small company, to go and afford and use the solution. There is one exception there, and this is Snowflake, which started from, like, smaller
Starting point is 00:37:30 companies and then started penetrating the larger enterprises. But I think that this is a kind of pattern that we are going to see a lot happening in the data space. At least that's my opinion. Interesting thought, Kostas. I actually want to say something which is, I've seen something which I understand where you're coming from, though I want to say in terms of just product growth in general, what I've seen often the case where
Starting point is 00:38:01 at least in the SaaS world, what's the difference between products that really, really kind of take on and go viral versus products that just never make it to that kind of super large adoption? Is it something which I can just go in and within 30 seconds, start playing around with it. I don't need to talk to anybody, especially if you're talking to developers or analysts who don't want to talk to a salesperson. I have to say, if I need to actually talk to somebody to even play around with a product, then I don't think it's going to kind of really gain huge adoption. And again, that's at least in the SaaS world for developers and for analysts. And what I've seen is if you start by creating a product, solving for enterprise, you're not thinking about the self-service model first and foremost.
Starting point is 00:38:57 And I'm seeing a lot of, at least when I've talked to a lot of companies, they actually do what you're saying, Kostas, that they solve kind of problems for the enterprises, but they don't have an actual demo you can play around with. It's like, oh, I need to set it up for you. So I actually want to say the, at least from a product perspective, if you're not able to accommodate the self-service model, I think you're going to have problems with growth, even if you are an enterprise product,
Starting point is 00:39:22 because it's the individual developer, the individual analyst who wants to go in and play around with it and doesn't want it, doesn't have the resources to install it themselves, but wants to play around with the cloud version that ultimately is what causes a lot of products ultimately to go viral. A hundred percent. I'm 100% with you on that. And I think that that's where open source is also extremely important. Like the companies that I mentioned, like Confluent and Databricks, they started first of all as an open source project.
Starting point is 00:39:56 And because they are tools that are used by developers, like Databricks is not something that, I don't know, like a marketeer or a salesperson will take, although the output of the work done there might be used by them. Developers are fine to try solutions that are not that easy to use, right? Like they can take it, set it up, play around, see how it works, make it work. All that are like part of like the developer experience, which is different from a SaaS user. And yeah, I totally agree with you, this kind of experience is important. The difference is that it is a little bit different with data related products, especially
Starting point is 00:40:37 infrastructure products, because these are going to be used and maintained by developers, right, by engineers. So the experience there is a little bit different. So that's at least my experience so far. And I think that that's another also added value of open source at the end on the side of the business models and how you can use it to actually build a company and the product experience. Yeah, it's interesting. There's also another way that innovation at the enterprise impacts technology. And that is when a problem is solved in the enterprise.
Starting point is 00:41:15 And then either the pattern for the solution or the actual solution itself is published, oftentimes it's open source. So if you think about, I mean, Netflix is a classic example of this. Really interesting technologies have come out of Netflix in ways that they've solved data infrastructure problems at scale, and they've open sourced some of those. And I think it's also interesting to think about how, per what you said, Costas, like open source being a way that you drive adoption to the bottom of the market, which is also your point of view, that some of the patterns for that can actually emerge
Starting point is 00:41:50 from the way that enterprises are solving data infrastructure problems. So what a fascinating ecosystem. Yeah, and also, Eric, adding to the open source, because you mentioned Netflix. For Netflix, open source is also a tool to recruit the best possible talent, and that's another, again, value, because if you think about it, like, Netflix
Starting point is 00:42:13 is not a software company, right? It's not their primary product. Their primary product is, like, content. They are content creators, but they operate at such a huge scale where they need the best people out there to build and maintain their infrastructure. And open source gives you a path to go and get these people, which is another benefit of open source. Absolutely. I feel like I'm evangelizing open source a lot today. It's the open source show. Yes, very eloquent evangelism. Well, we're getting close to time here.
Starting point is 00:42:51 There's one more subject I wanted to cover. And Ophir, you have a lot of experience doing attribution in marketing. And attribution in marketing is a tricky thing, right? It's basically trying to answer the question in any number of ways. I try something and I get these results. And then how can I tie the results back to this specific effort, right? A classic example is paid advertising, right? When I spend money on paid advertising, I want to see whatever it is, how many customers actually came from that.
Starting point is 00:43:30 And attribution is a classic challenge when it comes to data for a number of reasons. Paid advertising is just the tip of the iceberg there. But I would love to know, I'm thinking about especially our listeners who are on the technical side, probably work with a marketing team or have projects related to marketing, but maybe who just aren't as familiar with attribution on a tactical day-to-day basis. Since you've played the role of marketing and marketing ops and data engineering, could you just give us a breakdown? Tell us, give your basic definition of attribution and then what are the data challenges related to different types of attribution? Sure. I'll start with just a quote that I heard that I simply love about attribution is that
Starting point is 00:44:13 attribution is simply a question of how wrong do you want to be? And the reason of that, I actually learned about attribution the hard way almost a decade ago, I want to say, when I was running a campaign and there was, at the time, this was like, I wasn't doing multi-touch attribution. It was very straightforward, what Google Analytics was telling me. And there was one campaign, we were spending a lot of money and we were seeing activity, but we were just not seeing conversions. And ultimately we made the decision just to drop it. And two months later, our sales dropped drastically. And that was the only explanation. And looking back, it was clear
Starting point is 00:44:49 that it was simply the multi-touch attribution part of it. But I think attribution at the end of the day comes, and there's an analogy used very often for soccer players. When you make a goal, you can say it's the person who technically hit the ball into the net, which made the goal. But if you look at who gets credit for making that goal, it's not just the person who actually kicked it in, it's the whole team and everything coming up. But it's really hard to say, well, what percentage of each one of those people ultimately played a role. Another way to look at it is that if I would have taken out one of those people from the series, or if you look at all in the world of marketing, if somebody had, let's say, five different touch points, if I were to take one of them out, what can I say about
Starting point is 00:45:41 how much it would have impacted the ultimate attribution, how much it would have impacted the actual revenue? So I think a few things which are, I think, important to understand kind of for people is you're never going to get 100% attribution, first of all. In other words, you're never going to know for sure exactly where somebody came from, or even if you're doing multi-touch attribution, you're never going to know for sure the impact of each touch point. I want to say attribution is something which is really
Starting point is 00:46:09 directional. If you think about things also like marketing in general, if you're tracking people on your website and a lot of people are going to have blockers for being able to track, I think attribution is good to understand not how many people are coming from exact numbers of this ad versus that ad, but maybe when I compare this channel to that channel, what do I see? When I compare first touch, last touch, what do I see? So I think it's really kind of a directional. It's a great way to do it. And the other thing I want to say, which is something that I'm seeing more and more people do recently in terms of attribution is something called incrementality, which is, you
Starting point is 00:46:51 know, similar to just split testing or AB testing in general, but instead of having version A versus version B of a specific type of copy, you basically kind of have a control group, which doesn't get the ad at all. And that way you can kind of say, well, all things being equal, if I didn't serve a specific ad up to a group of people, how much did it impact the percentage of those people, which ultimately made a purchase or made a conversion? So a few things, ultimately attribution is hard. I will say there are definitely companies which are doing a decent job and if anything better than nothing, especially today in the AI world and machine learning, there's a lot of companies which are able to put together the data using things like linear regression
Starting point is 00:47:38 and able to give you more than just what a tool like Google Analytics is going to give you. And I would even say if you're spending a million dollars or more on paid advertising a year, you should definitely look into kind of a dedicated solution and not just depend on Google Analytics, or do it yourself with just SQL and try to figure this out. You definitely want a dedicated attribution solution. If you're doing less than a million dollars a year, I feel like you're going to get some benefit, but it's just not going to be as much of an impact. And also there's a big question of how many channels. What we found is once you go beyond just Facebook and Google and you start doing
Starting point is 00:48:19 things like maybe TV advertising or OTT advertising, or you have a lot of coupon codes or whatever, then using a third-party tool definitely helps. Sure. Yeah. One thing that's, I think back on my background in marketing, and I don't know if a lot of people explain it this way, but marketing and engineering have historically had somewhat of a tenuous relationship, in large part due to marketing's demands. Ophir, you have the benefit of both being the one making the demands and the one that needs to deliver them. So the expectations are always clear, which is definitely not always the case, especially as companies scale. But I'm just thinking back on times when I'm running marketing
Starting point is 00:49:06 and I'm getting together with the head of engineering to talk about data. And attribution really in many ways was a large part, I think, of what I was trying to accomplish with just a lot of asks around data from the engineering team, because you're trying to triangulate what's going on. And because marketing is so dynamic and you're constantly trying new things and you're constantly running tests, your requests are always changing and your needs around data on the sharp end of things are always changing, which is just interesting. I never really looked back at my interactions with engineering around marketing data through the lens of attribution, but I think that's a huge driver. Yeah.
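Editor's note: the attribution ideas Ophir describes above (first touch versus last touch, spreading credit across several touch points, and incrementality with a control group that never sees the ad) can be illustrated with a small sketch. This is not from the episode and is not how any particular attribution product works. The function names, the sample journeys, and the even "linear" split are all assumptions made for illustration, and the lift calculation is a plain difference in conversion rates with no significance testing.

```python
from collections import defaultdict


def attribute(journeys, model="linear"):
    """Assign each converting journey's revenue to channels.

    journeys: list of (touchpoints, revenue) pairs, where touchpoints is a
              list of channel names ordered from first touch to last touch.
    model:    "first_touch", "last_touch", or "linear" (even split).
    All three rules are simplifications chosen for illustration.
    """
    credit = defaultdict(float)
    for touchpoints, revenue in journeys:
        if not touchpoints:
            continue  # nothing to credit for a journey with no recorded touches
        if model == "first_touch":
            credit[touchpoints[0]] += revenue
        elif model == "last_touch":
            credit[touchpoints[-1]] += revenue
        elif model == "linear":
            share = revenue / len(touchpoints)
            for channel in touchpoints:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


def incremental_lift(exposed_conversions, exposed_size,
                     control_conversions, control_size):
    """Conversion-rate difference between the group that saw the ad and a
    holdout (control) group that did not."""
    exposed_rate = exposed_conversions / exposed_size
    control_rate = control_conversions / control_size
    return exposed_rate - control_rate


# Hypothetical journeys: (ordered touch points, revenue from the conversion).
journeys = [
    (["display", "search", "email"], 90.0),
    (["search"], 60.0),
    (["display", "search"], 40.0),
]

for model in ("first_touch", "last_touch", "linear"):
    print(model, attribute(journeys, model))

# Hypothetical holdout test: 1,000 people per group.
print("lift", incremental_lift(50, 1000, 30, 1000))
```

With these made-up journeys, display gets all of its credit under first touch and none under last touch, which is the kind of gap that can make a campaign look worthless in a last-click report. Comparing the models side by side is directional, as Ophir says, while the holdout comparison asks the separate question of what would have happened without the ad.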
Starting point is 00:49:48 One other thing I'll add is, I always say, what, as in like what happened? That's a relatively straightforward thing to answer. I wouldn't say it's easy, but it's pretty straightforward. Why it happened, that's where the fun is at. And that's really where kind of it gets a lot more complex. And you need to be also thinking not just about data. One of the biggest mistakes I've seen a lot of companies make is they're looking just at the data, but not looking at the context of what's happening. So you might be looking at, let's say, people who are clicking on ads.
Starting point is 00:50:24 And you might say, oh, okay, well, I see this ad versus that ad. And this ad is doing better than that ad. But what you might not realize is that one of those ads is from a display campaign, and the other one is from a search campaign. So the whole context of where the person is in the user journey can be totally different. And that's why I think data without the entire context of like what happened before, what happened after and what segments these people are from. That's where I find a lot of people also make mistakes based purely on data. And that's why I think marketing and data together really is both science and data. It's not just one or the other. Yeah, absolutely. And I think one thing that we've seen over the course of doing the show for,
Starting point is 00:51:11 I guess, a year. Wow, that's amazing. I hadn't thought about that. We've heard more and more really cool structures of teams where there's a very tight relationship between engineering and marketing or data engineering and marketing where it's very collaborative because you see so many times that problems arise when marketing gives a vague specification for something that they need and the engineering team will deliver that to spec and it lacks context, which to your point is so key in trying to understand why things are happening. And I think the more you can have a really robust collaboration where both the context of marketing, trying to drive a customer journey or explain things and pushing that context to engineering, then also engineering giving marketing the context of here's what's going on under the hood.
Starting point is 00:52:05 As far as the data, maybe there's limitations or decisions that need to be made, really can create a powerful dynamic for figuring out what's actually working and continuing to invest in those things for growth. Yeah, definitely. Well, we are at the buzzer. This has been a great conversation. We got to hear a lot about open source and Costas' evangelism about open source, both from the startup and enterprise levels. And we've learned a ton from you just about your unique perspective on data tooling and especially in marketing. And thanks for the quick crash course on attribution. That was really helpful for me and I hope helpful for our listeners as well. My pleasure. My pleasure. My big takeaway, and I'm still processing through this, but I think the conversation around open source spreading to different parts of the tech stack is just such an interesting
Starting point is 00:52:58 conversation. And I think Ophir's observation around starting open source really having heavy influence in developer tooling and that being a huge wave of adoption. And then that spreading to data tools makes a ton of sense. And then I'm still ruminating on whether a sort of marketing or sales SaaS tool could make a run at it as an open source tool. And I'll probably be thinking about that a lot over the next week. Yeah, my main takeaway is that I might have to change career paths, Eric, and become an open source evangelist or something. That really was... It really...
Starting point is 00:53:45 We had a little aside there and you gave us a very passionate speech on multiple levels of open source. Yeah, yeah, yeah. It's probably the effect of jet lag from what it seems. But outside of this,
Starting point is 00:54:02 actually, it was very, very interesting to have a conversation with someone who is not traditionally exposed, a marketeer, talking about the importance of open source. Today in 2021, I think there are good signs that in a couple of years we might see maybe a successful open source CRM, who knows? Yeah, I agree. I think it was a really, really interesting perspective. Maybe we can get Marc Benioff on the show to give us his perspective on whether he thinks an open source company will disrupt Salesforce.
Starting point is 00:54:50 Am I going to be part of this episode? Maybe I'll start preaching to him that he should open source Salesforce. I think he'd be very receptive to that. Absolutely. Let's do it. All righty. Well, Kostas and I are going to go try to figure out how to get Marc Benioff on the show. And until next time, we will catch you later. We hope you enjoyed this episode of the Data Stack Show.
Starting point is 00:55:23 Be sure to subscribe on your favorite podcast app to get notified about new episodes every week. We'd also love your feedback. You can email me, Eric Dodds, at eric at datastackshow.com. That's E-R-I-C at datastackshow.com. The show is brought to you by RudderStack, the CDP for developers.
Starting point is 00:55:48 Learn how to build a CDP on your data warehouse at rudderstack.com.
