The Data Stack Show - 45: Open Source and Attribution with Ophir Prusak of Codesmith
Episode Date: July 21, 2021
Highlights from today's conversation include:
Ophir's decision to switch from software engineering to marketing and riding the startup train (2:39)
Open sourcing in the world of software (5:55)
How open source has changed Ophir's life as a marketeer working at startups (10:28)
Chartio's sunsetting drove Ophir to search for a data tooling replacement (27:27)
Discussing trends in adoption of tools for small scale and large scale companies (35:01)
Data challenges related to attribution: how wrong do you want to be? (44:07)
The Data Stack Show is a weekly podcast powered by RudderStack. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Welcome to the Data Stack Show.
Each week we explore the world of data by talking to the people shaping its future.
You'll learn about new data technology and trends and how data teams and processes are
run at top companies.
The Data Stack Show is brought to you by RudderStack, the CDP for developers.
You can learn more at rudderstack.com.
Welcome back to the show.
We have another interesting guest this week, Ophir Prusak.
And he started his career as an engineer, but has worked in scrappy marketing roles
doing data-driven growth at early stage startups. And I love that role
because I have played it myself and I know how much you have to really have a hybrid approach,
both in terms of actually doing the tactics of marketing, but getting very technical with the
data. So I think he'll have a really great perspective. I think the thing that I really
want to talk to Ophir about is he has a lot of experience
with attribution.
So if we have time in the show, I would like him to give us just a little primer on marketing
attribution because I think about working with engineering and just the discussions
around attribution and so many things that I know now that I wish I could have articulated
to the engineering teams I was collaborating with. So that is my burning question. Kostas,
what do you want to talk to Ophir about? I know that Ophir has done a lot of research
around data-related tools, and it would be great to hear about this experience and the reason that
he did that, especially because he's a marketer, right? Like, his
background is in marketing. Of course, he started as an engineer, but I really want to see the
perspective of someone who is not, let's say, involved in maintaining or having to set up and
all that stuff, like the actual software, but he's the main user, and see what his
experience is around that and how he sees the landscape today and what changes he has seen
happening these past couple of years. Great. Well, let's dive in and talk to Ophir. Let's do it.
Ophir, thank you so much for joining us on the show. In prep, we had way too many topics
to cover that were interesting. So why don't we just start at the beginning where we always start.
We'd love a brief background on you, your history working with data, and then what you're up to
today in your day job. Sure. Thanks a lot, Eric. So I got into computers at a relatively young age. I got a computer when I was
13, studied computer science in college and started my career as a software engineer. I
worked for a few years and then I made the interesting switch to marketing back in 2005.
I simply wanted something more creative. So I've always considered myself kind of more of a technically oriented marketer
and looked at marketing problems from more of an analytical perspective. And then around nine
years ago, I joined a small startup as head of marketing and really have been riding the startup
train ever since. So for the past, I want to say nine years, really been working as kind of head of marketing or initial marketing hire at a lot of startups.
And as part of that process, it's always been about, well, how do we make sure that we have in place the infrastructure to be able to track what's going on, measure what's going on, and ultimately become data-driven?
Especially for me as a computer science person, it's always been important to be data-driven. And along the way, just have had to
solve so many problems about, well, how do I make this organization data-driven? It's obviously
different for different types of companies and learn the different options in terms of tools
out there. Currently, I am head of growth at Codesmith, which is a software engineering boot camp.
And what's led me more recently to really get more into this is that we have been using a tool called Chartio, which is going to be sunset in a few months.
So while looking for a replacement, I simply found there are just so many new tools and options out there.
And it's a little overwhelming.
So I created the website datatoolreview.com.
And it's just been kind of a fun ride.
Very, very cool.
And so much to talk about.
And I cannot wait to talk about your search for data tooling, just because it is a crazy
world.
And I think hearing your perspective as someone
who's recently gone through an evaluation process will be really cool. One thing I want to hit before
that, though, is something we chatted about before we hit record. And that is a trend that you've
seen around the progression of software being open sourced in different parts of the
tool set. And I think your perspective is really interesting as a marketer doing growth at early
stage startups, especially in technical marketing, or some people might call it growth hacking,
et cetera. It's all about being scrappy and using tools to get the data in and have visibility into
what your experiments are doing.
And you sort of have to play the role almost of data engineer and marketing ops and the
one leading marketing because you have to be small and scrappy.
And I just think that's such a wonderful
experience. But getting back to the topic, open sourcing in the world of software,
tell us about the trend. I think this is such an interesting conversation.
Sure. So when I am thinking back to 2005, actually, when I got into marketing, and at the time,
the only open source tools that I
remember were definitely kind of for developers. And it's always been, I feel, the case where
developers help each other. And it's really the more people using the same tool,
the better it is for everyone. But on the marketing side, and even more so on the sales
side, I want to say it's somewhat of a zero-sum game, I like to
call it, where if one person is getting a sale, that means another company isn't making the sale.
So there's always going to be a feeling of more competition and a lot less sharing of, well, we should all
use the same tool. There has been more of a looking for the differentiators, what's
different and what I can do to get ahead
as opposed to the other person. But what we've seen, or what I think I've seen at least over the years
is more and more tools, which ultimately are becoming open source. And I think the natural
progression of purely developer tools to kind of analyst tools or data stack tools makes a lot of
sense for two reasons. First of all,
I think more people want to help each other in terms of having great options. And I think
looking back, I remember Redash and Metabase being two of the first tools that I remember
being open source, which were actually really good in terms of solutions. And at one company,
I even used Redash for a bit. It's great that it's open
source and anybody can use it. And what I think we've seen is that over time, there's kind of a
shift more and more into the world of, let's say, whether that be marketing or I don't know any
sales tools that are right now open source, but really there's a shift more of let's try to help
each other. And the other aspect I think also is in terms of purely a distribution model from a sales
perspective. If I'm going up against some established players in terms of whatever
software it is I'm trying to sell, having an open source solution allows, I think, a way to get into
the market much easier than just being another competitor.
So in some ways, if there's, let's say, maybe Looker as an example, which I think is an example of a very established closed source tool, and there is a quote unquote Looker
open source alternative that I know came out recently, again, that's in order to
gain more momentum.
And it just makes so much sense from the people trying to create these new
tools, both from a, it's a win-win for the developers or for the analysts, and also for
companies trying to kind of get into the market. It's just a great way for people to, they can
start small. Usually it's a service which you can pay a little, there's a cloud version.
Once you get a little bigger, you can use the open source version. And once you get really big,
you'll usually use the enterprise version of it.
So it just makes sense to me that there's this kind of movement.
And I think over time, we're going to see more and more tools having the open source
model because it's kind of one of those win-win situations.
It just makes so much more sense.
Ophir started the conversation and he mentioned Chartio and that the tool is right now
in a period where it's going to sunset
pretty soon. So I think that another benefit of having something open source out there,
it's also what happens when the team behind or the company behind the product decides to take
a different course, right? Companies have the opportunity to use the open source,
self-host it, and even keep maintaining it.
There are a couple of cases like this,
especially like in the database space,
because databases especially are pretty hard pieces of software to build.
A very good example of this is RethinkDB, for example.
That's exactly what happened.
The company decided that they could not move forward.
They open sourced the code, and the community decided to continue supporting it. CouchDB, the same
thing. So that's another thing. And I think, Eric, you have also seen, especially large enterprises,
which sounds a little bit controversial, to actually be interested in a company that has
an open source project because they know that they can pay for the product and there will always be continuity in the product if something goes wrong with the company. So I think that's another
important reason why open source is important. But Ophir, you as a marketeer, right? And you've
been like in this space for quite a while now. How do you feel that your life as a marketer has changed because
of open source? That's an excellent question. So I want to say as a marketeer, or I should say as
a marketeer who has been working at startups. So for me, it's not just a question of being a
marketeer, but also having to be scrappy and find solutions where either me or me and kind of one other developer can really lift this off the ground.
I want to say a lot of the more advanced tools really have two options.
You can either pay for an enterprise tool. And I've seen a lot of tools which do some really cool things, but the entry level for paying for them is, you know, it's annual contracts only starting at 10K
a year and above, which is just out of the budget for a lot of the smaller startups.
So your option is either to go with a very simplistic tool, and there are a lot of things which might cost you
let's say $99 a month or something, but again, you're limited. And so in many ways, going down the open source route is kind of your
only solution for getting a more flexible or a more mature product without having to pay an arm
and a leg. And because it's also open source, having a model where there is a cloud version,
which is relatively inexpensive, makes a lot of sense. They're not going to charge a ton of money if there's an open source version because simply they can't.
So I do think it's definitely, and I think Redash and Metabase are two tools that I've used in the
past, which were good examples of this. I know even though Redash was acquired, I think, by
Databricks, it's still, it's a great solution where you can kind of start easy, you can play
around with it yourself. And also, as you had mentioned, Kostas, there's none of that worry about, well, what happens if
they raise their prices or something? I can always use the open source version. So I definitely think
it's helped me a lot in terms of seeing the different things out there. And definitely for me,
I do see more and more over time seeing open source, especially I want to say open source projects, which have
a cloud version. I think another good example is Preset and Apache Superset. Superset, you know,
that's an open source project. Preset, which is a relatively new service, is basically Superset
as a service, but it gives you that, I want to say, upgrade path where you can play around with it,
start relatively easy to see if it works for you. And then if you do grow,
you can always have the choice
either to pay a little more or to host it yourself.
So I think that's another thing
in terms of new tools being out there
that you always kind of have the option.
You simply have a lot more options,
I think, when you go down the open source route.
Yeah, makes sense.
I think you put it very well. I think one of the main values of open source in
general is choice, either as you described it, or as the choice as a developer from the other side,
to have access to the source code and even extend it if you want. What's a big part of the value
that open source brings is this kind of flexibility and the choice and options.
Eric, you as a marketeer also, do you remember what was the first open source project that
you used?
That's a great question.
Let me see here.
Any sort of technology would probably be WordPress, but that's one of the most pervasive open
source projects in the world and drives a huge amount of the internet.
In the data space, actually, it had to have been Analytics.js from Segment way back in the day when they open sourced it and were doing data integrations.
And they obviously built a very large, successful company
in a really short amount of time.
But there was certainly a period there where Analytics.js
was a very interesting, useful, technical tool
for data integrations back when they were segment.io.
That's interesting.
So it was a data-related product, which I think makes sense. I think
in general marketing is, let's say, somehow the function that drives a lot of innovation, or it's,
let's say, the first function to adopt and try things, right? Like even with new things like
reverse ETL, I would say that the most common use case around it is marketing, and then it might be sales ops, right?
So that's something very beneficial that marketing is bringing to the industry.
And I don't think we recognize it enough.
If I may, Kostas, I do want to add a little point to what you're saying.
Going back to what I was saying beforehand, I think marketing has to try new things. I think the whole point of marketing, it's so crowded
today that if you're just doing what everybody else is doing, you're not going to get awesome
results. And I think the only way is to try something new. And very often to try something
new, you do need those new technologies, which is the opposite, I want to say, for developers,
where the more people
using a technology, the more stable it is, the less risk it is. So definitely marketing is always
going to be at the forefront of let's try something new because it's different. And definitely
whatever new tools are out there, we're going to try. Yeah, absolutely. But Ophir, I have a question.
We are talking about open source and we are talking about marketing being like in the, let's say, the forefront of like trying new things and all that stuff.
Are there any marketing platforms that are open source right now?
Like something, the open source alternative of Mailchimp or the open source alternative of Pardot?
Does this thing even exist?
The short answer is yes, it does exist.
But the fact that I don't remember the name of the platform just goes to show you how,
unfortunately, I think for an open source project to really succeed, you need a lot of people whose
day-to-day job depends on that technology who are themselves highly technical. So if you're a
developer, then yeah, it's fine if I do some of my development on an open source project. If I am
a data analyst, but I know Python or I know software development, then sure, if I work on
that project, it's fine. If I'm a marketer, then I'm not going to be contributing to that project.
So I think the big problem with that,
and I forget the name of it,
because I looked into it and it just was not very mature
and it didn't seem to really be taking off,
it's because of exactly that problem.
I don't believe we'll ever see a truly open source solution,
which is going to replace a HubSpot or a Salesforce
simply because there are just not enough people
whose day-to-day job is working on that.
Yeah, I think Pimcore is one of the big ones that comes to mind.
And I know that off the top of my head just because I recently did some research on open
source, like marketing and sales type SaaS tools.
And there are a handful of other ones out there,
but I think you're right, Ophir.
If you think about one way that you frame this,
and I really liked the way that you described
the arc of what's happened is a lot of open source stuff
was initially tooling for developers, right?
So you think about things like Git
and the whole ecosystem around that.
And it was really developers working together to figure out, okay, we're all doing very
similar things here.
How can we create some sort of standard and frameworks and then tools that result from
that, that benefit everyone and make everyone's job easier so that we can focus on stuff that
matters more.
And then you start to see that happening in the data space. And I think
it makes a lot of sense in the data space because by nature, data requires a lot of integrations.
There's a big ecosystem around data, which I think lends itself to open source. And then also,
especially when it comes to analytics, everyone's reporting is a little bit different depending on
the business, but people are trying to build
the same fundamental reports to understand how their business is operating. And I think those
conditions create a healthy environment for open source. Whereas if you think about an email
marketing tool, it just doesn't seem to me that there's ever going to be an open
source tool that achieves the level of commercial success of a Salesforce
or Marketo or other sort of traditional marketing and sales SaaS tools.
But what do you think, Ophir? I mean, you seem to think that that's never going to happen.
Yeah.
I think it kind of goes back to what I was saying, where in certain fields, differentiation
is a competitive advantage, which I think sales and marketing are
those fields, while in something like software development or even your data stack, I don't
believe that's a competitive advantage. The way the people working on it
actually see it, from their perspective, is the more people working on it, the better.
So I think that's why we're not going to see something like
sales. Well, let me rephrase that. There is SugarCRM and there have been people who have tried
this, but I don't think it'll ever be able to compete with kind of the elephants of the industry
the way that other tools have been able to compete. Yeah, it's really interesting. And I think your point about
marketers and salespeople not being able to contribute back to the core product, I think,
is a really defining characteristic there. Okay, one more question on open source. And I'm going
to direct this both, Ophir, towards you and Kostas, because Kostas, I think you have some strong
opinions here about some patterns that we've seen. But when you have an open source tool that also commercializes at scale,
it can kind of become controversial. So a couple of examples that come to mind
of late: you have Elasticsearch and then MongoDB, which changed their licensing. Both had huge adoption of their products,
but also experienced some turbulent times trying to navigate being a large commercial
entity or connected to a large commercial entity and also open source.
So it makes total sense at the small end, like you said, especially in the startup world,
but it's not always an easy path
to navigate when you're at scale. So what say you, Ophir and Kostas, on the challenges of being
open source as a huge commercial enterprise? Yeah, that's a great question. And coming more
from just the startup world, I want to say of marketing and sales and product development in
general, I personally do think having, it depends, I want to say to your point, the whole MongoDB
and changing of different models.
I do believe having a, what I want to call a core product, which is free and kind of
will always be free, but differentiating, I want to say on the functionality side.
I mean, I think that's totally legit.
I mean, I think it makes sense.
Will there be problems sometimes? Sure. But if the core product is always free and you can do
whatever you want with it, and then adding on top of it, and this is what I almost always see the
case is things like single sign-on or granular access levels to who can do what, as well as scalability to some extent.
That makes sense. Ultimately, the people who are working on the project, they do want to be able to
make money. And I think it's fine as long as the core product is still open source. And I do feel
it's kind of the best of both worlds. Nobody's forcing you to use the kind of enterprise version,
you could figure things out yourself. And I've even seen some open source projects
which actually try to replicate some of the stuff
that kind of the enterprises are doing
or kind of the enterprise version.
So I still think it's a win-win to have open source,
even for enterprises and for companies
to ultimately take funding
and try to contribute to the product.
Yeah, I agree.
Eric, it's a bit of a complex situation, to be honest.
It heavily depends.
Yeah, of course.
That's why I asked you and Ophir.
Yeah.
I mean, it heavily depends on the product itself
and also on the monetization path that the company wants to have.
If you think, for example, about both the cases that you mentioned, Elastic and MongoDB,
the problem these two companies had,
it's not that you or me, we would start a company
and we use MongoDB as the backend of our system
and use the free one.
That wasn't the problem that they had.
The problem that they had was that Amazon could come and be like, okay, now I'm giving Elasticsearch
as a service, right? And that creates a conflict with the business model that Elastic has. So
it's a bit of, let's say, a game and a battle between the big companies and the bigger companies, in a way. I
don't think that any startup right now that is open sourcing something is
going to have to face that problem. On the other hand, we have cases of companies like Databricks,
for example, or Confluent, where both of them have open source projects and the core project is open source.
They are offered as a service from big cloud providers,
but at the same time, they also manage to be successful, right?
Databricks is super successful.
Okay, they haven't gone public yet,
but they are on a track to do that pretty soon.
And they're one of the main rivals of companies like Snowflake.
Confluent just IPO'd.
So it depends.
I mean, I don't think at the end that these kinds of behaviors
that the big cloud providers might have
are going to hurt the companies that much.
Maybe they have to change their business models a little bit
or their licenses, which is fair.
I mean, it's not going to hurt you
that you are going to use the product for your backend at the end.
So yeah, I think that things are a little bit better
than we tend to think about that.
And don't forget that open source is literally like the core
of the internet
and all this digital revolution, right?
Like Linux is open source.
Without that, we wouldn't have like servers, right?
Yeah, nothing.
I just remembered.
Okay, that's a little bit irrelevant, but I find it funny.
So Linus Torvalds, the guy who started Linux, right?
He's famous for being very aggressive and almost, let's say, a little bit abusive towards developers.
He's very opinionated and very protective of his child.
And I was reading lately that at some point
he decided to go and do therapy for that reason.
And now he has changed his mind completely
and tries to have more empathy.
Well, that's great.
By the way, the guy's the inventor of Linux
and the main maintainer of the kernel of Linux
and also of Git, right?
I mean, we're talking about a person
who has contributed a lot using open source.
That is unbelievable.
I want to add one other thing. Kostas, you raise an excellent example of kind of different
licensing. That exists in a lot of different worlds, and even in the software world, to some extent.
When I was working at different SaaS companies, we had a different pricing model for if you
use a product in-house or you're an agency
and you were basically reselling it in some way. The same thing is applicable for media rights.
If you're in the media world, if you're selling a picture, it's different pricing depending on whether you
use it yourself or you're reselling it. So I think it's totally fair to say, hey, just because it's
open source doesn't mean you can do anything you want with it and that all companies are equal in that sense. So I think it's fair to say if I'm creating technology
and you use it in-house, then rules A apply to you. But if you want to actually resell it,
then it's different rules. And I think that's totally legit.
I was going to say, Kostas, that was a very, I think, thoughtful and balanced response to a complex
question, maybe a little bit more so than the most impassioned commenters on Hacker News when
some sort of open source news like Elastic or Mongo hits the press.
Okay. And one thing I did want to mention, there's a really interesting site out there called
opensource.builders.
So opensource.builders.
And you can go see alternatives to tons of different types of tools from CRMs and email
tools and analytics and you name it.
That came to mind, Ophir, when you were talking about open source email marketing tools. Let's switch gears a little bit. Ophir, you
recently, with the announcement about Chartio sunsetting, went on a search for data tooling
to sort of rebuild your go-to stack and ultimately concluded that it's kind of complex and you ended up putting
a website together that collected a lot of your findings. Can you tell us what were the
requirements around your search? And then you have a very fresh set of eyes looking at all
sorts of components of the data stack from pipelining type solutions all the way through to
BI solutions. And we'd just love to know what did you learn in
that process? Yeah, no, thanks a lot, Eric. So yeah, I've been a user of Chartio since
they literally launched and have just been a huge fan of the product, not to mention seeing how it
grew over the years. And I think for me, it really solved quite a few different parts of the puzzle that I needed.
It could pull data from different places and blend the data, or federated queries,
as it's called, and it also made it very easy, within kind of a nice GUI, to
really kind of do the key part of ETL, not through queries per se, and ultimately
gave you a really nice solution. And it's funny because I'm a SQL guy and I love writing SQL,
but what I found was actually having kind of a GUI front end for doing the SQL manipulation
just gave me the ability to make modifications really quickly and easily.
And even though Chartio is a closed tool, I still was able to do probably 80% of what
I needed to do, as long as the data was in some SQL database.
So for me, in terms of stuff I needed to do, it was, first of all, make sure the data
is in a SQL database.
Chartio does also pull from Google Analytics, as one of the very few non-SQL-based sources, as well as CSV files and Google Sheets. And so the
only other part I really needed to take care of was to get data into a SQL database. We use HubSpot.
So for us, we were using Segment too. And I played around. There are a lot of tools which do it. I was
using Segment, though, just to pull the data into a SQL database. But really Chartio, I want to say,
gave me the ability to do most of what I needed to do. And I do want to bring up one other thing
that I found, which is that within the realm of kind of BI tools, and Chartio definitely is in that realm,
I find there are two different types of problems that people are looking to
solve. One is I want to say the day-to-day, week-to-week reporting. And that is, I just
need to see, okay, how many people have signed up to my service? Where are they in the pipeline?
What's happening with all the internal database I have? And it's really just to get an idea of
what's going on and still be able to slice and dice the data to some extent. I want to do certain segments or certain timeframes, but it
is about that kind of ongoing reporting. And then there's the discovery slash ad hoc. Well, I have a
question that nobody's asked before, or I've never asked analytical people before, and I want to
answer that question. So Chartio is definitely in the first group where it's great for reporting, but
I do agree very often people have come to me and said, well, I have a question.
And I'd kind of just do a one-off report for that simply to answer it because it doesn't give the
analysts kind of truly an easy way, I want to say, to just go in and ask any questions. So
just a little more about the kind of tool and what I was looking for. I was definitely looking
for a solution for more of that reporting side. And what I found when I started looking for other
tools is, first of all, I didn't find any other tool. I want to say that was kind of a one-to-one
solution that I could easily do. And I really probably would need to now put together a few
different things. I did see some specific tools. I even talked to
Holistics, I remember, in one of your other episodes, which actually looked very close to
what Chartio does. But I was a little concerned also to go down the, well, this is another tool,
which is not open source. So I was definitely looking more towards, can we solve this
with just open source tools? For what it's worth, I'm still looking for a perfect solution. And I haven't decided yet. But definitely, to your point, it's just really taken off. I want to say
for the past couple of years, I haven't really looked at new tools because I've been using
Chartio. So recently, there's this whole explosion of, like, reverse ETL tools. And just, I want to say,
just a lot of tools, which each do a really, really good job of solving
one specific part of the problem. I mean, there's like some tools that just pull data from whatever
data source you want and put it into a flat file. There's some tools which just take care of pulling
the data from whatever sources you want, like a Fivetran or whatever it be, and push it to whatever
other solution you want. And then there's like CDPs like RudderStack and reverse ETL.
I was saying, I feel like there's so many tools which are really trying to solve one
specific part of the problem, which is great when you have a slightly bigger team.
But as a team of one, for me specifically, it's been a little more challenging because
I've realized I need now to, okay, maybe I need now to actually look at a few different tools
and see how we can put them all together.
So I think we mentioned this also before the recording,
it's a little more challenging when you need to kind of do everything open source
because it is a little more complex
and it does require a little more work to kind of get things up and running.
So I would say you have a lot more choices than before
and a lot more specialization in specific tools. But in some ways, we don't have that kind of one thing does everything
solution that a lot of smaller companies need in order to get started. So that's kind of where we
are today. No, that makes total sense. And it is really interesting to think about the data space
and Costas would love your thoughts on this as well. As tools have progressed, it makes a lot of sense that there has been some specialization,
right? Where data pipelines are really hard. Pulling data in is a non-trivial problem and
there's always new sources and everything to maintain. And then doing analytics
really well is also hard in its own right. And so it makes sense that there's specialization,
but to your point, when you're a really small data and analytics organization,
having tools that can accomplish multiple things in one system is way more convenient potentially because you're not
dealing with multiple vendors, multiple processes, etc. But Costas, what do you think about that?
I mean, do you see any trends? I mean, of course, specialization with different pipelines, etc., but
would love your thoughts on that as well. Yeah, that's a very good question. First of all,
I have to say to Ophir that he made me really happy with what he said about Chart.io.
I'm a very good friend with Dave, the CEO of the company, and I'm pretty sure that he's going to be very happy to hear what you said about the product.
He's one of the most obsessed, in a good way, product-driven people that I have met.
And he put a lot of energy to build this product.
And it's good to hear that what his vision was,
at least for the experience outside of the company,
he managed to deliver it at the end.
So that was great.
And I'm sure you will make Dave really happy
if he listens to the episode.
Now, going back to your question, Eric, you know, they say about software
that a common pattern to build a new product in the new company is go to start with the small
enterprises or medium enterprises, iterate the product on them and then go to enterprise,
right? That's like a very common pattern of innovating on an existing problem: creating something that is better as an
experience, using the smaller companies as a vessel, let's say, to figure out what's the right way of
solving the problem today, and then at some point going to sell it to the enterprise and increase your margins and all that stuff.
Now, that might be true for the SaaS space, right?
Where we are building a CRM or a marketing platform.
Now, in the data space,
I think that what is going to happen is the opposite.
And there is a reason behind that. And the reason is
that building technology around data is really, really hard. You have to scale from day one.
And it's very, very expensive. Going out there and building a new database system, for example, it's crazy hard. And the big companies have both the scale
and the money to fund this product. So my feeling is that in data, we are going to see the opposite,
actually. We are going to see products that are going to be built primarily for the enterprises,
and then they are going to scale down in a way to the smaller companies.
And I think we see that happening in a way,
especially with companies like Databricks and Confluent.
They started, first of all, with an on-prem solution,
the traditional enterprise sales going there.
These things that a small company would never pick up the phone and call them for a quote
for the price, right? And then you see them going down markets instead of going up market.
And they open something like a product as a SaaS solution. And then you have tiers that
they are consumption-based. And it's very easy for someone, even as a small company, to go
and afford and use
the solution. There is one exception there, and this is Snowflake, which started from like smaller
companies and then started penetrating the larger enterprises. But I think that this is a kind of
pattern that we are going to see a lot happening in the data space. At least that's my opinion. Interesting thought, Kostas.
I actually want to say something which is,
I've seen something which I understand
where you're coming from,
though I want to say in terms of
just product growth in general,
what I've often seen is the case where,
at least in the SaaS world, what's the difference between products that really, really kind of take off and go viral versus products that just never make it to that kind of super large adoption?
Is it something which I can just go in and within 30 seconds, start playing around with it. I don't need to talk to anybody, especially if you're talking to developers or analysts who don't want
to talk to a salesperson. I have to say, if I need to actually talk to somebody to even play around
with a product, then I don't think it's going to kind of really gain huge adoption. And again,
that's at least in the SaaS world for
developers and for analysts. And what I've seen is if you start by creating a product,
solving for enterprise, you're not thinking about the self-service model first and foremost.
And I'm seeing a lot of, at least when I've talked to a lot of companies, they actually
do what you're saying, Kostas, that they solve kind of problems for the enterprises,
but they don't have an actual demo you can play around with.
It's like, oh, I need to set it up for you.
So I actually want to say the, at least from a product perspective,
if you're not able to accommodate the self-service model,
I think you're going to have problems with growth,
even if you are an enterprise product,
because it's the individual developer, the individual analyst who wants to go in and play around with it and doesn't want it,
doesn't have the resources to install it themselves, but wants to play around with
the cloud version that ultimately is what causes a lot of products ultimately to go viral.
A hundred percent. I'm 100% with you on that. And I think that that's where open source
is also extremely important.
Like the companies that I mentioned,
like Confluent and Databricks,
they started first of all as an open source project.
And because there are tools that are used by developers,
like Databricks is not something that,
I don't know, like a marketeer or a salesperson will take,
although the output of the work done there might be used by them. Developers are fine
to try solutions that are not that easy to use, right? Like they can take it, set it up, play
around, see how it works, make it work. All that are like part of like the developer experience,
which is different from a SaaS user. And yeah, I totally agree with you, this kind of experience is important.
The difference is that it is a little bit different with data related products, especially
infrastructure products, because these are going to be used and maintained by developers,
right by engineers. So the experience there is a little bit different.
So that's at least my experience so far.
And I think that that's another also added value of open source at the end on the side
of the business models and how you can use it to actually build a company and the product
experience.
Yeah, it's interesting. There's also another way that innovation at the enterprise impacts technology.
And that is when a problem is solved in the enterprise.
And then either the pattern for the solution or the actual solution itself is published,
oftentimes it's open source.
So if you think about, I mean, Netflix is a classic example of this. Really interesting technologies have come
out of Netflix in ways that they've solved data infrastructure problems at scale, and they've
open sourced some of those. And I think it's also interesting to think about how, per what you said,
Costas, like open source being a way that you drive adoption to the bottom of the market,
which is also your point of view,
that some of the patterns for that can actually emerge
from the way that enterprises are solving
data infrastructure problems.
So what a fascinating ecosystem.
Yeah, and also, Eric, adding to the open source,
because you mentioned Netflix.
For Netflix, open source is also a tool
to recruit
the best possible talent. And that's another, again, value, because if you think about it, Netflix
is not a software company, right? It's not their primary product. Their primary product is
content. They are content creators, but they operate at such a huge scale where they need the best people out there to build and maintain their infrastructure.
And open source gives you a path to go and get these people, which is another benefit of open source.
Absolutely.
I feel like I'm evangelizing open source a lot today.
It's the open source show.
Yes, very eloquent evangelism.
Well, we're getting close to time here.
There's one more subject I wanted to cover.
And Ophir, you have a lot of experience
doing attribution in marketing.
And attribution in marketing is a tricky thing, right?
It's basically trying to answer the question
in any number of ways. I try something and I get these results. And then how can I tie the results
back to this specific effort, right? A classic example is paid advertising, right? When I spend
money on paid advertising, I want to see whatever it is, how many customers actually came from that.
And attribution is a classic challenge when it comes to data for a number of reasons.
Paid advertising is just the tip of the iceberg there.
But I would love to know, I'm thinking about especially our listeners who are on the technical side, probably work with a marketing team or have projects related to marketing, but maybe who just
aren't as familiar with attribution on a tactical day-to-day basis. Since you've played the role of
marketing and marketing ops and data engineering, could you just give us a breakdown? Tell us,
give your basic definition of attribution and then what are the data challenges related to different types of attribution?
Sure.
I'll start with just a quote that I heard that I simply love about attribution is that
attribution is simply a question of how wrong do you want to be?
And the reason of that, I actually learned about attribution the hard way almost a decade
ago, I want to say, when I was running a campaign
and there was, at the time, this was like, I wasn't doing multi-touch attribution. It was
very straightforward, what Google Analytics was telling me. And there was one campaign,
we were spending a lot of money and we were seeing activity, but we were just not seeing
conversions. And ultimately we made the decision just to drop it. And two months later,
our sales dropped drastically. And that was the only explanation. And looking back, it was clear
that it was simply the multi-touch attribution part of it. But I think attribution at the end
of the day comes, and there's an analogy used very often for soccer players. When you make a goal,
you can say it's the person who technically hit the ball
into the net, which made the goal. But if you look at who gets credit for making that goal,
it's not just the person who actually kicked it in, it's the whole team and everything coming up.
But it's really hard to say, well, what percentage of each one of those people ultimately played a role.
Another way to look at it is that if I would have taken out one of those people from the series, or if you look at all in the world of marketing, if somebody had, let's say,
five different touch points, if I were to take one of them out, what can I say about
how much it would have impacted the ultimate attribution, how much it would have
impacted the actual revenue?
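To make the credit-splitting question concrete, here is a minimal sketch in Python of three common attribution rules (first touch, last touch, and an even "linear" multi-touch split) applied to a customer with five touch points. The channel names, revenue figures, and the `attribute` function are hypothetical illustrations, not anything described on the show, and real attribution tools use far more involved models.

```python
from collections import defaultdict


def attribute(journeys, model="linear"):
    """Split each conversion's revenue across channels under a simple rule.

    journeys: list of (touches, revenue) pairs, where touches is the ordered
    list of channels a customer interacted with before converting.
    """
    credit = defaultdict(float)
    for touches, revenue in journeys:
        if not touches:
            continue  # nothing to credit
        if model == "first_touch":
            credit[touches[0]] += revenue       # all credit to the first channel
        elif model == "last_touch":
            credit[touches[-1]] += revenue      # all credit to the final channel
        elif model == "linear":
            share = revenue / len(touches)      # equal credit per touch point
            for channel in touches:
                credit[channel] += share
        else:
            raise ValueError(f"unknown model: {model}")
    return dict(credit)


# Made-up data: one customer with five touch points, one with a single touch.
journeys = [
    (["display", "social", "email", "search", "search"], 100.0),
    (["search"], 50.0),
]

for model in ("first_touch", "last_touch", "linear"):
    print(model, attribute(journeys, model))
```

The same journeys give very different pictures depending on the rule: under last touch the display channel gets no credit at all, which is the kind of result that can lead a team to cut a campaign that was actually contributing.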
So I think a few things which are, I think, important to understand kind of for people
is you're never going to get 100% attribution, first of all.
In other words, you're never going to know for sure exactly where somebody came from,
or even if you're doing multi-touch attribution, you're never going to know for sure the impact
of each touch point. I want to say attribution is something which is really
directional. If you think about things also like marketing in general, if you're tracking people
on your website and a lot of people are going to have blockers for being able to track, I think
attribution is good to understand not how many people are coming
from exact numbers of this ad versus that ad, but maybe when I compare this channel to that channel,
what do I see? When I compare first touch, last touch, what do I see? So I think it's really
kind of a directional. It's a great way to do it. And the other thing I want to say, which
is something that I'm seeing more and more
people do recently in terms of attribution is something called incrementality, which is, you
know, similar to just split testing or AB testing in general, but instead of having version A versus
version B of a specific type of copy, you basically kind of have a control group, which doesn't get the ad at
all. And that way you can kind of say, well, all things being equal, if I didn't serve a specific
ad up to a group of people, how much did it impact the percentage of those people, which ultimately
made a purchase or made a conversion? So a few things, ultimately attribution is hard. I will
say there are definitely companies which are doing a decent job and if anything
better than nothing, especially today in the AI world and machine learning, there's a lot
of companies which are able to put together the data using things like linear regression
and able to give you more than just what a tool like Google Analytics is going to give
you. And I would even
say if you're spending a million dollars or more on paid advertising a year, you should definitely
look into kind of a dedicated solution and not just do it yourself with just
SQL and try to figure this out, or depend on Google Analytics. You definitely want a dedicated
attribution solution. If you're doing less than a million dollars a year, I feel like you're going to get some benefit,
but it's just not going to be as much of an impact. And also there's a big question of how
many channels. What we found is once you go beyond just Facebook and Google and you start doing
things like maybe TV advertising or OTT advertising, or you have a lot of coupon codes or whatever,
then using a third-party tool definitely helps. Sure. Yeah. One thing that's, I think back on
my background in marketing, and I don't know if a lot of people explain it this way, but
marketing and engineering have historically had somewhat of a tenuous relationship,
in large part due to marketing's demands. Ophir, you have the benefit of both being the one making
the demands and the one that needs to deliver them. So the expectations are always clear,
which is definitely not always the case, especially as companies scale. But
I'm just thinking back on times when I'm running marketing
and I'm getting together with the head of engineering to talk about data. And attribution
really in many ways was a large part, I think, of what I was trying to accomplish with just a lot
of asks around data from the engineering team, because you're trying to triangulate what's going on. And because marketing
is so dynamic and you're constantly trying new things and you're constantly running tests,
your requests are always changing and your needs around data on the sharp end of things are always
changing, which is just interesting. I never really looked back at my interactions with
engineering around marketing data through the lens of attribution, but I think that's a huge driver.
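As a side note on the incrementality idea Ophir described (a control group that never sees the ad), the arithmetic can be sketched in a few lines of Python. The function name `incremental_lift` and all numbers below are hypothetical, and the sketch assumes the two groups were randomly assigned and are otherwise comparable. It deliberately leaves out any statistical significance testing, which a real analysis would need.

```python
def incremental_lift(exposed_users, exposed_conversions,
                     control_users, control_conversions):
    """Compare conversion rates of an ad-exposed group and a holdout group."""
    exposed_rate = exposed_conversions / exposed_users
    control_rate = control_conversions / control_users
    absolute_lift = exposed_rate - control_rate
    # Relative lift is undefined when the control group never converts.
    relative_lift = absolute_lift / control_rate if control_rate else None
    # Conversions in the exposed group that would not have happened without the ad.
    incremental_conversions = absolute_lift * exposed_users
    return {
        "exposed_rate": exposed_rate,
        "control_rate": control_rate,
        "absolute_lift": absolute_lift,
        "relative_lift": relative_lift,
        "incremental_conversions": incremental_conversions,
    }


# Made-up example: 10,000 users saw the ad, 10,000 were held out.
result = incremental_lift(10_000, 300, 10_000, 200)
for name, value in result.items():
    print(name, value)
```

In this made-up example, attribution by click tracking might credit the ad with all 300 conversions, while the holdout comparison suggests only about a third of them were caused by the ad.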
Yeah.
One other thing I'll add is, I always say, what, as in like what happened?
That's a relatively straightforward thing to answer.
I wouldn't say it's easy, but it's pretty straightforward.
Why it happened, that's where the fun is at.
And that's really where kind of it gets a lot more complex.
And you need to be also thinking not just about data. One of the biggest mistakes I've seen a lot
of companies make is they're looking just at the data, but not looking at the context
of what's happening. So you might be looking at, let's say, people who are clicking on ads.
And you might say, oh, okay, well, I see this
ad versus that ad. And this ad is doing better than that ad. But what you
might not realize is that one of those ads is from a display campaign, and the other one is from a
search campaign. So the whole context of where the person is in the user journey can be totally
different. And that's why I think data without the entire context of like what happened before,
what happened after and what segments these people are from. That's where I find a lot of
people also make mistakes based purely on data. And that's why I think marketing and data together
really is both science and data. It's not just one or the other. Yeah, absolutely. And I think one thing that we've seen over the course of doing the show for,
I guess, a year. Wow, that's amazing. I hadn't thought about that. We've heard more and more
really cool structures of teams where there's a very tight relationship between engineering and
marketing or data engineering and marketing where it's very collaborative because you see so many times that problems
arise when marketing gives a vague specification for something that they need and the engineering
team will deliver that to spec and it lacks context, which to your point is so key in
trying to understand why things are happening.
And I think the more you can have a really robust collaboration where both the context of marketing,
trying to drive a customer journey or explain things and pushing that context to engineering, then also engineering, giving marketing the context of here's what's going on under the hood.
As far as the data,
maybe there's limitations or decisions that need to be made, really can create a powerful dynamic for figuring out what's actually working and continuing to invest in those things for growth.
Yeah, definitely.
Well, we are at the buzzer. This has been a great conversation. We got to hear
a lot about open source and Costas' evangelism about open source, both from the startup and enterprise levels. And we've learned a ton from you just about your unique perspective on data tooling and especially in marketing. And thanks for the quick crash course on attribution. That was really helpful for me and I hope helpful for our listeners as well.
My pleasure. My pleasure.
My big takeaway, and I'm still processing through this, but I think the conversation
around open source spreading to different parts of the tech stack is just such an interesting
conversation. And I think Ophir's observation around starting open source really having heavy influence in developer tooling and that being a huge wave of adoption.
And then that spreading to data tools makes a ton of sense.
And then I'm still ruminating on whether a sort of marketing or sales SaaS tool
could make a run at it as an open source tool.
And I'll probably be thinking about that a lot over the next week.
Yeah, my main takeaway is that I might have to change career paths, Eric,
and become an open source evangelist or something.
That really was... It really...
We had a little aside there
and you gave us
a very passionate speech
on multiple levels of open source.
Yeah, yeah, yeah.
It's probably the effect of jet lag
from what it seems.
But outside of this,
actually, it was very, very interesting
to have a conversation with someone who is not traditionally exposed to it, a marketeer, talking about
the importance of open source today. In 2021, I think there are good signs that in a couple
of years we might see maybe a successful open source CRM, who knows?
Yeah, I agree.
I think it was a really, really interesting perspective.
Maybe we can get Marc Benioff on the show to give us
his perspective on whether he thinks an open source company will disrupt Salesforce.
Am I going to be part of this episode? Maybe I'll start preaching to him that he should open source
Salesforce. I think he'd be very receptive to that.
Absolutely.
Let's do it.
All righty.
Well, Kostas and I are going to go try to figure out how to get Marc Benioff on the show.
And until next time, we will catch you later.
We hope you enjoyed this episode of the Data Stack Show.
Be sure to subscribe on your favorite podcast app
to get notified about new episodes every week.
We'd also love your feedback.
You can email me, Eric Dodds, at eric at datastackshow.com.
That's E-R-I-C at datastackshow.com.
The show is brought to you by Rudderstack,
the CDP for developers.
Learn how to build a CDP on your data warehouse at rudderstack.com.