Software Misadventures - Building 2 Iconic OSSs Back-to-Back | Maxime Beauchemin (Airflow, Preset)
Episode Date: May 21, 2024
If you’ve worked on data problems, you have probably heard of Airflow and Superset, two powerful tools that have cemented their place in the data ecosystem. Building successful open-source software is no easy feat, and even fewer engineers have done it back to back. In part 2 of the conversation, we talk about Max’s journey in open source.
Segments:
(00:03:27) “Project-Community Fit” in Open Source
(00:08:31) Fostering Relationships in Open Source
(00:10:58) Dealing with Trolls
(00:13:40) Attributes of Good Open Source Contributors
(00:20:01) How to Get Started with Contributing
(00:27:58) Origin Stories of Airflow and Superset
(00:33:27) Biggest Surprise since Founding a VC-backed Company?
(00:38:47) Picking What to Work On
(00:41:46) Advice to Engineers for Building the Next Airflow/Superset?
(00:42:35) The 2 New Open Source Projects that Max is Starting
(00:52:10) Challenges of Being a Founder
(00:57:38) Open Sourcing Ideas
Show Notes:
Part 1 of our conversation: https://softwaremisadventures.com/p/maxime-beauchemin-llm-ready
Max on LinkedIn: https://www.linkedin.com/in/maximebeauchemin/
SQL All Stars: https://github.com/preset-io/allstars
Governator: https://github.com/mistercrunch/governator
Stay in touch:
👋 Make Ronak’s day by leaving us a review and let us know who we should talk to next! hello@softwaremisadventures.com
Transcript
Welcome to the Software Misadventures podcast.
We are your hosts, Ronak and Guang.
As engineers, we are interested in not just the technologies, but the people and the stories behind them.
So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors
to chat about their path, lessons they have learned, and of course, the misadventures
along the way.
Talking about open source projects: you've now successfully built a lot of open source
projects.
I mean, at least as far as Superset and Airflow are concerned, they've become
more or less the de facto choices in their own domains.
I know a lot of companies using Airflow for workflow management, for example. What
used to be Azkaban back in, I think, 2011 or 2012 is getting completely replaced.
I was going to say Luigi, but wow, Azkaban.
Or Luigi.
Oozie.
Dating yourself there.
Yeah, I know, I know. So what I meant to say is, you started these open source projects,
which have become super successful, and you've done it more than once.
What are some of the common ingredients there?
Yeah, I mean, it's a tough question.
I think there are a lot of parallels to be drawn. I call it project-community fit; it's like product-market fit.
And product-market fit is, you know, a multidimensional, very complicated thing.
There's timing, like, you know...
Let's say I'll talk about product-market fit first, and then we can
translate some of these learnings to project-community fit, the fit of an open source
project. But in PMF, I think, yeah, timing is a huge thing. What is the market? How ripe is it?
How ripe is it to be disrupted?
And then in what way?
What's the minimum viable product, right?
Like, and what's the new,
what's the TAM of that market?
And how much can you put towards it?
So at first you clearly need
something that there's a need for:
you need to address an unaddressed need somewhere out there.
So for Airflow, there was no really good data orchestrator
with data pipelines as code,
though arguably there were some before.
I think Oozie was XML.
You mentioned Luigi and Azkaban, and then there's a whole generation of GUI-driven tools like Informatica and
DataStage and other things. But there's also the founder fit: I'd been in data engineering for probably
10 to 15 years at the time, or in the ancestor of data engineering, and then I was at the place that was pushing a big
change in that area. So I think I was the right person at the right time with the right
ideas. Those stars, you can't really make them align; they just align,
because I had been
sitting through some of that.
So I think there's some of that
that you can't recreate.
But then, in terms of
once you have the project started in some form of MVP with a little bit
of traction,
I think it's really about,
you know,
growing the project one interaction at a time with
people in the community,
one issue,
one PR at a time. And then you just gotta iterate like crazy and put
passion and effort into it. And if you want stuff to snowball, you need to be
welcoming to a certain point. So it's a pretty subtle exercise, I guess, right? Like in
some ways, if you're too flexible, that could be a risk, right?
We often talk about the BDFL, the Benevolent Dictator for Life, in open source projects.
You need to be hard.
Look at Linus Torvalds and the Linux project.
Sometimes he's been really very hard on people.
It's like, this is a bad idea.
This idea is shit.
We shouldn't do this.
A lot of those emails are very colorful.
Yeah. So I think I've always done it super respectfully. I think that's super important, but you need that clear direction and leadership to keep the fluff and the nonsense out of
the project, or at least to keep the mission and vision for the project, the scope of the
project, somewhat clear. I think I could have done that better in the past
in some of these projects too,
where they became so big.
Airflow is kind of everything too, right?
It does so much
that now
maybe it's not as good at certain things,
because it's a little bit less focused.
But in other ways,
it has proven at this point that it's coped well,
because, you know,
there are more than 100,000 organizations
using, say, Airflow. So I think adoption speaks volumes too. And in some ways,
maybe there's a new generation of data engineering tools that
I think offer more guarantees, but they put more
constraints in your way. So it's a trade-off: with more constraints,
we can provide more guarantees.
So, you know,
maybe there's more you need to do
to respect the framework,
but if you do,
you get more value out of it.
Maybe Airflow had the right level
of constraints and guarantees at the time
by being less data-driven.
So: give me your tasks
and I'll run your tasks.
It's not necessarily,
tell me everything about your data,
define your schemas up front, and if I don't know your
lineage, I can't run this stuff. Airflow is just like, okay, give me some tasks, I'll
run them. For a lot of people, at the time when they adopted it, that worked,
and that stuff is extremely sticky. Unless there's
some meltdown of the planet, there are going to be people running Airflow
in a hundred years, probably, the same way we have
mainframes today.
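To make the task-centric model Max describes ("give me your tasks and I'll run your tasks") a bit more concrete, here is a minimal Python sketch. It is not Airflow's API or scheduler: the `run_tasks` helper and the extract/transform/load task names are hypothetical. The runner only knows task names and their upstream dependencies and orders them with the standard library's `graphlib.TopologicalSorter` (Python 3.9+); it knows nothing about schemas or lineage.

```python
from graphlib import TopologicalSorter


def run_tasks(tasks, dependencies):
    """Hypothetical task-centric runner (a sketch, not Airflow's API).

    tasks: dict mapping a task name to a zero-argument callable.
    dependencies: dict mapping a task name to the set of upstream task
        names that must run before it.
    Returns the task names in the order they were executed.
    """
    # The graph is built from task names only; no schemas or lineage needed.
    graph = {name: set(dependencies.get(name, set())) for name in tasks}
    order = []
    # static_order() yields each task after all of its upstream tasks and
    # raises graphlib.CycleError if the dependencies contain a cycle.
    for name in TopologicalSorter(graph).static_order():
        tasks[name]()
        order.append(name)
    return order


if __name__ == "__main__":
    log = []
    tasks = {
        "extract": lambda: log.append("extract"),
        "transform": lambda: log.append("transform"),
        "load": lambda: log.append("load"),
    }
    dependencies = {"transform": {"extract"}, "load": {"transform"}}
    print(run_tasks(tasks, dependencies))  # ['extract', 'transform', 'load']
```

The few constraints cut both ways, as in the trade-off discussed above: the runner accepts almost any callable, but it guarantees little beyond execution order.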
Nice, nice. I'm very curious:
for Airflow,
in the early days, once
you hit that inflection point where
you, being the main creator, can
no longer oversee all the
different aspects of it, and where
you do need to rely on community
members, how do you go about picking the right people,
and how do you foster that relationship?
Yeah, so I think fostering is just one interaction at a time, right? So being
welcoming, and I'd say spend more cycles with people relative to the impact
they've had and their commitment.
I've talked to so many people at so many companies, and sometimes
entire companies are like, hey, we want to put 12 people on this project, you need to work with
us to enable us, and in the end they never really get started. So
basically, the best way to decide where you're going to put energy is how much return you've gotten from putting energy
into individual organizations and people you interact with. If
you're a manager and someone on your team is super promising, maybe you
have multiple interns and one of them is such that every time you help them a little
bit, it multiplies their impact capacity, spend more cycles with the people whose impact your support multiplies. It does make sense, though sometimes you
have the opposite, where it's really natural to go and help the people
struggling more, or to spend more time with the people that make the most noise,
as opposed to the people that have the best track record. So
that's a thing there. But the fundamental reason
why open source works so well is that it's a meritocracy, right? And
you've really got to embrace that. So you define empowerment, and
how much you help and support people, based on accumulated
merit. And of course you need to provide a way for people to go from zero to some
amount of merit.
Right.
But then from that point,
I think it's a good governance model for a lot of things,
and definitely for software.
I'm all for meritocracy.
So let's change the US. Capitalism didn't work.
Instead, we're going to do...
We're going to try meritocracy from here.
Smooth transition.
So we were chatting with Mitchell Hashimoto.
Yeah, of HashiCorp.
Exactly.
I thought he had this really cool framework
for dealing with trolls
that he learned from working at the Apple Store,
this whole acronym, APPLE. I thought it was quite cool.
I'm sure there were a lot of trolls through Airflow and Preset.
How did you go about handling the problem children?
I'm curious now to go back and listen to that episode and hear the framework.
But I've seen a lot less negativity in and around the
communities I've been part of, and around software and GitHub, than I
thought I would. If you'd asked me, hey, you've got to interact with
thousands of people, what percentage do you think are going to be bad interactions, I would have said, oh, there's so much
stupidity in the world. You go out there in the world and you see so much friction and
trolling and just negative emotion everywhere. But to me, in my professional life and
my open source life, I've seen very few people that are very deceptive,
or trolling, or just acting like a douche.
We don't see a whole lot of that on repositories.
There are definitely some instances of it,
but they're few and far between.
I think what we're saying, Max, is that data engineers
are very nice people.
That was very subtle, but very nice.
No, but even Silicon Valley.
If you work in Silicon Valley, the companies I worked at
have always had
educated, nice
people. I think there's a handful
of people that I could look back on
and say I've had numerous bad
interactions with this individual.
I think good organizations,
or at least the companies that I've been lucky to be at,
companies with really good culture,
have a good immune system around just not letting an asshole
stick around for very long.
What about, say, people with good intentions,
but poor execution?
That's more the danger, I would say.
That's where I would say cut your losses.
So if someone opens a very large and ambitious PR
and a little bit of guidance does not course correct them,
then, you know, you should probably spend more time and attention
on the PRs that are promising and the people that are learning faster.
So it goes with what I was saying before.
So spend your time
where you can be helpful and where there's a good outcome without too much of your
time, commitment, and support. So I would say focus on the committers and contributors who
have demonstrated so far that they're doing well. Sometimes you might see from the first PR, the first draft,
you're like, oh, this is someone who knows what they're doing.
Or from the first review on, you can get a pretty good sense for it.
And what do you think are the attributes
that make a really good open source contributor?
Yeah, I think it's a lot of things. In some ways it's not that
different from a good software engineer, right? One thing that I'd say that might be an
interesting thought for people, outside of everything that's already been said around
what the good skills are to be a good software engineer or data engineer: one thing that I think is extremely undervalued
in software engineering in general, and in technical positions,
is just like code orientation,
like just being able to find your way in a large code base.
Because like you end up on like these messy big repos
that are layered with stuff, right?
Like you join a company like Facebook,
there are like two
big monorepos with a hundred different ways of doing things and thousands of
microservices. Similarly, you want to contribute to Airflow, Superset, or really any
open source project out there. Where do you start? Maybe you want to change the color
of a button. Where is that button? Where do you find it? Or you want to add
a field to a dashboard. That's the common data engineering problem I'm so uninterested in:
we get this data from HubSpot, and
there's a dashboard, and then we've got to carry this new custom property all the way
through the pipeline to the end. What is that pipeline?
Like where is this dashboard pointing to?
So I think code orientation,
being able to find your way in a large repository
and figuring out what to do
and how to decipher everything that's been done before,
and why, is extremely key.
And that's something you only get through building context
and spending cycles doing this in one repo. And a lot of
this is transposable, right? The patterns you learn on one project or repository, when you get to
a new one, help you find your way a little bit better. It's like the navigation skills
you got in some village or in some country: they transpose when you get to a new
village or a new country, in some capacity. Yeah, that's a nice analogy. For me, I was really glad that I
left my first company because of exactly that. The first company I was at was
super greenfield. I built a lot of the stuff my team was working on, so I kind of knew
where all the things were and how things were set up. And so when I left and went to the second
company, it was legacy code, and I was like, oh my gosh, what is this? There's shit everywhere. I mean, mine was also pretty shit, but you know, it's very different. I don't like this.
So other than going to a new company, are there any recommendations for
engineers wanting to get better at this, at code orientation? Yeah, I mean, I think
open source is always an outlet, right? Because there's all this code out there that you can go and get lost in, or try to find your way in. Then you have to find
something worthy to work on, but that's usually pretty easy. So maybe venturing off
is a good thing, and one way to venture off is to transplant yourself. There's no better way
than moving to a new country to force yourself to learn a new culture,
right, and develop a lot of these skills that are really important in terms of orientation:
figuring out how to make things work in a different context.
But I would say, yeah, generally, forcing yourself outside your comfort zone is a good thing.
Something that's interesting related to that is I learned so much at Facebook.
For me, it was a big kickoff of my career.
That's where I changed from old school to new school in some ways.
And I felt like that's where I first got really empowered.
But they were very limited in their own stack in a lot of ways.
They did things their own way.
And it's interesting: I say "they" now, but I said "we" for so long. So I think going to Airbnb forced us into the open source
ecosystem, because at Facebook there was their own orchestration, their own build system,
and there's a solution for everything, and all these names that you learn in those
contexts, like how the microservices work together. All of that is not that useful
outside of Facebook. You need to do that translation of, oh, Kubernetes is the
equivalent of Tupperware, right? But it's so much more useful to know Kubernetes than Tupperware,
because Tupperware has no use outside of Facebook, right?
So I think going to a more open place
is a great thing
for everyone's career, to use more open source,
because everything you learn there
is transposable to the rest of the world.
And any work that you do in the open is also recognized and visible to everyone
forever, as opposed to when you leave a proprietary company and no one knows;
there's no track record that's visible to anyone, so they have to interview you or
read your resume to figure out
your extended reputation.
So working on open source is always a great thing,
getting involved in different things.
I assume working in open source,
depending on the project,
is like playing a game:
you're picking your difficulty level, right?
Depending on the culture,
or how much infrastructure
is around that project,
or what the people are like, you know.
For someone starting,
okay, I guess you can't pick Airflow and Preset.
That would be kind of cheap.
But outside of those two,
are there easier
projects to start with?
Maybe one that's not in Rust.
Yeah, well, I think it's all relative to where you're starting from. But I think
the purpose is what's most important. Because if you're like, hey, I'm going to go and try to
contribute, I'm going to pick a project that's cool, like Kubernetes, and try
to go contribute something, I think that's totally cool, and it's good to do it.
And it's probably good to get exposure to these big projects too
that have hundreds of active contributors.
But I would say scratch your own itch.
So you pip install a library, and it doesn't quite do what you need it to do,
or you find a bug, or it'd be nice if this method existed. Go and start with just scratching your itch in someone
else's repository. It's a really nice way to get involved, because it will force this little bit
of exploration I was talking about, developing the code navigation skill. Like, okay, what is
this? Okay, I pip install this library.
I'm reading the documentation.
The method I need is not there.
The object that I'm using is, you know...
or the documentation is not clear on something.
Let me get to the bottom of this.
Then you have to get into that repo and figure out your orientation in that repo, clone it,
contribute something, interact with someone.
And I don't know, maybe eventually
make a friend along the way.
And then maybe, oh, I'm going to start using this for more things, and advocate for it, and contribute
more to it. So I would say don't try to find an artificial easy mode. Yeah, it's like,
all right, if you're a data engineer, you use Airflow every day. I mean, really, I know you
said not to use those, just kidding, but I think it applies just as well, right?
Like whatever you use daily, you're like, oh, this is my toolkit.
Yeah.
I think it's really good to spend some time on the tools you use. If you use an axe, you should spend some time sharpening your axe, and sharpening your axe in software often means contributing to an open source project you use.
One thing that I would just add, in general for engineers,
especially early in their career: read open source code.
Similar to what you said, if you're
using a specific toolkit, just go read about that.
Even if initially things just don't make sense,
read it over and over again, and read more code than you end up writing.
Because I think just navigating
these large code bases helps one understand how to go about organizing their own code eventually,
and over time it makes it easier to find things: oh, I need this thing, how does it do
this? Oh, I know where to look, or at least I can navigate my way through it much faster.
Oh yeah, is that how you got started with Kubernetes, Ronak?
For me, it was pretty much that.
I've worked primarily on Kubernetes
over the last four years.
And like a lot of things, with any project,
there are many ways to do a certain thing.
And then you're like, well, I don't know what the right way
to do this is.
And in many cases,
well, why does this thing operate this way? The documentation says X; is it really true? So I think
developing the practice of always going to the code to see what it actually means and how it actually
works is important. That's where you will know the guarantees, and documentation could be out of date.
And navigating the code also helps, at least me, understand a lot more about
the system and some of the principles that
the system has been built with.
And I'm sure it's true for other open source projects too.
It makes the leap of contributing feel a lot easier too, right?
You read, you're in the code, you're like, oh, I
know where that would be and how I would change it
to do certain things.
And it definitely makes the system less scary.
You're like, oh, this is such a giant system.
How can I even go about understanding something?
It's like, I have this one question.
Let me try to get the answer to that one question.
And along the way, you realize you don't need to look at the other 10,000 lines.
I just care about these three.
So it's a matter of finding these three.
Yeah, a big thing is imposter syndrome.
A lot of people are like, oh, would they accept a PR from me?
I think I've heard that so many times:
oh, wait, do they even accept PRs?
Is someone going to make fun of me?
Or I don't know what exactly.
But I think people are just like, oh, there's a line here.
There's an imaginary line that I absolutely cannot cross.
And it might be a difficulty thing.
It might be a self-value or self-confidence issue or this or that.
But that's something that we need to shoot down actively in open source.
I don't know what the right place is.
I mean, the right place to do this is a place like here, to say:
if you contribute a PR,
people will be super
happy to receive it. If you open an issue
or contribute a PR, even if it's
misinformed, incomplete, or a
draft (there's a nice
little draft button,
for discussion),
people will
be stoked to see a new face on
the repo.
As I said, I've seen very, very, very little negative interactions anywhere.
And most of the places where I've seen it, it's entitled users.
People are like, I can't believe you don't offer a way to do this.
I'm like, just contribute it.
Seriously.
Wait, what relationship do we have? I don't know you, I don't know the company you work for.
It's funny, it's interesting that you think we owe you that. But I think
most people have the bias of being overly cautious. We need to fix that. Maybe if we fix
that, we're going to get more entitlement, which we're going to have to fight back on.
But overall, I think the problem we have is that we need more empowerment,
to be more welcoming, for people to just be like, oh yeah, I can totally,
if I pip install it, I can open a PR on it for sure.
I would just plug a tool here.
It's not a tool, but a company.
GitHub is amazing for hosting code, obviously,
and the majority of open source repos live there. But for code navigation, personally, I love Sourcegraph. From
a code navigation and search perspective, it's at least my go-to tool, and I know a
lot of my colleagues use it as well. It's significantly better than what you get with
GitHub and makes code navigation much
easier.
It's like you have an IDE in the browser when you're searching open source code, and I think
GitHub could use some improvements there.
Yeah.
I mean, that sounds like a great thing, like knowing that code navigation is so important
and so potentially challenging, you know, having better tooling there really helps.
So for me, I guess, a place where I'm still old school: I use Vim. I don't use a lot of IDEs. I'm just old school, because...
This is why I like this guy.
And I'm not saying that it's better, it's just bad
habits. I've got all the muscle memory, and I like to have my shell. Wait, wait, wait. Like
Vim in the shell, or like Vi mode within VS Code? Just trying to gauge.
Oh, no, definitely.
Definitely.
I'm in Tmux.
Oh, my gosh.
I'm in the shell.
But it's not because... I think every year I'm like,
I need to teach myself a proper IDE.
And then I don't, because I just prefer...
I just use my old method.
I grep a lot.
The way I navigate code is kind of my own way of doing things,
using the shell and Bash in general.
But I think in the new world of software engineering,
you can have these great graphs saying,
oh, this method is part of this class, here's the inheritance scheme, and it's all visual; you can
click around. You can do that in Vim in some other ways. We all have different ways of
doing the same things, and what's hard is, if you have muscle memory, you'll just do it and
then send it. Yeah, I mean, I can't do it, but I will watch it on Twitch,
right? Someone who's really, really good, because you just see it, without clicking
anything, right? Just with all the switches. Yeah, yeah, it's quite nice. So,
talking about open source projects: you came to Airbnb from Facebook, and obviously you
knew a lot of these tools existed, there were some gaps, and you wanted to build something new.
For Airbnb, what was that pitch like?
I mean, you're a data engineer on the team.
You're like, hey, let's build a new tool and open source it,
and they said, yes, sounds like a great idea?
Well, there was a little bit of entitlement.
I joined with that premise.
I was very happy at Facebook, but I liked the idea of moving often, just to force yourself to experience new environments and things like that. The premise on which I decided to join was that I was going to get to work
on this problem and would likely be able to open source the stuff if everything went well. And then, between the jobs,
I took a break of about two weeks,
and I started writing Airflow
on vacation in Mexico, on the beach.
I'm like, class DAG.
Is it capital D-A-G, or capital D with lowercase A-G?
So I remember those moments of saying,
okay, well, what's the executor I should use?
A local one, a Celery one?
What are we going to need?
That was pre-Kubernetes.
But yeah, and then that landed on my own personal repo as I joined.
And then I was like, okay, let's try to use this stuff here internally.
But I think the play for them, for
organizations sponsoring open source, there's a bunch of things. One is
attracting talent: I would not have gone without that guarantee. And
I think for a long time it mattered, and when the market gets more competitive it matters more. The aura of
the engineering team is really important for these engineering-driven organizations.
So for Airbnb to be like, hey, we have these 12 open source projects, and
you can see everything we do out in the open, and you might get to work on some of this stuff in the open if you join,
that's kind of exciting for people. So it's more about talent acquisition and retention,
I think, that's the real thing. Because the angle of "we get free contributions to
projects we care about," you can play that card, but it's somewhat arguable, and generally, to
me, it has not been a super net positive. Though, I don't know, maybe that's early on in the
projects, when I was most active. But the fact that, say, Airbnb has a huge amount of
Airflow, and the fact that Airflow is much greater than it would have been if it had been one person working on that problem in isolation, is a really positive thing. Now people
can come in and out of Airbnb and know the orchestration. It makes sense. So you
had these two open source projects, and I think both came out when you were at Airbnb, Airflow
and Superset. And you worked at Lyft after Airbnb, and then you
eventually started Preset.
What prompted you to start a company around Superset?
Well, a bunch of things there. On the move from Airbnb to Lyft, I was ready. I just get this feeling that I've got to keep moving, so after about three years somewhere, I'm like, okay, where am I going next?
I also wanted to plant the seed of Superset, try it in a different context, and create a team around it there too.
And I was just really excited to work on more geospatial, real-time stuff. It just seemed really fun to work on.
And then, in terms of starting the company, the VCs started approaching me in 2018, or probably before then. I think it was a result of things like HashiCorp being super successful, Confluent, Databricks. They were like, oh shit, commercial open source could be a really good business model in some cases, so what are the open source projects out there that are getting traction, that are popular?
So a lot of people found me as this became a pattern. Martin Casado at a16z, I think, had it as part of his thesis: data, open source, we're going to build the modern data stack and it's going to be open source.
So they found me: why don't you start a company? And I was like, I don't know. I love to just chase IPOs, go from tech startup to tech startup, and work on open source. That sounds a little stressful. I don't have the MBA-type skills, and I'm not sure I want to acquire them.
But then I realized, and I think I had just turned 40, that it was a really unique opportunity. The VCs were like, we want to fund you very, very well, and you don't need to write a business plan and all this stuff. You're in a position that a lot of wannabe founders wish they were in, which is getting a top VC with a good investment. We skipped the seed round and went straight to Series A. So I was like, I'm going to regret it my whole life if I don't take this opportunity at this time.
And the rest is kind of history. I've talked a few times already about taking yourself out of your comfort zone to learn new things, and maybe becoming a better professional and a better human as a result. That was the single most important thing I've done: transplanting myself to this different planet and being a founder. It's been super great. It's been a fun ride, but with intense ups and downs.
What were the biggest surprises, I guess, compared to your expectations?
I think one thing was, you kind of think the VCs are a little bit evil and that the oversight is going to be very intense. Basically, the moment I take this money, lots of millions of dollars, the heat is on, right? People are going to be on my ass looking for results at all times, and the pressure and the tension are going to come from the top down.
And what I realized was, not that there's no pressure, the stakes are extremely high, but it's mostly self-inflicted. It's mostly: if the company does well, I can do very, very well. And you make promises to every investor, but also to every employee, every customer, every prospect, so little by little the pressure goes up. But it's not necessarily inflicted by the investors or the organization.
And it's surprising how much latitude you have in how you run your business. No one is like, oh, you've got to do A or B. It's like, okay, you're building a business the way you want to build it, which is great.
So you definitely don't come across as the MBA type. I mean, even now, running the company, you're deep in the trenches actually writing code, which is super impressive to see.
How did you go about thinking about the business plan, for example? This is not a skill set that many engineers typically have, and I would say engineers probably make the worst customers. So how did you put yourself in those shoes and say, okay, this is how I can build a business around this?
Yeah, I think you take the challenges of being a founder as a first-class problem. You go to first principles, and then you try to figure out how you should organize your time, where you need to seek advice, and what's most important to work on today, this week, this month, this year. You find the right advisors and the right people to surround yourself with. So nothing unusual in the answer there.
On the coding, I think by the time we got to a certain scale, it just didn't make sense for me to do any of the coding that the company would depend on to succeed. Early on, when we were fewer than 10 people, I was definitely very involved, still acting as a very active engineer and PM. Then over time I distanced myself from that, and it became more like, okay, I code because I need it as an outlet, or because it's good for my mental health. That's how I've been fulfilling myself for the past 15 years, so not having that outlet, something I know I'm good at, is difficult.
So I distanced myself from it, and there was a long period where I didn't code at all; there was just too much stuff to manage. And then recently, yes, I decided to spend more time wearing the CTO hat, which includes being in the code base, and that's a very positive thing to be around.
So it's been an interesting journey, and you learn things along the way: maybe things you really love that you didn't know you'd love, and some things where you're like, okay, I need to find someone to do that for me because I'm not that interested in it.
Wait, was there something you found that you really like, that you didn't know you'd love?
Yeah, I think, and it goes with the idea that what you're good at is generally what you love, so it's hard to say which comes first. But I love product marketing in general now: messaging, positioning, pricing, packaging, and some of the strategy, right? How do we think about and expose the components of the product to the market? And that shapes the roadmap and the product direction too. If you have a mission, vision, and scope for a product, product marketing can apply that stuff, but it can also shape the direction of it.
I think I was doing it naturally in open source in some ways, right? Airflow had a logo and a one-liner, and some idea of what it does and what it doesn't do. If you think of the README of an open source project, it's effectively product marketing: it's how you present your project to the potential community.
I generally hate operations, but it's a must-do, and so are things that are more repetitive. Financial planning is kind of interesting, like modeling stuff in Excel. The diversity of these things is what's interesting. And management has never really been my thing. I like spending time with people too, but I prefer coaching and leadership to management.
When you said coding is free therapy: when you view it through that lens, how do you go about picking what to work on?
Yeah, I don't know. It's a mix. If you're dogfooding, if you're a user of the product, you can fix the little things that annoy you. Sometimes, I don't really like CSS, but I hate crooked pictures on the wall; my OCD triggers. Some of that is really easy, it usually doesn't get in anyone's way, and it's non-critical work.
And then going more meta: recently I got closer to the repo and got really into developer experience. Docker, Docker Compose, the Helm chart, the CI/CD stuff, Docker builds, just making sure all of this actually runs. I think that's getting more meta on the problem as you get more senior. I don't know what it is. I'm not that passionate about it; I kind of hate GitHub Actions, and it's really hard to work with, but maybe the repo really needed it too.
I think that's cool, the pattern you're describing. I've noticed that pattern, obviously in a different context and at a different scale, but translate it to a tech lead instead of a founder. A tech lead is in the trenches initially, designing exactly what the system should and should not do. Then slowly they go out and spread the word for it within the company, or outside the company if it's open source. And eventually they start looking at it from a user's standpoint and fixing the things you mentioned, like: someone should be able to git clone and get something built right away, so let's improve the CI/CD pipeline. So it makes sense to put on that user hat and see how the product can be improved, not just the engineering aspect of it.
Yeah, user experience and developer experience are both closer to humans, which is kind of interesting. If you think of the development stack, CSS is probably the most veneer layer on the application, and CI/CD is deep in the back end. They're at the extremes, but in some ways it comes full circle, because in both cases it's about user experience: for a developer in one case, and for a user of the product in the other. And the middle maybe gets so tangled up and complicated that if I touch anything, it takes a whole bag of knowledge: you start touching it and then you've got to go deeper. So maybe that's why I'm staying away from the guts. You don't want to be the one blocking things.
You'd need therapy for the free therapy.
That's it. Then you need some extra, real therapy.
So for a listener who's like, yo, this Max guy is pretty cool, I want to do what he does, I want to make the next Airflow, but for LLMs. Okay, I don't think it would be fair to ask you what that would mean in terms of a specific project, but what does that journey look like? You joined big companies and saw how things work at scale. What would your advice be to this engineer?
Well, the first thing I would say is don't be a founder. I think we glorify the founder; it's been overly glorified, and it's not all that fun. I think it's a good thing to tell people not to be a founder, because to succeed you need to be this delusional, I-don't-care-what-you-think, I'm-going-to-do-it-regardless type of person.
So we're kind of testing that by telling people.
Yeah, yeah, because you need some of that. People need to break through, like: screw this guy Max who sat down on this podcast, and the other people who told me not to be a founder, I'm still going to be a founder. So I think it's good overall to put that message out there. Unless you're into chewing glass and swimming in subzero water, that kind of more brutal stuff.
But then, on the question of how to start a project: for open source specifically, I think you need to be in tune with your environment, its needs, and the holes in the market offering. Like, wouldn't it be great if there was a tool that allowed me to do test-driven-development-type stuff with prompt engineering? Oh, it doesn't exist. Well, maybe I can create a thing there. I think it's better when it comes from within: scratching your own itch, with a use case and a hole that you observe from a place you're very, very familiar with. So that's key.
For me, there are two projects I'm looking at that are mostly in the early validation phase. If people wanted to get involved, to run with the thing, or with the idea, or with some of the assets and the thinking I put together, I can pitch these projects.
Yes, we do.
And maybe they're an example of the kind of project that could be interesting to someone, whether that's taking it and running with it or coming to collaborate with us to get it started. The first one is around the semantic layer. I know dbt is coming up with a good metrics layer, a semantic layer. That's super interesting, and we're integrating with it at Preset, but we've had a hackathon and come up with a set of new ideas that extend this idea. The world really needs a universal, open source semantic layer that works well and is simple.
So if you look, and maybe we'll put the link in the show notes, I think it's preset-io/allstars, SQL All Stars.
It's a semantic layer that works as a virtual database. You put in your semantics: what are your metrics and dimensions. As with all semantic layers, you map your schema: you say this table is going to be joined this way, this column is a metric, this is a dimension, and you organize all your stuff as code, as it should be. So it's similar to LookML from that perspective. But then it's exposed as a virtual database. You have a table called star, so you can say SELECT stuff FROM star. It's exposed as one large, flat dataset, but behind the scenes we transpile your SQL to do the underlying joins that need to be done. So it's a cool idea.
And there are some other ideas around progressive adoptability, and having the semantic layer guess, or help you guess, the semantics of your schema. Your schema already has information: you should be able to figure out which tables you can join and how, and what already looks like a metric or a dimension. So the semantics can be mostly inferred, it's progressively adoptable, you can enrich it over time and still get value from day zero. And it's exposed as a virtual database, so every BI tool out there is already compatible with it, because it's a SQL interface. So that was one idea.
Yeah, and I guess because everything's more codified, it makes LLM-driven improvements easier too.
Yeah, on LLMs and this semantic layer, there are two angles I can think of now. One is that the LLM can help you define your semantic layer, set up your semantics. But if you do have your semantics set up, it becomes a map for the LLM to better understand your schema, and the LLM might do better with the abstraction layer than it does without it.
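The "mostly inferred, progressively adoptable" idea is easier to picture with a toy. The Python sketch below guesses joins, metrics, and dimensions from schema metadata alone. It rests entirely on assumptions of our own: foreign keys follow a `<table>_id` naming convention pointing at an `id` column, numeric non-key columns are metric candidates, everything else is a dimension, and the `orders`/`users` tables are made up. It is not taken from the AllStars repo.

```python
from dataclasses import dataclass, field

# Assumption: these SQL types count as "numeric" for metric detection.
NUMERIC_TYPES = {"INTEGER", "BIGINT", "FLOAT", "DOUBLE", "DECIMAL", "NUMERIC"}


@dataclass
class Table:
    name: str
    columns: dict[str, str]  # column name -> SQL type


@dataclass
class InferredSemantics:
    # Each join is (left_table, left_column, right_table, right_column).
    joins: list[tuple[str, str, str, str]] = field(default_factory=list)
    metrics: list[str] = field(default_factory=list)
    dimensions: list[str] = field(default_factory=list)


def infer_semantics(tables: list[Table]) -> InferredSemantics:
    """Heuristically guess joins, metrics, and dimensions from names and types."""
    by_name = {t.name: t for t in tables}
    out = InferredSemantics()
    for t in tables:
        for col, typ in t.columns.items():
            if col == "id":
                continue  # primary keys are neither metrics nor dimensions
            if col.endswith("_id"):
                target = col[: -len("_id")]
                # Hypothetical convention: try "user" then naive plural "users".
                for candidate in (target, target + "s"):
                    if candidate in by_name and "id" in by_name[candidate].columns:
                        out.joins.append((t.name, col, candidate, "id"))
                        break
                continue  # keys are never metrics, even when numeric
            qualified = f"{t.name}.{col}"
            if typ.upper() in NUMERIC_TYPES:
                out.metrics.append(qualified)
            else:
                out.dimensions.append(qualified)
    return out


tables = [
    Table("orders", {"id": "BIGINT", "user_id": "BIGINT", "amount": "DECIMAL", "status": "VARCHAR"}),
    Table("users", {"id": "BIGINT", "country": "VARCHAR"}),
]
sem = infer_semantics(tables)
print(sem.joins)       # [('orders', 'user_id', 'users', 'id')]
print(sem.metrics)     # ['orders.amount']
print(sem.dimensions)  # ['orders.status', 'users.country']
```

The output is only a starting guess that a person (or, as discussed above, an LLM) would review and enrich over time, which is what makes the approach progressively adoptable.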
Right, because the semantic layer is there to help business users self-serve in BI tools. So if it helps business users self-serve, it should help an LLM, any form of intelligence, self-serve. So that's one project. I don't know if we want to get deeper into this one before I talk about the other one. Sorry, sorry, sorry.
So for people who might not know, what is a semantic layer?
Oh yeah. Maybe I'll start with the purpose. The purpose of the semantic layer is to help more people, namely business users, self-serve with their data without necessarily knowing SQL or understanding much of the underlying data model. It bridges the gap: the semantic layer is a bit of a map of your database, and it maps the metrics and dimensions in more business terms. So instead of cryptic table and column names that you don't necessarily know how to join, and that you need to write SQL to make sense of, we expose a layer on top that has the map of the physical layer plus metrics and dimensions, pretty labels, and pretty descriptions, so that people can just drag and drop these things, and behind the scenes we generate the SQL. That's the general idea.
A lot of these things have historically been part of the BI tool, so they're proprietary by nature and not shareable across tools. That's a bit of a problem if you use multiple BI tools, which most companies do: whatever you do in Looker, you can't really take with you to Tableau or Superset or any other tool. So there's a lot of value in having an open source, universal semantic layer, defined as code, exposed as a virtual database. So some cool ideas.
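To make the "virtual database" idea concrete: users query one flat table called `star`, and the layer rewrites the query into SQL with whatever joins the semantic model declares. This is a hypothetical, minimal Python sketch, not the AllStars implementation. The model (`BASE_TABLE`, `JOINS`, `FIELDS`) and the table names are invented, and only the shape `SELECT <fields> FROM star` is handled, with a regular expression; a real transpiler would parse the SQL properly.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    table: str       # physical table the field lives in
    sql: str         # column or aggregate expression on that table
    is_metric: bool  # metrics aggregate; dimensions group


# Hypothetical semantic model: one fact table, one dimension table, one join.
BASE_TABLE = "orders"
JOINS = {"users": "orders.user_id = users.id"}
FIELDS = {
    "country": Field("users", "users.country", False),
    "status": Field("orders", "orders.status", False),
    "revenue": Field("orders", "SUM(orders.amount)", True),
    "order_count": Field("orders", "COUNT(*)", True),
}


def transpile_star(query: str) -> str:
    """Rewrite 'SELECT a, b FROM star' into SQL against the physical tables."""
    m = re.fullmatch(r"\s*SELECT\s+(.+?)\s+FROM\s+star\s*;?\s*", query, re.IGNORECASE | re.DOTALL)
    if not m:
        raise ValueError("only 'SELECT <fields> FROM star' is supported in this sketch")
    names = [n.strip() for n in m.group(1).split(",")]
    unknown = [n for n in names if n not in FIELDS]
    if unknown:
        raise KeyError(f"not in semantic layer: {unknown}")
    fields = [FIELDS[n] for n in names]
    select = ", ".join(f"{f.sql} AS {n}" for n, f in zip(names, fields))
    sql = f"SELECT {select} FROM {BASE_TABLE}"
    # Join only the tables the requested fields actually need.
    for t in sorted({f.table for f in fields} - {BASE_TABLE}):
        sql += f" JOIN {t} ON {JOINS[t]}"
    # Group by the dimensions only when metrics are being aggregated.
    dims = [f.sql for f in fields if not f.is_metric]
    if dims and any(f.is_metric for f in fields):
        sql += " GROUP BY " + ", ".join(dims)
    return sql


print(transpile_star("SELECT country, revenue FROM star"))
# SELECT users.country AS country, SUM(orders.amount) AS revenue FROM orders JOIN users ON orders.user_id = users.id GROUP BY users.country
```

The design choice worth noticing is that the user never writes a join: the flat `star` table hides the physical model, and only the tables a query touches get joined.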
There's a repo out there that's mostly just early scaffolding of what the project might look like.
And the other need I think we identified is around data access policy as code. Every company needs to define groups of users and what data access they should have, and the bigger the company, the more of a reality that is. Mostly for analytics purposes, right? Of course, every company needs to define rules and access for every user to different systems, but this would be more targeted at data access policy. You probably have some Snowflake, some BigQuery, some databases left and right, different BI tools. And you can say certain people have access to certain tables or columns or schemas, and maybe this category of users has access to the schema, but not the PII in here or there.
So the other project would be called Governator, with the aesthetic of Arnold Schwarzenegger somewhere there in the logo, for sure. I think we have a logo that I got Midjourney to produce.
Oh, that's nice.
So with Governator, you would define your data access policy as code, or as YAML, in a repo, and it can push and pull. Because it lives in a file system in a repo, you know exactly who has access to what, and when you change access, you do it in the repo, so you know who gave access to what and when. And then there would probably be a CLI or a CI/CD-type tool where you can push and pull to different sources and destinations. So you could pull whatever you put in Snowflake as data access policy rules, stamp it as code, change it, and push it back, or push it to other systems.
And that solves a problem that we have: sometimes the BI tool needs to know about your access rules, so we know which charts and dashboards to show you.
Right, right.
But often you don't want a service account that has access to everything, with the BI tool enforcing the data access policy. You want to enforce it at both layers, because the user might want to go straight to the Snowflake UI or the BigQuery console.
Right.
And then have consistency in what they see and what they have access to. So this tool would allow people to manage data access policy centrally and synchronize it across BI tools, databases, and things like that.
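A rough sketch of the "push" half of that workflow: the desired policy lives in the repo, the current grants are pulled from a warehouse, and the tool computes what to grant and revoke so both sides match. Everything here is an assumption for illustration: the policy shape, the role and table names, and the choice to emit Snowflake-style `GRANT`/`REVOKE ... ROLE` statements. A plain Python dict stands in for the YAML file so the example has no dependencies, and none of this is Governator's actual format.

```python
# Hypothetical desired state, as it might be declared in a repo (YAML in practice).
DESIRED_POLICY: dict[str, set[str]] = {
    "data_scientist": {"analytics.events", "analytics.sessions"},
    "finance": {"finance.revenue", "analytics.events"},
}


def flatten(policy: dict[str, set[str]]) -> set[tuple[str, str]]:
    """Turn {role: {table, ...}} into a set of (role, table) grants."""
    return {(role, table) for role, tables in policy.items() for table in tables}


def plan_sync(desired: dict[str, set[str]], current: dict[str, set[str]]) -> list[str]:
    """Return the statements that would make `current` match `desired`."""
    want, have = flatten(desired), flatten(current)
    # Sorting keeps the plan deterministic, which makes it reviewable in a PR.
    stmts = [f"GRANT SELECT ON TABLE {t} TO ROLE {r};" for r, t in sorted(want - have)]
    stmts += [f"REVOKE SELECT ON TABLE {t} FROM ROLE {r};" for r, t in sorted(have - want)]
    return stmts


# Hypothetical state pulled from the warehouse: finance data was granted by hand.
current = {
    "data_scientist": {"analytics.events", "finance.revenue"},
    "finance": {"finance.revenue", "analytics.events"},
}
for stmt in plan_sync(DESIRED_POLICY, current):
    print(stmt)
# GRANT SELECT ON TABLE analytics.sessions TO ROLE data_scientist;
# REVOKE SELECT ON TABLE finance.revenue FROM ROLE data_scientist;
```

Computing a diff rather than blindly re-applying everything is what lets the repo act as the source of truth: drift introduced directly in the warehouse shows up as a `REVOKE` in the plan.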
I imagine the access simulation will be pretty interesting to build out, and pretty core, right? I remember IAM on AWS being such a pain to work with, before they made it more user-friendly to actually test out different policies and how they impact things.
Yeah, when it's managed as code... and that's a theme in my career, like Promptimize, or Airflow, which is data pipelines as code. When things are managed as code, you can test them, you can version them, you can review them, you can know who changed what and when, and you can CI/CD a bunch of things. So before you deploy your data access policy, you can run a bunch of tests: make sure a simulated data scientist doesn't have access to financial data, for instance, or make sure they can't get push access on this repo, because they're going to break everything.
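And a sketch of the pre-deploy checks described here: simulate users by group, resolve what the policy would let them read, and fail the build if a data scientist could see financial data. The policy, the user directory, and the glob-pattern convention (`finance.*`) are hypothetical; in a real CI pipeline, the `test_*` functions would more likely be collected by a test runner such as pytest.

```python
from fnmatch import fnmatch

# Hypothetical policy as code: group -> glob patterns of tables it may read.
POLICY = {
    "data_scientist": ["analytics.*", "ml.features_*"],
    "finance_analyst": ["finance.*", "analytics.revenue_daily"],
}

# Hypothetical directory of users and the groups they belong to.
USERS = {"ana": ["data_scientist"], "fred": ["finance_analyst"]}


def can_read(user: str, table: str) -> bool:
    """True if any of the user's groups has a pattern matching the table."""
    return any(
        fnmatch(table, pattern)
        for group in USERS.get(user, [])
        for pattern in POLICY.get(group, [])
    )


def test_data_scientists_cannot_read_finance() -> None:
    # Simulate every data scientist and check they can't reach financial data.
    for user, groups in USERS.items():
        if "data_scientist" in groups:
            assert not can_read(user, "finance.revenue"), f"{user} can read finance data"


def test_finance_can_read_finance() -> None:
    assert can_read("fred", "finance.revenue")


if __name__ == "__main__":
    test_data_scientists_cannot_read_finance()
    test_finance_can_read_finance()
    print("policy checks passed")
```

Because the checks run against the policy files rather than against a live warehouse, a bad change fails in review, before anything is pushed.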
So in the advice you gave people: don't be a founder unless you want to chew glass or swim in subzero temperatures. Guang asked the question about surprises, but your advice makes me ask: what have been the challenges of being a founder, especially when you're building a company around an open source project?
Yeah, I think it's a lot of things. The first thing is taking yourself out of your comfort zone and learning a bunch of things that you may or may not be passionate about. But it doesn't matter, because it's what it takes to succeed, right? So you're like, okay, now I'm putting myself in a situation where I have to learn new things that are really important, really core to this business succeeding, and I may or may not like them. Unless you love these things, or you love being outside your comfort zone, that's an exercise where you don't know how it's going to go.
And the stakes are very high. These things go together: reward and recognition are really important for everyone. Everyone wants to be recognized and rewarded, but that usually comes with hardship, with the difficulty level of what you take on, right? If you volunteer for an organization, work part-time, or are an advisor, it can be rewarding, but it's not going to be as rewarding as doing the real thing and busting your ass for years. But there's a danger in putting yourself in a situation where the stakes are so high: the potential for reward and recognition is very high, but the risk of coming up short is very high too.
So unless you want to be really hardcore, a more progressive approach to things in life is probably better: climbing the ranks at some companies and making sure along the way that, okay, I'm still comfortable managing a larger team, or being a little bit closer to being an executive, I still like this progression, I'm going to take another step in that direction. As a founder, you're like, okay, let's just throw myself into a completely different area, and it should be fun, right? Or I might get rich. I don't know what you both think. But I would say a more progressive approach to the things you do in life is probably better. There are a lot of reasons why you might want to do it, and people do do it, but think about it cautiously before making this big jump.
Good advice, good advice. By the way, and I'm doing a hard pivot here, you mentioned you generated the Governator logo using Midjourney. We spent a lot of time trying to generate logos for our podcast using a bunch of these AI tools. I spent significant evenings just generating logos with DALL-E or ChatGPT. One thing I found was: you give them a prompt and they come up with a logo, which is kind of okay, but if you ask them to change something or put in text, they're terrible. If I say spell "Software" and put it in the logo, they never do that. I never got the system to do that.
Yeah, they're still learning how to write. So you have to ask it not to write anything, and then you add the text in Photoshop. You use what it's good at, and then you add the layer of what you're good at.
See, we just need to get better at prompting, or just better at working with these tools instead of just... sorry, what was it?
I was going to show the logo that I made, if I could.
Whoa, that's pretty cool. That's pretty good. That is really good. Wow. You got Midjourney to do that for you?
I think it might have been ChatGPT. It would be interesting to pull up the session, because I did fight with it quite a bit. I wanted both the Governator, Arnie, and the database logos. That was pretty tricky, but I got it to do this. I wanted the red eye, because at some point it came up with a red eye and I loved the red eye. I wanted it, and then it stopped doing it, so you have to ask for it and it doesn't want to do it. There were definitely a bunch of prompts for me to get to this.
So if you want to check it out, that's on my personal GitHub. The project is kind of early; it has mostly just information architecture and some of the ideas behind it. And the other one, All Stars, is on the Preset org, and it's a little bit further along. It uses the Mario font here, and I don't know if that's legal. It's a pretty fun one to read, so if you're interested in semantic layers or their future, that's probably a fun repo to read.
Will do, will do. Well, this has been an awesome conversation, Max. Thank you so much for taking the time. This was super amazing for us.
This was super fun. Really interesting topics overall. So glad I showed up on the show.
And we'll add links to all the things we talked about: Preset, Superset, Airflow, and the new repos you mentioned, Governator and All Stars, so that people can go find these projects and hopefully contribute to them. We'll also link to your profile, of course, so people can find you as well.
That'd be great if people wanted to run with these projects, because right now I don't have the bandwidth. I want to work on these things, but we're pretty busy and I'm working on a bunch of other things too, though it's likely I'll work on some of them. So if people wanted to get a little bit closer and help lead these projects, it could be fun.
If either of these projects gets a contribution because of this podcast, I would count that as a win.
Yeah. Or even, I think a lot of what's great about some of these things, and my intention in putting the README out there, is: we might build this stuff, but it's also just the ideas, and the empowerment. You put the idea out there, and someone might run with it. They might say, you know what, I have a different twist on this. So I don't just think code should be open and free; ideas too. I'm usually against intellectual property. I just think those two words don't fit well together, intellectual and property. Ideas want to be free, code wants to be free too, so let's make that a reality.
For sure, I like that. Well, thanks so much.
Cool, thank you so much. This was amazing. Take care.
Hey, thank you so much for listening to the show. You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com. You can also write to us at hello@softwaremisadventures.com. We would love to hear from you. Until next time, take care.