Software Misadventures - Building 2 Iconic OSSs Back-to-Back | Maxime Beauchemin (Airflow, Preset)

Episode Date: May 21, 2024

If you’ve worked on data problems, you have probably heard of Airflow and Superset, two powerful tools that have cemented their place in the data ecosystem. Building successful open-source software is no easy feat, and few engineers have done it back to back. In part 2 of the conversation, we talk about Max’s journey in open source.

Segments:
(00:03:27) “Project-Community Fit” in Open Source
(00:08:31) Fostering Relationships in Open Source
(00:10:58) Dealing with Trolls
(00:13:40) Attributes of Good Open Source Contributors
(00:20:01) How to Get Started with Contributing
(00:27:58) Origin Stories of Airflow and Superset
(00:33:27) Biggest Surprise since Founding a VC-backed Company?
(00:38:47) Picking What to Work On
(00:41:46) Advice to Engineers for Building the Next Airflow/Superset?
(00:42:35) The 2 New Open Source Projects that Max is Starting
(00:52:10) Challenges of Being a Founder
(00:57:38) Open Sourcing Ideas

Show Notes:
Part 1 of our conversation: https://softwaremisadventures.com/p/maxime-beauchemin-llm-ready
Max on LinkedIn: https://www.linkedin.com/in/maximebeauchemin/
SQL All Stars: https://github.com/preset-io/allstars
Governator: https://github.com/mistercrunch/governator

Stay in touch: 👋 Make Ronak’s day by leaving us a review and let us know who we should talk to next! hello@softwaremisadventures.com

Transcript
Starting point is 00:00:00 What do you think are the attributes that makes like a really good open source contributor? One thing that I think is extremely undervalued in software engineering in general and technical position is just like code orientation, like just being able to find your way in a large code base. Because like you end up on like these messy big repos that are layered with stuff, right? Like you join a company like Facebook, there's like two big monorepos with a hundred different ways of doing things and like, you know, thousands of microservices. Similarly, like, you know,
Starting point is 00:00:32 you want to contribute to Airflow, Superset, or really any open source project out there. Where do you start? Like maybe you want to change the color of a button. Like where is that button? Where do you find it? Or you want to add a field to a dashboard. That's the common data engineering problem I'm so disinterested in. That's like, oh,
Starting point is 00:00:50 we get this data from HubSpot and there's a dashboard, and then we've got to carry this new custom property all the way through the pipeline to the end. What is that pipeline? Like, where's this dashboard pointing to? So I think code orientation, being able to find your way in a large repository and figuring out what to do and how to decipher everything that's been done before and why, is extremely key. For maybe someone starting, are there easier projects to start with?
Starting point is 00:01:18 Maybe that's not in Rust. Yeah, well, so I think it's all relative to where you're starting from. But I think the purpose is what's most important. So you could install a library. It doesn't quite do what you need it to do. Or you find a bug. It would be nice if this method existed to go and then start with just scratching your itch in someone else's repository. It's a really nice way to get involved because it will force this little bit of exploration
Starting point is 00:01:47 I was talking about, developing the code navigation skill. Like, okay, I pip install this library. I'm reading the documentation. The method I need is not there. The documentation is not clear on something. Let me get to the bottom of this. Then you have to get in that repo
Starting point is 00:02:02 and figure out your orientation in that repo, clone it, contribute something, interact with someone, and, I don't know, maybe eventually make a friend along the way. I'm sure there were a lot of trolls through, like, Airflow and Preset. How did you go about handling problem children? Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Guang. As engineers, we are interested in not just the technologies, but the people and the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors to chat about their path, lessons they have learned, and of course, the misadventures along the way.
Starting point is 00:02:47 Talking about the open source projects, now you've successfully built a lot of open source projects. I mean, I think at least as far as Superset and Airflow are concerned, they've become more or less the de facto choices in their own domains. Like, I know a lot of companies using Airflow for workflow management, for example. What used to be Azkaban back in, I think, 2011, 12, is getting completely replaced. I was going to say Luigi, but wow, Azkaban. Or Luigi.
Starting point is 00:03:14 Oozie. Dating yourself there. Yeah, I know, I know. So what I meant to say is, you started these open source projects, which have become super successful, and you've done it more than once. What are some of the common ingredients there? Yeah, I mean, it's a tough question. I think there's a lot of parallels to be drawn with, you know, so I call it like project community fit is like product market fit. And product market fit is, you know, a multidimensional, a very complicated thing.
Starting point is 00:03:42 There's just, like, timing. Let's say I'll be talking about product-market fit, and then we can translate some of these learnings to project-community fit, like the fit of an open source project. But in PMF, you know, I think timing is a huge thing. Like, what is the market? How ripe is it? How ripe is it to be disrupted? And then in what way? What's the minimum viable product, right? And what's the new,
Starting point is 00:04:15 what's the TAM of that market? And how much can you put towards it? So at first you need clearly something that there's a need for, and then you need to address an unaddressed need somewhere out there. So for Airflow, there's no really good data orchestrator with data pipelines as code. Though arguably there was some before.
Starting point is 00:04:42 I think Oozie was XML. You mentioned Luigi, Azkaban. And then there's a whole generation of GUI-driven tools like Informatica and DataStage and other things. But there's also, you know, kind of the founder fit, where I'd been in data engineering for probably 10 to 15 years at the time, or in the ancestor of data engineering, and then at the place that was pushing a big change in that area. So I think I was the right person at the right time with the right
Starting point is 00:05:16 ideas, where those stars, you know, you can't really make them align. They just align, because I had been kind of sitting through some of that. So I think there's some of that that you can't recreate. But then, once you have the project started in some form of MVP and a little bit of traction,
Starting point is 00:05:36 I think it's really about, you know, you really grow the project one interaction at a time with people in the community, like one issue, one PR at a time. And then, yeah, you just gotta iterate like crazy and put, you know, passion and effort into it. And if you want stuff to snowball, you need to be welcoming, to a certain point. So it's a pretty subtle exercise, I guess, right? Like, in
Starting point is 00:06:03 some ways, too. And if you're too flexible, that could be a risk, right? We often talk about the BDFL, the Benevolent Dictator for Life, in open source projects. You need to be hard. Like you look at Linus Torvalds and the Linux project. Sometimes he's been really very hard on people. It's like, this is a bad idea. This idea is shit. We shouldn't do this.
Starting point is 00:06:23 A lot of those emails are very colorful. Yeah. So I think I've always done it like super respectfully. I think that's super important, but you need that clear direction leadership and keep the fluff and the nonsense out of the project, or at least to keep the mission and vision for the project, the scope of the project, you know, somewhat clear. I think I could have done that better in the past in some of these projects too, where they became so big, like Airflow is kind of everything too, right? Like it does so much,
Starting point is 00:06:50 where now maybe it's not as good at certain things because it's a little bit less focused. But in other ways, it has proven at this point that it's coped well because, you know, there's like more than 100,000 organizations using, say, Airflow. So I think adoption speaks volumes, too. And it's like, you know,
Starting point is 00:07:11 in some ways, there's a new generation of data engineering tools that, I think, offer more guarantees, but they put more constraints in your way. So it's a trade-off of, like, with more constraints, we can provide more guarantees. So, you know, maybe there's more you need to do to respect the framework. But then if you do,
Starting point is 00:07:32 you get more value off of it. Maybe Airflow had the right level of like constraint and guarantees at the time to be less data-driven. So give me your tasks and I'll run your tasks. There's not necessarily like, tell me everything about your data
Starting point is 00:07:43 and, like, define your schemas up front, and, you know, if I don't know your lineage I can't run this stuff. Airflow is just like, okay, give me some tasks, I'll run them. For a lot of people at the time when they adopted it, you know, it worked. And that stuff is extremely sticky. If there's no, you know, meltdown of the planet, there's going to be people running Airflow in a hundred years, probably. The same way we have mainframes today. Nice, nice. I'm very curious
Starting point is 00:08:10 for Airflow in the early days, once you hit that inflection point where you, being the main creator, can no longer oversee all the different aspects of it, where you do need to rely on community members,
Starting point is 00:08:25 how do you go about picking the right people, and how do you kind of foster that relationship? Yeah, so I think fostering is just kind of one interaction at a time, right? So being welcoming, and I'd say spend cycles with people in proportion to the impact they've had and their commitment. So, you know, I've talked to so many people at so many companies, and sometimes entire companies are like, hey, we want to put 12 people on this project, you need to work with us to enable us. And in the end, they never really get started. So basically I'm saying that the best way to decide where you're going to put energy is how much return you've gotten from putting energy
Starting point is 00:09:08 on the individual organizations and people you interact with. So if you're a manager and someone on your team is super promising, maybe you have multiple interns and one is just like, every time you help them a little bit it multiplies their impact capacity, spend more cycles with the people that support it. Yeah, it does make sense, though sometimes you have the opposite, where it's really natural to go and help the people struggling more, or to spend more time with the people that make the most noise, for instance, as opposed to the people that have the best track record. So that's a thing there. But it's the fundamental reason
Starting point is 00:09:48 why open source is working so well: it's a meritocracy, right? And then you've really got to embrace that. So you really define empowerment, and maybe how much you help and support people, based on the accumulated merit. And of course you need to provide a way for people to go from zero to some amount of merit. Right. But then from, from that point,
Starting point is 00:10:12 I think it's a, it's a good governance model for a lot of things and definitely for software, you know, I'm like with meritocracy. So let's change the U S. Capitalism didn't work. Instead, we're going to do... We're going to try meritocracy from here. Smooth transition.
Starting point is 00:10:33 So we were chatting with Mitchell Hashimoto. Yeah, the HashiCorp. Exactly. I thought he had this really cool framework for dealing with trolls that he learned from working at the Apple store. This whole acronym of APPL, I thought it was quite cool. I'm sure there were a lot of trolls through Airflow and Preset.
Starting point is 00:10:54 How did you go about handling the problem children? I'm curious now to go back and listen to that episode and hear the framework. But I've seen a lot less negativity or deception in and around the communities I've been part of, and around software and GitHub, than I ever thought I would. If you had asked me, like, hey, you've got to interact with, you know, thousands of people, what percentage do you think are going to be a bad interaction? I would have said, oh, there's so much stupidity in the world, and you go out there in the world and you see so much friction and trolling and just negative emotion everywhere. But to me, in my professional life and
Starting point is 00:11:41 my open source life, I've seen very few people that are very deceptive or trolling or just acting like a douche. We don't see a whole lot of that on repositories. There's definitely some instances of it, but they're few and far between. I think what you're saying, Max, is that data engineers are very nice people. Which I think that was very subtle, but very nice.
Starting point is 00:12:04 No, but even Silicon Valley, if you work in Silicon Valley, and the companies I worked at, it has always been, like, educated, nice people. I think there's a handful of people that I could look back and say I've had numerous bad interactions with this individual.
Starting point is 00:12:20 I think good organizations or at least the companies that I've been lucky to be at, companies with really good culture, with good immune system around just not letting an asshole stick around for very long. What about, say, people with good intentions, but poor execution? That's more the danger, I would say.
Starting point is 00:12:43 That's where I would say cut your losses. So if someone opens a very large and ambitious PR and a little bit of guidance does not course correct them, then, you know, you should probably spend more time and attention on the PRs that are promising and the people that are learning faster. So it goes with what I was saying before. So spend your time where you can be helpful and where there is, you know, a good outcome without too much of your
Starting point is 00:13:14 time and commitment and support. So I would say focus on the committers, the contributors, that have demonstrated so far that they're doing well. Sometimes you might see from the first PR, the first draft, you're like, oh, this is someone that knows what they're doing. Or from the first review on, you can get a pretty good sense for it. And what do you think are the attributes that make a really good open source contributor? Yeah, I think it's a lot of things. In some ways it's not that
Starting point is 00:13:48 different from a good software engineer right um one thing one thing that i say that might be an interesting thought for people outside of everything that's already been said around you know what are the good skills to be a good software engineer, data engineer. I was like, one thing that I think is extremely undervalued in software engineering in general and technical position is just like code orientation, like just being able to find your way in a large code base. Because like you end up on like these messy big repos that are layered with stuff, right?
Starting point is 00:14:23 Like you join a company like Facebook, there's like two big monorepos with a hundred different ways of doing things and, like, you know, thousands of microservices. Similarly, you want to contribute to Airflow, Superset, or really any open source project out there. Where do you start? Like maybe you want to change the color of a button. Where is that button? Where do you find it? Or you want to add a field to a dashboard. That's the common data engineering problem I'm so disinterested in. That's like, oh, you need to carry, like, we get this data from HubSpot and
Starting point is 00:14:54 there's a dashboard and this thing, and then we got to carry this new custom property all the way through the pipeline to the end. What is that pipeline? Like where is this dashboard pointing to? So I think code orientation, being able to find your way in a large repository and figuring out what to do and how to decipher everything that's been done before and why is extremely key.
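To make the "where is that button?" question concrete, here is a minimal sketch of a first pass at code orientation: walking a cloned repository and listing every line that mentions a string you saw in the UI. This is only an illustration and not anything described in the episode; the function name `find_in_repo`, the default file extensions, and the skipped directory names are all assumptions made for the example.

```python
import os
import re
from typing import Iterator, Tuple


def find_in_repo(
    root: str,
    pattern: str,
    extensions: Tuple[str, ...] = (".py", ".js", ".css"),
) -> Iterator[Tuple[str, int, str]]:
    """Yield (relative_path, line_number, line) for each line matching pattern.

    Hypothetical helper for illustration; a rough stand-in for grep-style search.
    """
    regex = re.compile(pattern)
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune version-control metadata and vendored dependencies in place
        # so os.walk does not descend into them.
        dirnames[:] = [d for d in dirnames if d not in {".git", "node_modules"}]
        for name in sorted(filenames):
            if not name.endswith(extensions):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="ignore") as handle:
                for number, line in enumerate(handle, start=1):
                    if regex.search(line):
                        yield os.path.relpath(path, root), number, line.rstrip("\n")
```

Calling `list(find_in_repo(".", r"save-button"))` from the root of a checkout would return the files and line numbers to open first, which is roughly the starting point the button example needs.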
Starting point is 00:15:18 And that's some stuff you only get through building context and spending time and cycles doing this in one repo. And a lot of this is transposable, right? The patterns you learn on one project or repository, when you get to a new one, you might find your way a little bit better. So it's like the navigation skills that you got in some village or in some country, they do transpose when you get to a new village or a new country in some capacity. Yeah, that's a nice analogy. For me, I was really glad that I left my first company because of exactly that. The first company I was at was super green. Like, I built a lot of the stuff that my team was working on, so they kind of knew, like,
Starting point is 00:15:55 where all the things are, like how things are set up. And then when I left and went to the second company, it's legacy code, right? And I was like, oh my gosh, what is this? There's shit everywhere. Like, yeah, I mean, mine was also pretty shit, but, you know, it's very different. I don't like this. So other than going to a new company, are there any recommendations for engineers wanting to get better at this, you know, at code orientation? Yeah, I mean, I think open source is always an outlet, right? Because there's all this code out there that you can go and get lost in, or try to find your way. So then you have to find something worthy to work on, but that's usually pretty easy. So maybe venturing off, you know, is a good thing. And one way to venture off is to transplant yourself. There's no better way than moving to a new country to force yourself to learn a new culture.
Starting point is 00:16:47 Right. And develop a lot of these like these skills that are really important in terms of orientation. They figure out how to make things work in a different context. But I would say, yeah, I think like generally to expose, to force yourself outside your comfort zone is a good thing. Something that's interesting related to that is I learned so much at Facebook. For me, it was a big kickoff of my career. That's where I changed from old school to new school in some ways. And I felt like that's where I first got really empowered. But they were very limited in their own stack in a lot of ways.
Starting point is 00:17:22 They did things their own way. And it's interesting, I say "they" now; I said "we" for so long. But yeah, so I think going to Airbnb, that forced us into the open source ecosystem, because at Facebook there's their own orchestration, their own build system, and there's a solution for everything. And all these names that you learn in these contexts, like how the microservices work together, all of that is not that useful outside of Facebook. You need to do that translation of, like, oh, Kubernetes is the equivalent of Tupperware, right? But it's so much more useful to know Kubernetes than Tupperware, because Tupperware has no use outside of Facebook, right?
Starting point is 00:18:10 So I think going to a more open place, I mean, that's a great thing for everyone's career, to use more open source, because everything you learn there is transposable to the rest of the world. And then any work that you do in the open is also recognized and visible to everyone forever, as opposed to when you leave a proprietary company and there's no track record that's visible to anyone. So they have to, like, interview you or, you know,
Starting point is 00:18:45 read your resume to figure out, you know, your extended reputation. So working on open source, always a great thing, you know, getting involved in different things. I assume working in open source, like depending on the project, it's like you pick your, it's like playing a game, you're picking your difficulty level, right?
Starting point is 00:19:03 Like depending on like the culture or like how much sort of infrastructure is around that project or like what the people are like, you know. For maybe someone starting, okay, I guess you can't pick airflow and preset. That would be kind of cheap. But outside of those two,
Starting point is 00:19:18 are there, like, easier projects to start with? Maybe that's not in Rust, you know. Yeah, well, so I think it's all relative to where you're starting from. But I think the purpose is what's most important. Because if you're like, hey, I'm going to go and try to contribute, I'm going to pick a project that's cool, like, you know, I'm going to pick Kubernetes and try to go contribute something, I think that's totally cool, and it's good to do it.
Starting point is 00:19:46 And it's probably good to get exposure to these big projects too that have hundreds of active contributors. But I would say scratch your own itch. So you pip install a library, it doesn't quite do what you need it to do, or you find a bug, or it'd be nice if this method existed. Go and start with just scratching your itch in someone else's repository. It's a really nice way to get involved because it will force this little bit of exploration I was talking about, developing the code navigation skill. Like, okay, what is this? Okay, I pip install this library.
Starting point is 00:20:26 I'm reading the documentation. The method I need is not there. The object that I'm using is, you know, or the documentation is not clear on something. Let me get to the bottom of this. Then you have to get in that repo and figure out your orientation, that repo, clone it, contribute something, interact with someone. And I don't know, maybe eventually, like,
Starting point is 00:20:43 make a friend along the way. And then maybe, like, oh, I'm going to start using this for more things and advocate for it and contribute more to it. So I would say don't try to find an artificial, like, easy mode. Yeah, it's like, all right, if you're a data engineer, you use Airflow every day. I mean, I know you said to not use the... just kidding. But I think it applies just as well, right? Like whatever you use daily, you're like, oh, this is my toolkit. Yeah. I think it's really good to spend some time, like, if you use an axe, you should spend some time sharpening your axe, and sharpening your axe in software often might mean contributing to an open source project you use.
Starting point is 00:21:25 One thing that I would just add is, in general for engineers, especially early in their career: read open source code. But similar to what you said, if you're using a specific toolkit, just go read about that. Even if initially things just don't make sense, read it over and over again, and read code more than you end up writing. Because I think just navigating these large code bases helps one understand how to go about organizing their own code eventually
Starting point is 00:21:51 and then making it easier over time to find things. Like, oh, I need this thing, how does it do it? Oh, I know where to look, or at least I can navigate my way through it much faster. Oh yeah, is that how you got started with Kubernetes, Ronak? For me, it was pretty much that. So I've worked a lot, primarily on Kubernetes, over the last four years. And like a lot of things, with any project, there are many ways to do a certain thing.
Starting point is 00:22:21 And then you're like, well, I don't know what the right way to do this is. And in many cases, like, well, why does this thing operate this way? The documentation says X. Is it really true? So I think developing that practice of always going to the code to see what it actually means and how it actually works, that's where you will know the guarantees, and documentation could be out of date. And navigating the code will also help, at least me, understand a lot more about just the system and some of the principles that, um,
Starting point is 00:22:48 that the system has been built with. And I'm sure it's true for other open source projects too. It makes the leap of contributing feel like a lot easier too, right? You read, you're in the code, you're like, Oh, I, you know, I know where that would be and how it would change it to, to do certain things. And it definitely makes the system less scary. Like you're like, oh, this is such a giant system.
Starting point is 00:23:09 How can I even go about understanding something? It's like, I have this one question. Let me try to get answer to that one question. And along the way, you kind of know that I don't need to look at the other 10,000 lines. I just care about these three. So it's a matter of finding these three. Yeah, a big thing is like the imposter syndrome. A lot of people are like, oh, would they accept a PR from me?
Starting point is 00:23:32 Or like there's that, I think I've heard that so many times of like, oh, wait, do they even accept PR? And are they going to, someone going to make fun of me? Or I don't know what exactly. But I think people are just like, oh, there's a line here. There's an imaginary line that I absolutely cannot cross. And it might be a difficulty thing. It might be a self-value or self-confidence issue or this or that.
Starting point is 00:23:53 But that's something that we need to shoot down actively in open source. I don't know what the right place is. I mean, the right place to do this is in a place like here to say, like, if you contribute a PR, people will be super happy to receive it. If you open an issue, you contribute a PR, even if it's misinformed, incomplete,
Starting point is 00:24:15 draft, there's a nice little draft button. For discussion. People will be stoked to see a new face on the repo. As I said, I've seen very, very, very little negative interactions anywhere. And then most of the places where I've seen it is entitled users.
Starting point is 00:24:37 People are like, I can't believe you don't offer a way to do this. I'm like, just contribute it. Seriously. It's like, wait, what relationship do we have? I don't know you. I don't know what company you work for. It's interesting that you think we owe you that, you know. But I think most people have the bias of being overly cautious. We need to fix that. Maybe if we fix that, we're going to get more entitlement, which we're going to have to fight back on.
Starting point is 00:25:06 But overall, I think the problem we have is we need more empowerment, to be more welcoming, for people to just be like, oh yeah, if I pip install it, I can totally open a PR on it, for sure. I would just plug a tool here. It's not a tool, but a company. So GitHub is amazing for hosting code, obviously, and the majority of open source repos live there. I would say for code navigation, personally, I love Sourcegraph. Like from
Starting point is 00:25:32 a code navigation and search perspective, it's at least my go-to tool. And I know a lot of my colleagues use it as well. It's significantly better than what you get with GitHub and makes code navigation much easier. It's like you have an IDE in the browser when you're searching open source code, and I think GitHub could use some improvements there. Yeah. I mean, that sounds like a great thing, like knowing that code navigation is so important
Starting point is 00:25:59 and so potentially challenging, you know, having better tooling there really helps. So for me, I guess, I still use Vim. I don't use a lot of IDEs. I'm just old school. This is why I like this guy. And I'm not saying that it's better. It's just, like, bad habits. I've got all the muscle memory, and I like to have my shell really... Wait, wait, wait. Like Vim in a shell, or like Vi mode within VS Code? Just trying to gauge. Oh, no, definitely. Definitely. I'm in tmux.
Starting point is 00:26:28 Oh, my gosh. I'm in the shell. But it's not because, like, I think every year I'm like, I need to teach myself, you know, a proper IDE. And then I don't, because I just revert. I just use my old method. I git grep a lot.
Starting point is 00:26:47 The way I navigate code is kind of my own way of doing things, using shell and Bash just in general. But I think in the new world of software engineering, you can have these great graphs saying, oh, this method is part of this class, here's the inheritance scheme, and it's all visual, you can click around. You can do that in Vim in some other ways. Like, we all have different ways of doing the same things. And then what's hard is, like, if you have muscle memory, you'll do it and then send it. Yeah, I don't think, I mean, I can't do it, but I think, like, I will watch it on Twitch,
Starting point is 00:27:23 right? It's just like someone who's really, really good, because you just see, like, without clicking anything, right? Just with all the switches. Yeah, it's quite nice. So, talking about open source projects: you came to Airbnb from Facebook, and obviously you knew a lot of these tools existed and there were some gaps and you wanted to build something new. For Airbnb, what was that pitch like? I mean, you are a data engineer on the team. You're like, hey, let's build a new tool and open source it. And they said, yes, sounds like a great idea.
Starting point is 00:27:57 Well, there was a little bit of entitlement. I joined with that premise. So I was very happy at Facebook, but I liked the idea of moving often, just to kind of force yourself to experience new environments and stuff like that. But that was the premise on which I decided to join: I'm going to get to work on this problem, and I'm likely to be able to open source the stuff. Everything goes well. And then actually, between the jobs, I took a break of about two weeks and I started writing Airflow on a vacation in Mexico, on the beach. I'm like, class DAG.
Starting point is 00:28:36 Does it go capital D-A-G or D, you know, lowercase A-G? So I remember those moments of saying like, okay, well, what's the executor I should use? You know, a local one, you know, a Celery one. So what are we going to need? That was, you know, pre-Kubernetes. But yeah, so and then that landed on my own personal repo as I joined. And then I was like, okay, let's try to use this stuff here and internally.
Starting point is 00:29:01 But I think the play for them, I mean, for organizations sponsoring open source, there's a bunch of things. One is like attracting talent, like I would not have gone without that guarantee. But then I think, for a long time it mattered more, or when the market gets more competitive it matters more, but the aura of the engineering team is really important for these engineering-driven organizations. So for Airbnb to be like, hey, you know, we do have these 12 open source projects and you can see everything we do out in the open, you might get to work on some of this stuff in the open if you join
Starting point is 00:29:45 because it's kind of exciting for people. So it's more on like talent acquisition, retention, I think is the real thing. Because the angle of like, we get free contributions to projects we care about, I think you can play that card, it's somewhat arguable, but it generally, to me, has not been a super net positive, though. I don't know, or maybe that's early on in the projects where I was most active. But the fact that, say, Airbnb has a huge amount of Airflow, and the fact that Airflow is much greater than it would have been if it had been one person working on that problem in isolation, is a really positive thing. Now people could come in and out of Airbnb and know the orchestration. You know, makes sense. So you had these two open source projects, I think both came out when you were at Airbnb, like Airflow
Starting point is 00:30:40 and Superset, and you worked at Lyft after Airbnb and then you eventually started Preset. Uh, what prompted you to start a company around Superset? Uh, well, so, so a bunch of things there, but I think the move from Airbnb to Lyft, I was ready to, I, you know, I, I just get like this feeling I gotta keep moving, you know, so after like three years at Airbnb, I'm like, okay, where am I going next? And I also wanted to plant the seed of like Superset, try it in different contexts, and then plant that seed and create a team there too around it.
Starting point is 00:31:13 And I was just really excited to work on more geospatial real-time stuff. It just seemed really fun to work on. And then in terms of starting the company, so the VCs started approaching me, it was in the fall of like, or probably before then, but like in 2018. And I think it was as a result of things like, you know, HashiCorp being super successful, Confluent, Databricks. They're like, oh shit, you know, commercial open source could be a really good business model in some cases. So what are the open source projects out there that are getting traction,
Starting point is 00:31:48 that are popular? So a lot of people found me as this became a pattern. Martin Casado at a16z, I think it was part of his thesis of data, open source. We're going to make the modern data stack. It's going to be open source. So they found me, like, why don't you start a company? I was like, I don't know, I love to just chase IPOs and go from tech startup to tech startup and work on open source, and that sounds a little stressful. You know, I'm not sure if I, I don't really want to, you know, I don't have the MBA type skills, I'm not sure if I want to acquire them. And, but then I realized too that,
Starting point is 00:32:26 so I was in my, I think I just turned 40. And then I realized it was just like a really unique opportunity. The VCs were like, we want to fund you very, very well. And you don't need to write a business plan and all this stuff. Like you just really,
Starting point is 00:32:40 you know, you're in a position that a lot of wannabe founders wish they were in, which is like you get like a top VC with a good investment. We skipped the seed round, went straight to Series A. So I was like, I'm gonna regret it for my whole life if I don't take this opportunity at this time, very unique opportunity. So the rest is kind of history, and it's been super, in terms of like, I talked a few times already about like taking yourself out of your comfort zone
Starting point is 00:33:07 to learn new things and maybe become a better professional, a better human as a result. Like that was the single, you know, most important thing I've done to just like transplant myself
Starting point is 00:33:19 to this different planet and be a founder. It's been super great. It's been a fun ride, but with intense ups and downs. What were the biggest surprises, I guess, compared to your expectations? I think one thing was, you kind of think the VCs are a little bit evil and that the oversight is going to be very intense. Like basically, like I'm going to be, the moment I take this money,
Starting point is 00:33:47 that's, you know, lots of millions of dollars, the heat is on, right? Like people are going to be on my ass looking for results at all time and the pressure and the tension is going to come from the top down. And then what I realized was like,
Starting point is 00:34:03 not that there's no pressure, the stakes are extremely high but it's mostly self-inflicted and it's mostly like oh if i do well you know i can i can do very very well you know if the company does well and then you make promises to you know every investor but also every employee every customer every prospect so people little by little um the the pressure goes up but it's not necessarily inflicted by the investors or the organization. And it's surprising how much latitude you have as to how you run your business. No one is like, oh, you got to do A or B.
Starting point is 00:34:36 It's just like, you're like, okay, well, you build a business, you want to build it, which is great. So you definitely don't come across as the MBA types. I mean, you see one right now as a policy of the company. You're deep into the trenches and actually writing code, which is super impressive to see. How did you go about thinking about the business plan, for example? Like, this is not a skill set that many engineers typically have. And I would say engineers probably make the worst customers.
Starting point is 00:35:04 So how did you put yourself in that shoe and say okay this is how I can build a business around this yeah I mean I think you take like the the challenges of being a founder as a first class problem you go to the first principle and and then you try to figure out how you should organize your time and where you need to seek advice from and what's most important to work on you know today this week this month this year um you know find the the right advisor and the right people to surround yourself with so it's nothing uh nothing unusual on the on the answer here on the on coding i think it's like maybe by the time we got to a certain scale, it just didn't make sense to do any of the coding that the company would depend on to succeed.
Starting point is 00:35:52 So then it's more like, oh, I could for the early on. I was definitely very involved when we're less than 10, still acting as a very active engineer and PM. And then over time, I think, distancing myself from that and more saying like, okay, I code because I need that as an outlet or it's good for my mental health or something like that. I was just like, you know, or that's how I've been realizing myself
Starting point is 00:36:16 for the past 15 years that to not have that as an outlet that I know that I'm good at, you know, it's difficult. So then I distanced myself from that. And there's like a long period where I didn't code at all there's just like too much stuff to to manage and then um i think recently yes i decided to to spend more time to be to wear the cto hat you know more often which includes like being being in a code base is very positive thing to
Starting point is 00:36:43 to be around so so yeah it's it's been an interesting journey there and you learn things along the way like maybe things that um you really love that you didn't know you might love and then some some things that you're like okay i know i need to find someone to do that for me because i'm not that interested in that wait was there something that you found that you really like you didn't know that you love yeah i think or that and it goes with like what you're good at is generally what you love so where you can assign passion but like i love product marketing in general now so just like messaging positioning pricing packaging uh some of the strategy right like how do we
Starting point is 00:37:20 think and expose about the the um component of the product right to to the market so and that can shape that that shapes not that can shape the roadmap the product direction too right so it's a so maybe the layer of like if you have like scope mission vision scope for a product like the product marketing can shape the direction of these things it can apply that stuff but it can also shape the direction of it I think like I was doing it naturally in open source in some ways right like the airflow had a logo and a one-liner and some way like what it does what it doesn't do right like there was a if you think of a read me of an open source project if it's effectively product marketing is how you present your project to the potential
Starting point is 00:38:05 community right so that's that's a thing i think i generally hate opera like hate operation but it's a must do but like just things that are more repetitive uh financial planning is kind of interesting like modeling stuff in excel you know i don't know but but there's the diversity of the these things is what's what's interesting and then management has never been super like my like i like you know spending time with people too but like managing it's you know i prefer like coaching leadership than management um when you said so free therapy uh so so when you were viewing coding in that lens, like how do you go about picking what to work on when you see it as free therapy, what to, uh, what to work on?
Starting point is 00:38:50 Yeah. Yeah. I don't know. It's a, it's a, it's a mix of like, if you're dogfooding, you're a user of the product, you can fix the little things that annoy you on a product. So sometimes I don't really like CSS, but I hate, you know, crooked pictures on the wall, my OCD triggers. So like also some of that is really easy.
Starting point is 00:39:09 Usually it doesn't get in the way. It's non-critical work. And then about going more meta, like recently I got closer to the repo and got really into developer experience, Docker, um, dot, uh, Docker compose, um, Helm chart, like just making sure that the stuff we, like all this,
Starting point is 00:39:31 the CICD stuff, Docker builds, like getting that stuff to actually run. I think that's more of a getting more meta on the problem as you get more senior. I don't, I don't know what it is. I'm not that passionate about that.
Starting point is 00:39:42 I'm like, I kind of hate GitHub Actions and like, it's, it's really hard to work with, but maybe it's like the, the repo really needed it too. Uh, I think that's cool. The pattern you're describing. I've, I've noticed that pattern, uh, obviously different context, different scale, uh, but translating that to a tech lead instead of a founder, for example,
Starting point is 00:39:59 like a tech lead, being in the trenches, initially designing exactly what the system should do and should not do. And then slowly they go out, kind of spread the word for it within the company, outside the company, if it's open source. And then eventually they start looking at it from a user standpoint and they start fixing these things that you mentioned. It's like, well, someone should be able to git clone and get something built right away, for example,
Starting point is 00:40:21 improving the CI/CD pipeline. So it makes sense to put on that user hat and see how the product can be improved, not just the engineering aspect of it. Yeah, the developer, they're both like user experience and developer experience, they're both like closer to human, which is kind of interesting. And they're both like kind of, if you think of like the development pipeline or like what's, you know, middleware, one, CSS, is like probably the most like veneer layer on the application, and then CI/CD is like the deep right of the back end. So it's like extreme, but
Starting point is 00:40:52 in some ways it comes full circle because in both cases one is about user experience for a developer and then user experience for a user the product and in the middle maybe it's like because the middle is like gets so tangled up and complicated you're like if i touch anything and i'm like you know it's a bag of knowledge you started touching it and then you gotta you know you gotta get deeper so that maybe that's why i'm staying away from the the guts and you don't want to be in the blocking you need the therapy for the free therapy that's it that's not that Then you need some, some extra real therapy. So for like a listener,
Starting point is 00:41:28 right. That's like, yo, this max guy is pretty cool. I want to do what he does. I want to make the next airflow, but for LLMs, like what do you,
Starting point is 00:41:37 okay. I don't think it would be fair to ask you like what that would mean or like in terms of like what specific project, but like, what does that journey look like right because like for you you join like big companies you see like how things work at scale like what would your advice be to um this engineer well it's the first thing i would say don't be a founder uh it's a it doesn't i think we we just like glorify the founder i think like it's it's been like overly glorified and it's,
Starting point is 00:42:05 you know, not all that fun. I think it's a good thing to tell people to not be a founder. Cause like you need the skill that you need to succeed as like this, like delusional, like, I don't care what you think I'm going to do it regardless. So we're kind of testing that with telling people that.
Starting point is 00:42:21 Yeah. Yeah. You need, cause you need some of that. So people need to break through like, like screw this guy, Max, who said on this podcast, and other people that told me not to be a founder, I'm still going to be a founder. So I think it's good overall to put that message out there. Unless you're like, you know, chewing glass and like, swimming in subzero water. And
Starting point is 00:42:41 like, it's just like, like this, this stuff that's like kind of more brutal. But then in terms of like a question on like how to start a project, I think to be in tune with your environment and its needs and skills and that kind of holes in the market offering for open source specifically. Like, you know, wouldn't it be great if there was a tool that allowed me to do test-driven development type stuff with prompt engineering? Oh, it doesn't exist. Well, maybe I can create
Starting point is 00:43:10 a thing there. So to keep, I think it's better when it comes from within and scratching your own itch and the use case and the hole that you observe
Starting point is 00:43:19 from a place that you're very, very familiar with, right? So that's key. For me, there's two projects I'm looking at that are're very, very familiar with, right? So that's key. For me, there's two projects I'm looking at that are like mostly in the early, the kind of validation phase
Starting point is 00:43:32 that if people wanted to get involved to kind of run with the thing or with the idea or with some of the assets and the thinking that I put together, I can kind of pitch these projects. Yes, we do. And maybe as an example, too, of the kind of projects that could be interesting to someone
Starting point is 00:43:53 and not necessarily like, oh, you know, take it and run or come and collaborate with us and get it started. But the first one is around semantic layer. And I know like dbt is coming up with a good like metrics layer, semantic layer. That's super interesting. We're integrating with it at Preset, but then we've had a hackathon and some set of new ideas that extend upon
Starting point is 00:44:16 this idea. So the world really needs a universal open source semantic layer that works well and is simple. So if you look, maybe we'll put the link in the show notes, but I think it's at preset-io slash allstars, SQL All Stars. It's a semantic layer that works as a virtual database. So you put your semantics of like, what are your metrics and dimensions, in which, you know,
Starting point is 00:44:41 you map your schema as in all the semantic layers. So you say this table is going to be joined. This column is a metric, this is a dimension, and you organize all your stuff as code, as it should be. So similar to LookML from that perspective. But then it's exposed as a virtual database. So you have a table called star. So you can say select stuff from star,
Starting point is 00:45:02 and star becomes, it's exposed as a large flat data set, but behind the scenes we transpile your SQL to do the underlying joins that need to be done. So it's a cool idea. And there's just some other ideas around that, around progressive adoptability and having the semantic layer guess, or help you in terms of guessing, the semantics of your schema. Your schema already has information: you should be able to figure out which tables you can join and how, what looks like a metric and a dimension already. So the semantics can be mostly inferred, progressively adoptable. You can enrich it over time and still get value from day
Starting point is 00:45:45 zero, and it's exposed as a virtual database so that every BI tool out there is already compatible with it because it's a SQL interface. That was one idea. Yeah, and I guess, because everything's more codified, then it makes like the LLM sort of improvements and stuff, like with that? Yeah, the LLM angle on this semantic layer, there's two angles I can think of now. One is like, well, the LLM can help you define your semantic layer, like set up your semantics. But then if you do have your semantics set up, it becomes like a map for the LLM to better understand your schema. And maybe instead of generating, you know, the LLM might do better with the abstraction layer than it does without it. Right, because the semantic layer is there to help business users self-serve in
Starting point is 00:46:37 BI tools. So that means that if it helps business users self-serve, it should help an LLM, any form of intelligence, you know, self-serve. So that's one project. Yeah. I don't know if we want to get deeper into this one before I talk about the other one. Sorry, sorry, sorry. So like for people who might not know, what is a semantic layer? Oh yeah. So the semantic layer is you can think of, well, so maybe I'll start with the purpose, but the purpose of the semantic layer is to help more people, namely like business users, self-serve with their data without necessarily like knowing SQL or understanding as much of the underlying
Starting point is 00:47:18 data model. So it bridges the gap between, so the semantic layer is a bit of a map of your database and it maps the metrics and dimensions and more like business terms, right? So instead of having cryptic table names and column names that you don't necessarily know how to join and you need to write SQL to make sense of, we expose a layer on top of that
Starting point is 00:47:43 that has the map of the physical layer plus metrics and dimensions, pretty labels, pretty descriptions, so that people can just drag and drop these things and behind the scenes we generate the SQL. So that's the general idea. And a lot of these things historically have been part of the BI tool, so they're proprietary by nature and they're not shareable across tools. So that's a bit of a problem if you use multiple BI tools, which most companies do. So whatever you do in Looker, you can't really take with you to Tableau or Superset
Starting point is 00:48:12 or any other tool. So a lot of value there in having like an open source, universal semantic layer, defined as code, exposed as a virtual database. So some cool ideas. So there's a repo out there that's mostly just early scaffolding of what the project might look like. And the other thing, I think the other need that we identified is something around data access policy as code. So every
Starting point is 00:48:38 company, and the bigger the company, I think the more of a reality that is. But every company needs to define groups of users and what data access they should have. And mostly for analytics purposes, right? Like, of course, like every company needs to define like rules and access of every user to different systems. But this would be more targeted towards data access policy. So you probably have some Snowflake, some BigQuery, some databases left and right, different BI tools.
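Looking back at the semantic layer described just before this point (metrics and dimensions defined as code, queried through one flat virtual table, with the joins generated behind the scenes), the idea can be sketched roughly as follows. This is a minimal illustration, not how the All Stars repo is implemented: the tables `orders` and `customers`, the `transpile` function, and every other name below are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical semantic model. None of these names come from the All Stars repo.
@dataclass(frozen=True)
class Dimension:
    table: str   # physical table that holds the column
    column: str  # physical column name

@dataclass(frozen=True)
class Metric:
    table: str       # physical table the aggregate reads from
    expression: str  # aggregate SQL expression over physical columns

BASE_TABLE = "orders"
# Join conditions from the base table to each other table.
JOINS = {"customers": "orders.customer_id = customers.id"}
DIMENSIONS = {
    "country": Dimension("customers", "country"),
    "order_date": Dimension("orders", "created_at"),
}
METRICS = {"revenue": Metric("orders", "SUM(orders.amount)")}


def transpile(selected: list[str]) -> str:
    """Turn names selected from the flat virtual table into SQL on physical tables."""
    select_parts, group_by, needed_tables = [], [], []
    for name in selected:
        if name in DIMENSIONS:
            dim = DIMENSIONS[name]
            expr = f"{dim.table}.{dim.column}"
            select_parts.append(f"{expr} AS {name}")
            group_by.append(expr)
            table = dim.table
        elif name in METRICS:
            metric = METRICS[name]
            select_parts.append(f"{metric.expression} AS {name}")
            table = metric.table
        else:
            raise KeyError(f"unknown metric or dimension: {name}")
        # Only join the tables the selected names actually need.
        if table != BASE_TABLE and table not in needed_tables:
            needed_tables.append(table)

    sql = f"SELECT {', '.join(select_parts)} FROM {BASE_TABLE}"
    for table in needed_tables:
        sql += f" JOIN {table} ON {JOINS[table]}"
    if group_by:
        sql += f" GROUP BY {', '.join(group_by)}"
    return sql


print(transpile(["country", "revenue"]))
```

The person querying only names `country` and `revenue`; the join to `customers` and the `GROUP BY` come from the definitions, which is the part a BI tool or an LLM would otherwise have to guess.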
Starting point is 00:49:07 And you can say certain people have access to certain tables or columns or schemas. And maybe this category of user has access to the schema, but not the PII in here or there. So the other project would be called Governator, with the aesthetic of Arnold Schwarzenegger, for sure, like somewhere there, the logo. I think we have a logo that I got Midjourney to produce.
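To make "data access policy as code" concrete, here is a minimal sketch, assuming a policy made of user groups plus per-schema grants that can exclude PII columns. The structure, the group names, and the `can_read` helper are hypothetical; Governator's actual format is not described in this conversation.

```python
# Hypothetical policy document, kept in a repo so every change is reviewable.
POLICY = {
    "groups": {
        "data_scientists": ["ana", "bo"],
        "finance": ["cy"],
    },
    "grants": [
        # Each grant gives a group read access to a schema, minus denied columns.
        {"group": "data_scientists", "schema": "analytics",
         "deny_columns": ["users.email", "users.phone"]},
        {"group": "finance", "schema": "finance", "deny_columns": []},
    ],
}


def can_read(policy: dict, user: str, schema: str, table: str, column: str) -> bool:
    """Return True if any grant lets `user` read schema.table.column."""
    for grant in policy["grants"]:
        members = policy["groups"].get(grant["group"], [])
        if user not in members or grant["schema"] != schema:
            continue
        if f"{table}.{column}" not in grant["deny_columns"]:
            return True
    return False


print(can_read(POLICY, "ana", "analytics", "users", "id"))
print(can_read(POLICY, "ana", "analytics", "users", "email"))
```

Here a data scientist can read the `analytics` schema but not the two columns marked as PII, which mirrors the "access to the schema, but not the PII" case mentioned above.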
Starting point is 00:49:34 Oh, that's nice. So, and then Governator, you would define your data access policy as code and it can push and pull or as code or as YAML, right, in a repo. So it's like in a file system in a repo so you can know exactly who has access to what and who gave access to what and you know when you change the access you do it in the repo so you know who you know gave access to what when and then the there would probably be a cli to or ci cd type tool where you can push and pull
Starting point is 00:50:03 to different you know sources and destinations so you could pull whatever you put in snowflake as data access policy rule stamp it as code change it push it back or push it to other systems and that solves a problem for us that we have which is it's good for sometimes the bi tool needs to know about your access rules, so we know which charts and dashboards to show you. Right, right. But I want you, but often you don't want to have a service account that has access
Starting point is 00:50:34 to everything and the BI tool enforce the data access policy, you want to enforce it at both layer, because the user might want to go straight to the Snowflake UI or the BigQuery UI console. Right. And then to have consistent, you know, what they see, what they have access to. So this tool would allow for people to manage, you know, data access policy centrally
Starting point is 00:50:53 and synchronize across BI tools, databases, and things like that. I imagine the, like, the access simulation will be pretty interesting to build out and pretty core, right? Because it's kind of like, I remember IAM on AWS, it was such a pain to work with, like before they actually kind of had more,
Starting point is 00:51:08 like made it user friendly to actually test out like different policies and how that impacts things instead of just, but yeah. Yeah, when it's managed as code, you know, and that's a theme in my career, like for Promptimize, for Airflow, it's like, you know, data pipelines as code, but like when things are managed as code, you can test them,
Starting point is 00:51:24 you can version them, you can review them, you can know who changed what when, and then you can CI/CD a bunch of things. So before you deploy your data access policy, you can run a bunch of tests to make sure data scientists, a simulated data scientist, make sure they don't have access to financial data, for instance, or something like that. Make sure they cannot get push on this repo
Starting point is 00:51:48 because they're going to break everything. So in the advice you mentioned to people, don't be a founder unless they want to chew glass or swim in sub-zero temperatures. So Guang asked the question about surprises, but your advice makes me ask, what have been the challenges of being a founder in this case, especially when you are building a company around an open source project?
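The two workflow ideas from the policy-as-code discussion above, testing a policy before deploying it and pushing or pulling it against a system, can be sketched with plain set operations. This is a simplified illustration under stated assumptions: a policy is reduced to a set of (group, schema) grants, and `plan_sync` and `check_policy` are hypothetical helpers, not part of any real tool or of a warehouse API.

```python
# Hypothetical, simplified model: a policy is a set of (group, schema) grants.
Grant = tuple[str, str]


def plan_sync(desired: set[Grant], actual: set[Grant]) -> dict[str, list[Grant]]:
    """Compare the policy in the repo with what a system currently has."""
    return {
        "grant": sorted(desired - actual),   # in the repo, missing in the system
        "revoke": sorted(actual - desired),  # in the system, not in the repo
    }


def check_policy(desired: set[Grant], forbidden: set[Grant]) -> list[Grant]:
    """Return forbidden grants present in the policy; an empty list means it may ship."""
    return sorted(desired & forbidden)


repo_policy = {("data_scientists", "analytics"), ("finance", "finance")}
warehouse_state = {("data_scientists", "analytics"), ("data_scientists", "finance")}
# The rule from the conversation: data scientists must not see financial data.
forbidden = {("data_scientists", "finance")}

violations = check_policy(repo_policy, forbidden)
plan = plan_sync(repo_policy, warehouse_state)
print(violations)
print(plan)
```

In a CI job, a non-empty `violations` list would block the deploy, and `plan` is what a push step would apply to the target system.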
Starting point is 00:52:10 Yeah, I mean, I think it's a lot of things. I mean, the first thing is like, you know, taking yourself out of your comfort zone and learning a bunch of things that you may or may not be passionate about. But it doesn't matter because it's what it takes to succeed. Right. So then you're like okay um now i'm putting myself in a situation where i have to learn new things that are really important really core to this business succeeding and i may or may not like that so unless like you know you might love these things or you'd love to be outside your comfort zone then uh that's a bit of a an exercise you don't know how it's going to go and then this the stakes are very high so sometimes i don't know these these things like go together
Starting point is 00:52:54 but like reward and recognition is really important for everyone everyone wants to be kind of recognized and rewarded uh but it it comes usually with, you know, hardship or the difficulty level of what you take on, right? So if you want to be a volunteer for an organization, work part-time or be an advisor, it can be rewarding, but it's not going to be as rewarding as, you know, doing the real thing and, you know, busting your ass for years and things like that.
Starting point is 00:53:25 But these things, like there's a danger in putting yourself in a situation where the stakes are so high, you know, that the potential for reward recognition is very high. But it's also, like, the risk of, like, you know, coming short on things is very high, too. And the stakes are really high so unless you you want to be really hardcore like a more progressive approach in life to things is probably better like climbing the ranks at some companies and like making sure along the way that it's like i i'm still comfortable managing a larger team or being a little bit closer to being an executive at a company i still like this progression.
Starting point is 00:54:05 I'm going to take another step in that direction. As a founder, you're like, okay, let's just go and throw myself in a completely different area. And it should be fun, right? Or I might get rich. I don't know what you both really think. But I would say a more progressive approach to the things you do in life is probably better.
Starting point is 00:54:24 Unless, yeah, I mean, there's a lot of reasons why you want to do it, why we might want to do it, and people do do it. But think about it, you know, cautiously before doing this big jump. Good advice, good advice. By the way, I'm doing a hard pivot here. You mentioned you generated the Governator logo using Midjourney. So we spent a lot of time trying to generate logos for our podcast using a bunch of these AI tools.
Starting point is 00:54:51 I spent significant evenings on just generating logos on DALL-E or on ChatGPT. One thing which I found was like you give them a prompt and they come up with a logo, which kind of is okay, but if you ask them to change something or put in a text, they are terrible.
Starting point is 00:55:08 If I say spell software and have it in the logo, they never do that. I never got the system to do that. Yeah, they're learning how to write still. So yes, you have to ask, do not write anything, and then you add it in Photoshop or you use what it's good at,
Starting point is 00:55:20 and then you add the layer of what you're good at. See, we just need to get better at prompting instead or just better working with these tools instead of just, sorry, what was it? I was going to show the logo that I made. If I could. Whoa, that's pretty cool.
Starting point is 00:55:35 That's pretty good. That is really good. Wow. You got Midjourney to do that for you? I think it might have been ChatGPT. But there was definitely, it would be interesting to pull this session because I did fight with it quite a bit because I wanted like both the Governator, you know, I wanted both Arnie and the database logos. That was pretty tricky, but I got it to do this thing. I wanted the red eye because at some point it comes up with a red eye and like, I love the red eye.
Starting point is 00:56:02 I want it. Then it stops doing it. So you have to ask for it and it doesn't want to do it. But there was definitely like a bunch of prompts here for me to get to this. So that's if you want to check it out, that's on my personal GitHub. The project is kind of early, it has mostly just information architecture and some of the ideas behind it. Yeah, and the other one is called All Stars. That's on the Preset one. A little bit further. It uses some of the, I don't know if it's legal, but that's the Mario font here.
Starting point is 00:56:32 But that's a pretty fun one to read. So if you're interested in semantic layers or the future of them, that's probably a fun repo to read. We'll do, we'll do. Well, this has been an awesome conversation, Max. Thank you so much for taking the time. This was super amazing for us.
Starting point is 00:56:48 This was super fun. Yeah, it's like really interesting topics overall. So glad I showed up on the show. And we'll add links to all the things we talked about. Preset, Superset, Airflow, the new repos that you mentioned, Governator, All Stars, so that people can go find these projects and hopefully contribute to them. And we'll also link to your profile, of course, so that people can find you as well. That'd be great if people wanted to run with these projects, you know, because like right
Starting point is 00:57:18 now I don't have the bandwidth. I want to work on these things, but it's like, we're pretty busy. I appreciate working on a bunch of other things too, but it's likely I'll work on some of these things. So if people wanted to get a little bit closer and help, you know, lead on these projects, it could be fun. If either of these projects get a contribution to this podcast, I would count that as a win.
Starting point is 00:57:37 Yeah. Or even, I mean, I think like in a lot of like, a lot of like what's great about some of these, some of these things and my intention and putting the read me out there is like, oh, we might build this stuff. But I think it's it's also just the ideas, you know, and the empowerment. Like you just put the idea out there and someone might run with it. It might be like, you know what? I have a different twist on this, too.
Starting point is 00:57:58 So I don't just think like code should be open and free. It's like ideas, too. I'm very i'm usually against like intellectual property i just think these two words don't fit well together like intellectual and property no it's like let the ideas want to be free you know code wants to be free too so let's uh make that a reality you know for sure i like that well thanks so much cool thank you so much this was amazing take care hey
Starting point is 00:58:30 thank you so much for listening to the show you can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com
Starting point is 00:58:38 you can also write to us at hello at softwaremisadventures.com we would love to hear from you until next time.
Starting point is 00:58:47 Take care.
