Software Misadventures - The 3 traps of open source funding models | Wes McKinney (pandas, Voltron Data, Posit)

Starting point is 00:00:00 Creating open source software, like it's very difficult. And for me, it's been very emotionally draining because there's a lot of like, you have to soldier through like the dark days of the project where there's not that many people that care and you have a conviction and a belief that what you're doing is important and is gonna have impact.

Starting point is 00:00:16 But that impact is gonna be realized like far into the future. Like work that you're doing today, you're not gonna see the impact of that or feel recognition or see the value of that work for at least six months, like probably even more than that. And so it's like, it's very deferred gratification. I think this goes back to like making open source your vocation. That should be a full-time job if you want to do open source. Going back to the second trap that you talked about, which is the startup trap.

Starting point is 00:00:42 Can you tell us more about that? Yeah, the startup trap is where you create a company, you raise some venture capital, and you build a product that is either an explicit commercialization of the open source project, or you build some kind of a vertical solution that's powered by the open source project. And so there's a couple of issues that can happen here. Welcome to the Software Misadventures podcast. We are your hosts, Ronak and Guang. As engineers, we are interested in not just the technologies, but the people and the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors to chat about their path, lessons they've learned,

Starting point is 00:01:26 and of course, the misadventures along the way. Sweet. Yeah, Wes, thanks so much for joining us on the show. Thanks for having me. So as the creator of Pandas, you wrote the book Python for Data Analysis back in 2012. I really liked how hands-on it was when I was learning data engineering back in 2015,

Starting point is 00:01:45 this was. So thank you for that. But I would like to file a complaint about the cover of the book. Okay, so for context, the book was published by O'Reilly, and O'Reilly books all feature these different animals. Can I just say how sad I am that the featured animal was not a panda? It wasn't even a snake. It was like a weasel. What kind of animal was it? Yeah, it's funny. When I was working on the book... as an O'Reilly author, you don't get to choose the animal on the cover. So, you know, the injustice. I suggested it. I said, you know, it would be cool to have a panda on the cover. I think what they said was, oh, we're saving the panda for, like, something really big.

Starting point is 00:02:33 What? The panda is awesome. Which is kind of weird. That's messed up. Well, it's funny because I think that the book ended up being way more successful than anybody expected. Because when you go back, like when I originally got the book contract with O'Reilly was in like November, 2011. And it was a little bit, it was definitely experimental at the time.

Starting point is 00:02:55 I don't think anyone had a really clear idea whether Python was gonna become a big deal in mainstream data analysis or what we now call data science. So I think the fact that the book has been so successful and it's been translated into like 10 languages and has sold, I don't even have the full count, but my guess is like three, 400,000 copies, like kind of like ballpark, like maybe more when you account for all of the subsidiary translations and things like that. But I think that's because it's become a reference textbook for a lot of university

Starting point is 00:03:29 courses. And so that creates consistent demand for the book around the globe. And it's funny, sometimes I get emails from people who live in a country under sanctions. For example, I've gotten emails from people in Iran and they say, I pirated your book. I'm sorry. Is there some way that I can pay you? And it's like, literally, they could not buy the book because of sanctions. But now with the third edition, the content's freely available online. And given the book is now 12 years old, and I've tried to update it and keep it relevant and keep up with the changes that happen in pandas, it reminds me that I have a pending

Starting point is 00:04:09 queue of errata fixes, and I need to get a new printing out. Basically the way it works is you make a new version, which contains the major edits to the book, and then over time you fix little things and they'll update things at the printers, so people get little bugs fixed in the book. Patch releases, I would call them. And was it hard to convince them, hey, we're going to just have it free as a PDF for the last edition? It was somewhat tricky. I think it helped that there was precedent for open access books. R for Data Science is one example. So when Hadley wrote R for Data Science, they had the book available for free as it was being written. And they had the stipulation with their contract that we will only do this with O'Reilly if we're able to release it as open access. And I think that helped show that having the open access version

Starting point is 00:05:09 actually doesn't hurt book sales, like print sales, as much as you might think. I actually expected that having it available for free would reduce print sales. But to my surprise, the print sales have been pretty stable. Or maybe they were hurt a little bit and maybe the market got bigger because there's more and more people doing Python. But I got permission to release the book in one and only one open access format. So I got to pick whether it would be Jupyter Notebooks on GitHub or a website or whatnot. And so I chose the website because I thought that having the SEO and the ability to go to wesmckinney.com/book and search the whole book, you know, and get instant results was a pretty useful feature. And, you know, JJ Allaire helped me port the book to Quarto, which is an awesome new technical publishing system that I've been

Starting point is 00:06:03 recommending to everyone. And so partly why, the reason why the book looks so nice and so easy to browse and search on the website is because of Quarto. I see, I see. Very cool, very cool. By the way, quick question on Quarto. When you say it's a new digital publishing tool,

Starting point is 00:06:18 can you say more about this? I'm just curious what this is. I've never heard of it before. Yeah, so if you go to quarto.org, it's a language independent technical publishing system. Under the hood, at the core of Quarto, you have Pandoc, which handles transpilation between different document formats. But Quarto has become a pretty big software project that handles creating books and blogs

Starting point is 00:06:43 and websites. And you can basically write a book using Jupyter notebooks and then use Quarto to stitch the notebooks together to create a book-like structure. And then Quarto handles all the orchestration of rendering the Jupyter notebooks and converting the output of the Jupyter notebooks into the necessary output format given your book publisher. So for example, O'Reilly Media uses AsciiDoc and DocBook XML as their input formats for publishing. And so Quarto knows how to go from Jupyter notebook with various tags and markdown cells and code and everything. And you can add special annotations within your Jupyter notebook to handle particular things that have to do with O'Reilly's toolchain.

Starting point is 00:07:30 But my book was written in DocBook XML in 2011, 2012. And so I actually got really good at writing XML. And I have all these Emacs shortcuts for generating XML tags for DocBook XML. That's not something that I would recommend to everyone, but it's something that I was just forced by necessity to get good at. But basically what JJ helped me do with Quarto is write Pandoc filters, which are written in Lua, to convert the book from DocBook XML into Quarto Markdown, which is Markdown plus some extensions and customizations for Quarto. And then I can use Quarto to render the book to a PDF or to a website

Starting point is 00:08:13 or in principle any output format. So the history with Quarto is that JJ and his collaborators have created multiple other dynamic web publishing systems. So there was ColdFusion in the 1990s, which was one of the original dynamic web page frameworks, along with CGI and PHP. And then he and his company created what ultimately became Windows Live Writer. And then they created R Markdown early on in RStudio, which turned into Posit.

Starting point is 00:08:45 But R Markdown was a basically technical document publishing framework where you could write Markdown interlaced with R code, and it would handle all of the rendering and outputting to different formats. So Quarto is kind of a reimagining of all those things built on a very modern foundation. It generates portable binary. It ships a whole JavaScript runtime, uses Deno, which is like kind of the fancy Rust-based Node.js runtime.

Starting point is 00:09:12 But it's very easy to deploy. And yeah, I think it's a really cool project. And so I've also been encouraging a lot of open source projects to migrate their project documentation and websites to use Quarto, because we did that for the Ibis project, for example, and it generated really good results. That's what I was thinking, actually. I was just navigating the book on your website. It

Starting point is 00:09:32 looks really good actually, and super easy to navigate. I was thinking that many teams could use something like this for internal documentation, for example, to make it look this nice. Yeah. Yeah. So you can think about creating internal websites using Quarto. And actually, one of Posit's enterprise products is called Connect, which is basically a secure publishing system for internal publications. So it could be documents, Jupyter notebooks, really anything you could create with Python or R with Quarto can be published dynamically to Connect, and you can set up fine-grained permissioning. So imagine you had some Quarto document or some set of documentation,

Starting point is 00:10:16 you only want it to be visible to one team inside your company and you want to set it up to deploy from a GitHub repository, something like that. That's something that you can do with Connect. So it's all interconnected. But if you want to use Quarto to generate a Confluence page and put it in Confluence if you're an Atlassian customer, that's something you can do also. Super cool.

Starting point is 00:10:38 Cool, cool, cool. So, sorry, coming back. Yeah, a bit of a tangent. In summary, I'm a huge fan. So, no, we'll definitely link it in the show notes for people to check out. So a couple years back you wrote this post announcing Ursa Labs, in which you mentioned these three traps for people working in open source in terms of how they get funding. I thought this was really cool because it kind of ties different parts of your career together.

Starting point is 00:11:09 I like how you're like, yeah, I have directly experienced some variant of all these problems. I'm a big believer in experiential learning. So I think that, right, that's the only way to really get an understanding of problems. So I thought that we can kind of go into these different traps, interweave things together. So the first one is the consulting trap. And I think that kind of maybe ties back to pandas. So to kind of get us started, this is like early on

Starting point is 00:11:38 in your career after college. So you worked in finance at a hedge fund and that's where you started building pandas and eventually made it public. And then shortly after that, you actually decided to pursue a PhD in stats at Duke. So you mentioned this yourself, that financial institutions, they're not really charitable to open source. We're both very curious, like how did you manage to convince them to open source it? It wasn't easy to convince them. I will say that in the last 15, I guess 17, years since I first got involved in working in finance in the mid 2000s, financial firms have come to see the value of making things open source.

Starting point is 00:12:19 And so not only AQR where I worked, but Two Sigma, Bloomberg, Jane Street, these companies have released a lot of open source software. But to get companies like this that value their intellectual property so highly to dip their toes in open source was not easy. I think at the time, it took maybe six months of discussions and convincing. And ultimately, I made the argument that yes, we'd be giving away potentially some secret sauce that would help the company's competitors be able to work with data more easily. But I also talked about the likelihood that Python would become more widely used. I think at the time, D. E. Shaw, for example, had begun to use Python for certain

Starting point is 00:13:05 things. And so there's a little bit of a cost benefit. So if you release a piece of open source software, you have a better chance of your thing becoming the main thing. And that creates a lot of network effects and value within the open source ecosystem. But if you don't open source, and then somebody else open sources their thing and that becomes popular, then you're sort of on an island. And so it's like building bridges and doing trade with your neighbors versus having a very isolationist mindset. And so there's definitely pros and cons to the different approaches. If you create something really valuable, maybe you want to hoard that invention and use it to your maximum benefit, but there's also downsides. And I think it helped that I was very keen to engage with the open source community. And so

Starting point is 00:13:53 I made the argument that I would use pandas like early pandas as a tool to better engage with the open source community, use it to recruit people to come work at the company. Maybe if the project became popular, then people would learn how to use it and they would want to come work at the company to be able to have a job where they could use pandas as part of their jobs. And thankfully, I think all of that has basically come true.

Starting point is 00:14:18 And so now AQR can hire new college grads and they show up on their first day and they know how to use pandas and Python and they can be productive working for the company pretty much right away, which is very different from the old way that many financial firms used to operate, which is they have these very proprietary tool sets, proprietary data analysis tools and systems. And so new employees would face a pretty significant learning curve to be able to get up and running. And there was a lot of debates about licensing.

Starting point is 00:14:51 I think some of the lawyers wanted to use the GPL. And of course, the Python ecosystem is not very GPL friendly. And so if you put GPL on something, a lot of Python users, almost as a matter of principle, are not going to touch the library because they're concerned about the viral effect of the GPL. But eventually we agreed on using the new three-clause BSD license and putting it out there. But I think I initially started having the conversation about open sourcing it sometime in early to mid-2009 and was only able to really push it through at the end of the year. I think the first pandas 0.1 was released on New Year's Eve 2009. So I was ignoring my family or friends, or I can't remember where I was, to get it up on PyPI. I'll tell you, the anxiety of publishing my very first Python package was pretty intense. It got easier after that. But

Starting point is 00:15:38 yeah, the first time was hard. Wait, what was that like? So this was like, literally New Year's Eve, and then you're just like pushing it? Or how did that go? Yeah, I mean, if you look at, if you look at, you know, pypi.org, oh boy, there's so many like projects that are not pandas in PyPI, which is, let's see here.

Starting point is 00:16:04 Pandas 3, my goodness. I swear the Python package index is full of malware. But yeah, if you go all the way back to, no, I had it wrong. It was released on Christmas, Christmas 2009, which is even worse in the sense of like, of neglecting my family. But I don't

Starting point is 00:16:25 remember where I was at the time, but, you know, I had time over the holidays. And so given that the open source side of pandas started out as my side project and side interest, as long as it continued to be maintained and work well for my job, that was enough. Nice, nice. And I mean, that's like a lot of work to have gone through, right? Six months of discussions and, you know, meetings, and then really pushing for it. I think a lot of people would have given up, right? Especially at that point, it wasn't clear that this is going to have the kind of impact that it does

Starting point is 00:16:54 today. What gave you the conviction of like, you know, this is worth me like pushing for? Yeah. I'm trying to place myself back in that mindset. It's been 15 years, but yeah, I felt that there was a lot of potential. I thought the whole set of research tools that we created in Python were so much better than what I was using prior to that. So I felt like there was this potential to have a really transformative effect on people's productivity, or just making data analysis, data science a lot more accessible, and making it open source. And so I clearly had a strong conviction and it was something that I really wanted to do and see about. But yeah, I would say that it languished for a little while for maybe, languished is maybe

Starting point is 00:17:57 putting it strongly, but for about a year, because I got busy doing some other things, I applied to grad school, I started a PhD. And it was only when I started getting contacted by other companies who wanted my advice on switching to Python from other things that I realized that this was the time and it's now or never. And I need to spend all my time on this to help the ecosystem develop into something that people can adopt and be successful using. I mean, it wasn't just Pandas. There were a lot of other things that needed to fall into place to make it all happen. But Pandas was an important part of the solution.

Starting point is 00:18:34 I see. That must have been cool to get validation, right? Even after all this time to have the inbound interest. Yeah. I mean, my kind of self-deprecating way of looking at it is that, you know, I was in the right place at the right time. And it's, you know, certainly more than being in the right place at the right time. Like, I had to take a lot of actions in order to make it happen. I had to make, you know, personal sacrifices. I sacrificed

Starting point is 00:19:00 my personal life, like I took time away from friends and family to work on it. I made significant career diversions because I believed that it would create more interesting opportunities for me in the future. I could have continued to work in finance and had a very comfortable and lucrative career working in quant finance. So some of the people that I worked with in those years are still involved with the same companies that I collaborated with in those days. And so I could have stayed on that path, but I chose to take a risk and put a lot of energy into it, basically sweat equity, I suppose.

Starting point is 00:19:40 But I was in a very fortunate situation. I had no student loans, which is, I think, an underappreciated benefit, in that I was able to take a risk and I wasn't digging myself into that much of a financial hole. I had some savings. I had lived very frugally in my first few years of working and had maybe, you know, a year's worth of living expenses saved up. And so what I told myself at the time was, okay, I'm going to work on this full time. I'll find like a little bit of consulting work on the side

Starting point is 00:20:11 to help pay the bills, but don't do too much consulting that I'm not able to spend most of my energy improving pandas. And then after a year or so, I can see where I find myself, whether this makes sense or like whether I'm getting the kind of return on my time,

Starting point is 00:20:28 like return on investment, that justifies continuing to do this. Right, right. By the way, on that reflection, I love the post that you wrote when you turned 30, just to reflect on things. One sentence that stood out to me was, right, you talked about how at MIT it was more about being smart, and then in New York it was about being wealthy, and then also in San Francisco. We had a very similar kind of conversation with Josh Wills about that. And I thought it was quite cool that you were like, you know, given all that, let me try to figure out what do I actually want, what makes me happy. I thought that was very impactful. Yeah, I think in, you know, going through all these different professional situations and deciding how to spend my time and what to work on, I think there is an underlying search for meaning, a search for what actually matters to you. Like do you value recognition or fame, or do you value money?

Starting point is 00:21:28 Do you value comfort? Like what are the underlying motivations, the things that will make you feel satisfied, like be happy with your life? And I think in retrospect, I went a little too far at times and made some significant, you know, personal sacrifices. I will say in my twenties, my personal relationships suffered as a result of my, at

Starting point is 00:21:53 times maniacal focus on working on pandas and working on this project. Like I have a bit of an obsessive personality and anyone who knows me well is familiar with that side of me. Like, oh, Wes has his projects, and sometimes the projects become like an obsessive focus. And so I think learning to find some balance and the importance of relationships and friendships and things like that, I think it was good that I went through all that. I think it was very helpful personal growth. But I've learned about myself that I'm very motivated by impact and to be able to have impact

Starting point is 00:22:30 in a sustainable way. But I also have to take care of myself. Like I have to be a happy and resilient person. Like if I'm depressed all the time and can't bring myself to care enough to start a new project or to drive forward projects through the tough times. Because creating open source software, like it's very difficult. And for me, it's been very emotionally draining because there's a lot of like, you have to soldier through the dark days of the project where there's not that many people that care and you have a conviction and a belief that what you're doing is important and is going to have impact. But that impact is going to be realized far into the future. Like work that you're doing today, you're not going to see the impact of that or feel recognition or see the value of that work for at least six months, like probably even more than that,

Starting point is 00:23:19 and so it's like it's very deferred gratification. So you have to tell yourself, okay, this is tough. Like, gosh, the build keeps breaking and, oh, the release, oh, this Windows build. And there was like a dark time when building stuff on Windows was really hard. And so every time I would fire up VirtualBox to build Windows binaries, I'd be like, oh, this really sucks. Like why? It's like, why must I go through

Starting point is 00:23:45 this misery? And often it's like silent suffering. Cause no... and you can always tell someone, you know, like, oh man, it sucks that I had to spend four hours fixing the Windows build and getting these binaries out so I could release. And people would often remind me, oh, you chose this life. Like if you wanted it to be more comfortable, or to not have to be all on your own building, or to not feel chronically understaffed working on these projects and making them happen, you know, this was a choice. And I guess it helps to remind yourself that it's always a choice. And yeah, if you're not ultimately happy with what you're doing, yeah, there's going to be good days and bad days.

Starting point is 00:24:25 But hopefully you have more good days than bad days. I don't know. I think it was like Steve Jobs who said if you have a certain number of bad days in a row, or it seems like you're not getting any positive feedback, then you probably need to make changes. Interesting. Two follow-ups. So one, do you think obsession is an important ingredient to push projects like these where, like you said, right, it is so hard to have the conviction of like, it can be... I mean, for me, that's just my personality. And so I don't know that it's an essential ingredient. It worked for me, but I think that an obsessive personality can also lead to unhealthy behavior. So earlier in my life, I got involved in video game speedrunning. And so we were playing the game GoldenEye 007.

Starting point is 00:25:15 And that is a special kind of obsession, to play the same stretch of a video game hundreds of times in a row to try to get the fastest time and perfect all of the little details in order to set a record or break your previous personal best. And so I think I fell into those kind of obsessive patterns, patterns of self-improvement and efficiency. And yeah, I've been that way since I was a child. So not something that I would recommend to everyone. And I don't think it's the only way to do open source successfully, particularly now that open source has become a fixture and a strategy for businesses. So I think the model of the obsessive lone wolf hacker working on

Starting point is 00:26:03 their nights and weekends to build a project is more or less going by the wayside. And I think also it's become harder and harder for individuals to mount successful efforts because we've solved a lot of the easy problems. And so in many cases, it was like, okay, well, we just need an open source solution, and an individual can scrape together an open source solution to this problem relatively straightforwardly in a reasonable amount of time. But what if you have a problem that is much more difficult, that needs 50 person years of effort or 100 person years of effort? So an individual can't possibly do 100 person years, even if they are 10 times more productive than the next person, or they overwork,

Starting point is 00:26:46 or they work 80 hours a week, or 100 hours a week. Maybe they can muster in one year the same amount of work output that somebody else might do in three or four years, but you want to deliver results on the order of single digit years rather than, you know, 100 years or 25 years or something like that. So I think as the problems have become more difficult, it's required a different approach and explicitly rejecting the lone wolf mindset, which was a feature of the early days of pandas. But I think there's fewer and fewer projects like that. That being said, you know, we have Polars in Python, which has been a lone wolf project from Ritchie Vink until recently.

Starting point is 00:27:27 He founded a company and is now hiring people to help him. So we still do see successful scenarios like that, but it would be disappointing to me if that was the only way to be successful in open source, is to engage in this objectively unhealthy behavior. And I think a lot of my, yeah, like I said earlier, a lot of the stuff that I think I did, I definitely made a lot of decisions and worked at the expense of like my mental and physical health in my 20s. And so I've had to make it a mindful choice

Starting point is 00:27:57 to reject that and to not continue to do that to myself. Also, I'm getting older and I can't work long hours like I used to, and I need sleep, like, and I have other things that I like doing in life. So anyway, balance is a good thing. And so to be able to build important open source projects while also having balance in your life, I think is something worth striving for.

Starting point is 00:28:21 So I wanna ask this question, and this is a recurring theme I've seen. So aspects of what you said I relate to in terms of sometimes being so narrowly focused on one problem that you neglect everything else at the cost of your personal life at times. And then many folks we've spoken to on the podcast, this theme comes up as like early in the career, yes, super driven, super focused on this one problem, made a lot of progress, but then also resulted into self-awareness, which is like, hey, this is not really sustainable.

Starting point is 00:28:49 But that surge in the initial period does result in impact, recognition, or even I would say future opportunities that you weren't thinking about at the time. At that time, you just wanted to get this thing to work. So when this aspect of balance comes in, I've seen this advice almost consistently: make sure you have that balance so that you have some extra energy in your pool to do other projects, you're behaving well, your personal life is good. But for people who are starting out, would you say that it's okay to have that narrow focus, that imbalance in life? It's like, hey, that's okay if you don't have, let's say for example, student loans to worry about, you don't have family you're responsible for. Yeah, maybe it's okay, go crazy. Well, it's, uh,

Starting point is 00:29:35 I mean, yeah, it's important to point out that things that I did when I was 25, I think, wouldn't be practical for a lot of people. They have a family to support or maybe they have student loans to pay. They have other obligations in their life that make it hard for them to work from 7 p.m. to 1 a.m. every day. And if you have a demanding job,

Starting point is 00:29:58 then spending time on your nights and weekends, maybe you need to work a second job to make ends meet. And so I think fundamentally, like the early story of open source software, I think part of the reason that the open source world has significant inclusivity and diversity issues is indeed because open source development is fundamentally a privileged activity or started out as a very privileged activity. And so I think what's great now is that large companies have, and startups and large companies

Starting point is 00:30:27 have made open source an essential part of their strategy. Microsoft, from the Steve Ballmer days, has transformed itself into being a very open source friendly company. And Guido van Rossum works at Microsoft, working on making CPython faster. And Microsoft has made enormous contributions

Starting point is 00:30:44 to the open source world and out of like the major tech companies like the Magnificent Seven, I would say that Microsoft is probably the best place to go and be able to work on open source software for a living. And so that means that to take the software development, yes, you're giving away software, building software and giving away for free on the internet. But also it allows people to be able to have more balanced lives, to treat it as a job rather than like something that's coming at the expense of like your friends and family and like your life outside of your day job. And so I think that's, it's essential. And I, yeah, I think that it would be better for the volunteer model of open source to more or less go away because

Starting point is 00:31:26 it's not very sustainable. It leads to significant maintainership problems, maintainer burnout, particularly when somebody is working on a project outside of a day job or some other responsibilities that they have. And so it's common that you see volunteer maintainers burn out. So one of the solutions to maintainer burnout is for people to do open source as their job. And yeah, and so I think Linus Torvalds has worked on the Linux kernel as his full-time job for a long time. And so yeah, I recognize, like, I did the lone wolf thing, like I did a lot of volunteer work in the early days. Eventually, I arranged to get paid to work on open source, and I've been continuously paid to work on open source projects in the last eight or nine

Starting point is 00:32:11 years. But that was partly a reaction to the open source model. It's like, this is going to cause me to be burnt out and miserable. And I need to make this my vocation, my profession. And so I've given a lot of talks and I've written a lot about how it is important for open source to become like a true vocation, like a job and not something that's like this privileged activity that people do on their free time. So great lead into, so the first trap, the consulting trap,

Starting point is 00:32:39 can you tell us more about that? Yeah, so the consulting trap is where you have an open source project and you find consulting gigs or consulting projects where you work for a company that's using the open source project and maybe they partly are paying you to fix bugs and customize the project for their needs. But what can happen is that you end up spending a lot more time working on the company's internal software projects. You become more or less a software developer of that company and your work on the

Starting point is 00:33:10 open source project can become incidental or something that you do on the side. Or ideally, you would spend 50% of your time working on building custom software, building things for the company, 50% of the time on the open source project, or even more time on the open source project. But it's not uncommon to see the shift and it being 10, 20% of your time on the open source project and 80, 90% of your time building custom solutions for the client. So I've seen that happen a number of times. And so it's, yeah, there's good situations and bad situations. I've seen very productive open source consulting type relationships. I think it's gotten easier as time has gone on. But I think nowadays when a company engages a consultant who is an open source maintainer,

Starting point is 00:33:56 they understand that partly what they're doing is paying this person to work on the open source project because maintaining it is good for them as well. But it's still a risk. And I think it's a trap in the sense that some fraction of the time, you end up being kind of a substitute, more or less a fungible employee working within that company. And the work on the open source project is something that's on the back burner. Like ways to avoid that trap as someone that's getting started doing that.

Starting point is 00:34:23 Would that be just being very clear about setting time boundaries and how you should allocate your time in the contract? Yeah, I think it's just being clear about the expectations and the contract and the statement of work and yeah, setting clear boundaries. I think, yeah, sometimes, if people go into the contract and it's sort of hand-wavy, like, yes, yes, improve the open source project, keep fixing bugs and things like that. It's easy to underestimate how much time that really takes. And so, yeah, I think setting those boundaries or the expectations that say, if it's your goal to spend 50% of your time on the project on a steady state, that you have that time carved out and you protect that time. I think this goes back to making open source your vocation.

Starting point is 00:35:10 That should be a full-time job if you want to do open source. Going back to the second trap that you talked about, which is the startup trap. Can you tell us more about that? Yeah, the startup trap is where you create a company, you raise some venture capital, and you build a product that is either an explicit commercialization of the open source project, or you build some kind of a vertical solution that's powered by the open source project. And so there's a couple of issues that can happen here. So one issue is where you create a conflict between the needs and the business needs of the startup and the open source project and its user base.

Starting point is 00:35:48 And so that would take the form of, I've seen any number of things from license changes to holding back features, like basically maintaining a private fork of the project and reserving like pro features or features that you don't want to release to the open source project because it might undermine your edge in your business. There can also be governance challenges because you as a startup, you want to be able to move fast. If your goal is to create a healthy relationship with the contributors that are outside of the company, it does create an implicit negotiation with contributors that are not your colleagues. And so what can sometimes happen is that the company will become like a, you know, pejorative term would be like a backroom cabal. So they communicate in private, they decide to

Starting point is 00:36:35 make changes, and then they push through changes in the project without getting the buy-in and convincing the other maintainers. And so the other contributors might feel demotivated because they feel like second class citizens if they're not working at the startup that is commercializing the open source project. Another thing that can happen that is also very common is that the investors in the startup can take operational control of the company as a result of firing the CEO or the founders losing board control. And that may lead to a shifting of budget and more or less like developers being laid off or reallocated to work on other parts of the company that are deemed to be more in line with generating a return on

Starting point is 00:37:26 investment for the investors. And so sometimes you can see like, okay, the company is really engaged in this project. And then at some point there's a leadership change, or there's some other shift in the company status. And then the developers just disappear. And it's like, well, my boss says I have to work on something else. And so suddenly, like, you're no longer getting paid effectively to work on the open source project. So suddenly getting, like, defunded, that can definitely happen. And relatedly, I mean, projects can also be dependent on development infrastructure provided by a company. And so that can create another source of risk that if that suddenly disappears, then yeah.

Starting point is 00:38:07 So anyway, we've seen all of these things and this is one of the issues that causes communities to fork. Like if they don't, if they, you know, if they like this, like a fork, like this happened with Presto, like the SQL engine. So there was the fork to PrestoDB and Trino.

Starting point is 00:38:29 And this wasn't a startup issue per se, but it was provoked by, my understanding, it was provoked in part by a governance conflict between Meta, Facebook, and the open source community of developers working on the project who did not work at Facebook at the time. Yeah, that was interesting to see, by the way, to actually see Presto being forked to Trino. I read the post, I think, at five-year anniversary for Trino. They wrote about some of this historical

Starting point is 00:38:55 context and how Trino came to be. And this was one of the things they highlighted there. Like, these are the reasons for actually doing a fork. And if you look at things right now, at least I know at LinkedIn, we used to use Presto very heavily. But since this fork, over the last, I want to say, at least three plus years, we have been mostly using Trino. I shouldn't say completely,

Starting point is 00:39:13 but most of it, like a lot bigger part of our infrastructure is moving there. And you see that community shifting over to Trino as well. I was following that space for a little while, so I saw some of this shift at the time. Okay, the third trap, the corporate user trap.

Starting point is 00:39:28 Can you tell us more about that? Yes, like the big company trap. That is similar. I think what you see there is that it's easier for developers to shift around or get moved off of a project. And so developers shift in and out of working on, like, I was just looking at some component in a Microsoft open source project, and there was a developer who just left Microsoft. And so essentially they disappeared from the project. And so I guess this could happen with developers,

Starting point is 00:40:06 a developer working at a startup that's working on an open source project, but particularly in big companies, priorities and budgets can change on a quarterly to annual basis. And so this can, and some companies are notorious for their priorities shifting or being somewhat flippant.

Starting point is 00:40:24 Especially in this environment, for sure. Yeah. And so whenever a project becomes too dependent on the generosity of a particular big company, that can also become a source of risk because you're dependent on having the support of a particular vice president or senior vice president who believes that the project is important,

Starting point is 00:40:44 something important for the company to be maintaining and contributing to. But that could change based on the vicissitudes of the company and its quarterly performance and things like that. And yeah, and then you also see some of the governance conflicts in how decisions are getting made. Like there's product managers involved and like other corporate apparatchiks. And so, yeah, again, big corporate open source can be done well. I mean, look, I think Microsoft has done an outstanding job, but we've seen plenty of scenarios where things have gone the other way. And I mean, I think if you look at MySQL and MariaDB, there was a community fork in part because of bristling or challenges

Starting point is 00:41:34 working with Oracle, I think, right? Oracle, yeah, Oracle, MySQL. And so it's a very common story, and in particular when an open source project is part of some product line or is related to some profit center of the business. Ultimately, corporations have, in most cases, have a primary obligation to their shareholders. And so, yeah, that can easily come in conflict with the needs of the open source community. I think over time, as you mentioned, like this idea of having lone wolves working on an open source project is changing with a bunch of companies doing open source.

Starting point is 00:42:08 And in many cases, I think the successful open source projects you see are not the ones which have only one company behind it. The ones which have multiple big companies behind it because not one company will have dominance or won't be able to govern the entire project themselves. It becomes more of a community thing. And you're not dependent on only one company at that point. And this is something we see very commonly in many of the cloud offerings that companies build on top of. So like the open source products that companies build cloud offerings on top of, where multiple companies are incentivized to improve that offering, for example.

Starting point is 00:42:41 And that essentially translates to some of the things they're offering as a cloud. But it's not necessarily true everywhere. Like I don't know if you saw this recently in the XZ compression library, there was this backdoor injected where this person did like social engineering for, I don't know, three years or something like that. I might be misremembering that. Yeah.

Starting point is 00:43:03 Maybe two years. Yeah. Yeah, I think the XZ, uh, liblzma thing, that was, you know, the level of sophistication, it must have been a state actor, like a whole shop of black hat security hackers creating obfuscated backdoors into, you know, an important kind of component in the Linux supply chain. I think one thing, yeah, I guess we didn't really mention is how the at times predatory relationship

Starting point is 00:43:30 between the major cloud vendors and open source projects, and that's precipitated license changes and like the anti-AWS licenses, like source available licenses. Like, you may do anything you want with this open source project, except operate a cloud service in a company with more than $10 billion a year in revenue. And so there's only like a handful of companies that-

Starting point is 00:43:51 Whose name starts with A. Yeah. So I think this has like the corporate part kind of has like a cool tie-in to kind of your decision of leaving and joining Posit, but before getting into that, I want to kind of rewind a little bit to the, uh, the startup trap. And so you founded, um, or what led to Voltron Data given sort of these challenges? Like how did you deal with it when you were starting a company?

Starting point is 00:44:21 Yeah. So we created Ursa Labs in 2018, which was a not-for-profit development group that was funded by RStudio, which is now Posit, and Two Sigma and NVIDIA, Intel, like some other financial firms like Bloomberg. So I wanted to create like a non-profit industry consortium to fund Arrow development. And that was going great for a couple of years. And we were seeing significant demand to put a lot more firepower into the Arrow ecosystem and companies that were interested in having support, like a formal relationship, development relationship with a company behind the Arrow ecosystem. And so it was an interesting challenge

Starting point is 00:45:05 to set up a company to create, pursue a product vision, but to create guardrails and to have that startup trap in mind. How could we build an open source team that's driving forward progress in the Arrow ecosystem and some of the peripheral projects while at the same time having investors and doing

Starting point is 00:45:25 enterprise product development. And so I think partly it helped that when we created Voltron Data that we had very clear expectations with our investors that open source was a huge dimension of how the company would be successful over time, that creating open standards and protocols and building this open source composable data stack was an essential aspect of how we would be successful. And so for people who are not aware, what the company is doing is, we do enterprise support and open source partnerships for the Arrow ecosystem, but the company also builds an accelerator-native, like GPU-accelerated, execution engine, which can be incorporated into different data processing systems to essentially enable modular GPU acceleration.

Starting point is 00:46:13 And it's all Arrow-based. And so it's something that needs to be able to plug into all of these different systems. And so to develop these open source projects and standards and protocols to make that all work seamlessly is an essential aspect of how that will succeed. So getting that buy-in from investors, I think, helped us avoid the startup trap. And the company has a team of 20-some developers who are largely working full-time on open source. And it's over a period of many years. So to be able to invest decades of person-years in the open source ecosystem has been a game changer for Arrow. In this case, you mentioned this company was not for profit?

Starting point is 00:46:54 No, this was Ursa Labs. Yeah, so Ursa Labs was functionally like a satellite of RStudio, now Posit. So we operated independently. They handled the back office, like payroll, health insurance for US-based employees, things like that. And so in 2020, we spun out from RStudio to create Ursa Computing. And we raised a venture round in August 2020. And then at the beginning of 2021, we joined up with the leadership from RAPIDS and BlazingSQL to sort of mash everything together.

Starting point is 00:47:34 And we created a new brand identity, Voltron Data. And then we raised more money for Voltron Data, a couple rounds, kind of one seed round in 2021, super seed, seed two, I guess, as we'd raised a seed for Ursa Computing, and then a Series A in January 2022. I see, got it. Yeah, the reason I was asking is because from a not-for-profit perspective, like if that is the case, then it might become harder to hire engineers because

Starting point is 00:47:59 at some point you have to figure out compensation for people working on this. And if it's not competitive enough as compared to other companies, for example, then you don't have the right quality of engineers working on the problem. Yeah, that's true. And I think that was indeed a challenge in the Ursa Labs era, that there were really talented engineers that I was interested in hiring to work full-time in the Arrow ecosystem and simply because of the economics of Ursa Labs,

Starting point is 00:48:29 like the funding model and what we could afford to pay in terms of salaries and there was no, you know, really no equity to offer because it was, you know, a not-for-profit endeavor. And so, you know, I think we had a great team, but to be able to scale up and also to hire, you know, to hire people who could easily go work for, you know, the big tech companies or Google and make a lot more money. So I think that was partly the motivation, not only to have a larger team to be able to put more resources objectively into Arrow development, but also to be able to hire individuals that have a lot of career opportunity. So, um, I guess historically, would it be fair to say that one of the cons has been, right, uh, compensation since you can't offer

Starting point is 00:49:15 stock, but in terms of pros, in addition to the mission, um, the flexibility, right. In terms of location, because I feel like a lot of the great people that have helped me, I think when I was first getting into Kubernetes, like there were like two people that really helped me out. And then they were just like living in the middle of nowhere in the States, where I imagine, I guess, at least a few years back, it would have been difficult to kind of go to like a bigger tech if you want to have that lifestyle. But then I guess that's also changing now since companies are more open to remote. Would that be fair to say?

Starting point is 00:49:51 That's, yeah, that's definitely true. I think COVID definitely helped with changing culture as far as like hybrid, you know, hybrid and remote. But yeah, I've been, you know, working in a remote, remote only capacity for, you know, the last, yeah, six years or so. And, you know, it has its pros and cons. But for, you know, for open source development, it's ideal because you can hire people where they are. I've worked with, like, a lot of people in Europe.

Starting point is 00:50:19 And I think, you know, Europe is really friendly for open source developers because health insurance is separate from employment. And so if you are maybe in between full-time jobs and you want to pick up like a contract to do some open source development, that's something that you can do without putting your family's health at risk. Whereas in the United States, I think there's definitely a psychological burden of losing continuity of healthcare coverage. And that does, you know, lead people to not make decisions like that. And so having managed a global workforce, you know, people around the world and in different countries, I've gotten to see, like, the psychological impact that the health insurance question has on people. So I think open source would be much better off if everyone had, you know, at least a guaranteed level of basic health care. Right, right. Interesting.

Starting point is 00:51:19 and just going back a little bit so about Voltron data so you mentioned you were able to avoid some of the startup trap when you were being very clear and with the venture funding. How did you, so like being very specific. We've avoided it for now. Yeah, maybe not forever,

Starting point is 00:51:37 but we're really doing our best and we wish to be good stewards in the open source projects that we're involved in. And I think by choosing investors that understand that as well, I think is part of ensuring that that will remain the case. Right. So being specific about how do you deal with... Because one of the issues I saw to me that's like, oh yeah, that is very hard,

Starting point is 00:52:01 is how do you balance like which features to, um, uh, to open source versus what to keep for your enterprise, um, version? Like, how did you guys go about making those decisions at Voltron Data? Well, at Voltron Data, I mean, anything related to core Arrow or anything that is like projects that we want people to, uh, like interfaces, protocols, like we've been working a lot on better database connectivity, like ADBC, which is the Arrow Database Connectivity API standard, and Flight SQL, which is a wire protocol for databases to offer SQL support. And then we've, you know, gone and partnered with Snowflake, for example, to integrate that into their drivers and to make Arrow-native connectivity work better for Snowflake users.

Starting point is 00:52:53 And so there, all of the pieces of technology related to that need to be fully open source. And so there's nothing that's held back. I think the company's main product, Theseus, it's a GPU-accelerated modular execution engine. I think there's very clear separation between, like, this system that runs on a rack of, you know, A100 or H100 GPUs. It requires Kubernetes, it requires basically an enterprise data center type setup to use. And so it's a pretty clear delineation between software that's involved with building and operating Theseus and also the types of users, like you need to have certain types of hardware available to use the system at all. And so I think at least at the

Starting point is 00:53:50 moment, it's not open source. It may be that it becomes source available in some capacity in the future. It's hard for me to predict, you know, but it's a specialized product for organizations that have very large data sets, like the over 10 terabyte type data sets, where you can get 10x or 50x performance improvement or efficiency improvement by using racks of GPUs to do that processing. Or maybe you've got a data center, like you've built a sort of infrastructure for doing LLMs and machine learning, and you wish to also be able to do your analytics and feature engineering directly on that hardware so you can shorten the whole pipeline,

Starting point is 00:54:38 run it on one sort of consistent set of hardware and get a lot better performance that way. So in a sense, like the kind of the market or the user base for that type of system is a lot narrower than say, you know, say PyArrow, which is, you know, a Python library and has, you know, millions of users and, you know, tons of downstream projects that depend on it. So yeah, so I think ultimately it comes down to a question of like, who is the audience? Like who are the potential users? Is it something like a project you intend people to build other open source projects on top of? Or is it like a solution, kind of an end solution in and of itself? Things like development environments are a lot less sensitive to copyleft licenses like the GPL

Starting point is 00:55:25 because in a sense, like the development environment is itself, it is in itself an end. So you could build extensions to it, but people don't really need to depend on the project in the kind of sense that you would a project like NumPy or pandas, where these projects are like essential library dependencies of building something else. And so

Starting point is 00:55:48 if they had GPL licenses, that would constrain use. And, you know, the same logic applies to closed source software. So it's like, you know, what aspirations do you have for a piece of software? And so, you know, when we kind of made all of our early decisions on, you know, what to build and licensing and things like that with Voltron Data, ultimately, like, our decisions were about, like, how do we make the composable data stack happen faster, enable these modular pieces, modularization, like what open standards are missing? How do we design those open standards? How do we build libraries to make it easier for people to use them?

Starting point is 00:56:38 And so that's, you know, so there's, yeah. So we've been very busy with a lot of things from Substrait to Arrow to these kind of new Arrow protocol projects, Ibis for Python. Yeah, the open source footprint of the company is pretty significant. Nice, nice. As open source becomes a more critical component of any software business, do you envision, like, innovations in terms of funding models? So like Patreon, but like, I don't know, right, something that kind of opens up new traps, but that's a bit different from what we've seen before? Yeah, I think that's something we didn't really get into, is other funding models people have had for, you know, for open source. But there's Open Collective, there's Patreon,

Starting point is 00:57:36 there's GitHub Sponsors, there's a new platform called Polar, which is kind of like a GitHub Sponsors or Patreon alternative. And there have been a number of developers that have been able to successfully support themselves and get a lot of sponsorship these ways. It can be hard to get big dollars to be able to pay a full-time team of developers.

Starting point is 00:58:06 But, but, um, in a number of cases you have individual project maintainers that are able to support themselves as individuals, if they have like a prominent, prominent enough role in the project. I think what, partly what they've been doing is, is monetizing access to themselves, either creating like exclusive content or like having a private Slack channel or a private Discord, where if you're a sponsor or a patron of the project that you get exclusive access to talk with the developer about your needs

Starting point is 00:58:39 versus like, you know, if you're on GitHub, like anyone in the world can open an issue and bother you at any time of day or night. It's like getting ahead of the line. Yeah, so there have been some successful examples of doing that. And I think that these models like Open Collective and like crowdfunding or, you know,

Starting point is 00:59:03 crowdfunding platforms for open source support, they're definitely very helpful. And we didn't have them. We didn't have them a decade ago. So it's been a big improvement for projects that have been able to make it work. So a bit of a hard pivot, I guess.

Starting point is 00:59:20 So you recently became a general partner at Composed Ventures, which does early stage investing in data infra and AI companies. I feel like throughout your career, like, right, you're really good at picking up new skills, like, you know, open source, writing a book, building a company. So I imagine venture capital is now like a new skill that you're building. So I'm kind of curious, like, generally what's the approach that you use for learning new skills that you've developed over time, and how are you applying that to venture? Yeah, so there are a couple of things there. So I mean, as we built out the Arrow project and made it more successful, people started reaching out to me to get feedback on projects that they were thinking about founding or new projects that

Starting point is 01:00:12 they're working on. And just for getting my advice on technical matters or asking me for favors for other things. And at some point I started asking them if they, you know, would let me invest in their funding rounds, whether their friends and family round or their seed round or something like that. And I was never investing a lot of money, but, you know, it was just a way for me to be involved and to have some skin in the game to, you know, help these companies be successful. And, you know, as time went on and the ecosystem has gotten a lot bigger, I think there's a couple of things. So some investors got in touch with me and wanted me to invest some of their money on their behalf into the types of investments that I'd made in the past

Starting point is 01:01:07 since I have an interesting network and can get in touch with companies maybe when they're raising only a small amount of money, before they go to raise like a larger round. And I also wanted to create more content and messaging around the super trend of these composable data systems and the composable data stack, like what we've seen with, you know, modular acceleration

Starting point is 01:01:32 similar to what we're doing with Voltron Data, but also we're seeing modular acceleration projects out of Meta and out of Apple, and different things there. And so basically what's happened is that people are building new versions of old products, but with these really high quality off the shelf, open source components. And that's in a sense like what we wanted to happen. And so the fund gives me a way to invest in those companies, but also to create more awareness of

Starting point is 01:02:07 this trend that is taking place with all these different companies, which are building on Arrow, or building on Arrow offshoot projects like DataFusion, which is a Rust-based embeddable, modular query engine, or building on DuckDB or things like that. We've worked very hard to enable all these pieces to exist and to make them fit together nicely, so it seems like a healthy ecosystem shift that's taking place. This gives me a way to be involved with founders and to help companies get off the ground, but also for people to be aware of different people working on different

Starting point is 01:02:50 approaches to solving old problems with these new open source tools. I see. How do you like it so far? I mean, my goal is for it to not become a full-time job. It's something that I'm doing part-time. My full-time engagement is with Posit. I'm still an advisor at Voltron Data. I advise a couple of other companies, LanceDB, Union.ai.

Starting point is 01:03:21 So I have kind of one leg in the startup venture world, and one leg as a software architect at Posit. But yeah, I've enjoyed it so far. The fund just started in January, and so I've made the first couple of investments. I don't currently have plans to become a full-time investor or to raise a large fund, but to have a small fund that enables me to write medium-sized angel checks or super-angel-type checks and be helpful to

Starting point is 01:04:03 founders. I think it gives me a meaningful way to be involved. Maybe the investments will make money, but that's not strictly why I'm doing it. Clearly, I'm putting capital at risk, and so I hope that the investments will make as much or more money than buying real estate or investing in the stock market. But my primary goal is to accelerate innovation in the space and help people succeed. Nice.

Starting point is 01:04:37 So I did a little incubator a few years back. Had a terrible idea. So I feel like I'm qualified to ask this question. Let's not pick on the worst idea you've heard, but what's the second-worst idea that someone's pitched you? The worst idea that somebody has pitched to me? The second worst. The second worst, well, I have a hard time remembering, but yeah, it probably wouldn't be appropriate for me to share. Sorry, I don't want to hurt anybody's feelings.

Starting point is 01:05:05 Sorry about that. I tried. My bad. No, that's fine. I actually want to ask you about this new trend you mentioned, which is composable data stacks. I work more on the compute infrastructure side, less on the data infrastructure side, so I just know a little bit about the data space. Historically, I've seen a lot of these projects being open source, all the way from data storage like Hadoop, for example, or processing layers like Spark

Starting point is 01:05:31 and then streaming layers like Flink. And then you look at data formats: Protobuf, Thrift, and whatnot. Apache Arrow is another example. When you see data scientists wrangling data, they use pandas and NumPy. So in my mind, data stacks have been composable, and I'm not sure what you mean by the new trend. So it would be good if you could

Starting point is 01:05:51 describe what you were referring to. Yeah, so the general idea is building a system while making use of as many open standards or protocols as possible for the different layers of the stack. So, for example, at your storage layer, projects like Parquet and Iceberg. So Iceberg is an open standard

Starting point is 01:06:17 for an open source data lake format that's interoperable across many different execution engines. Parquet is an open standard file format for analytic data storage. Then there are execution engines, where the goal ultimately is to be able to hot swap or hand off work, to choose which execution engine to use based on what will deliver the best performance or the best efficiency for a certain workload. And so at the query optimization level

Starting point is 01:06:56 or the user interface level, if your user interface and your query optimizer are loosely coupled to the execution engine and to the storage, this enables you to make a different decision about which engine to use, and other decisions too. You can also incrementally make improvements to the stack

Starting point is 01:07:16 or incorporate new components in a way that's less disruptive. It's challenging right now because I think some of these things are still in their early days, but they're rapidly developing. So it's our hope that in the coming years building systems like this will be a bit less bleeding edge and more obvious, what's considered to be the best choice for how to build new

Starting point is 01:07:51 data systems. Makes sense. It sounds like even open source projects have this thing of lock-in, in a way. It's like, yes, you can change it, but changing is super expensive. What it sounds like is these modular systems can make it easier for you to swap one out for another. Right. That's right. Makes sense. Well, Wes, this has been an awesome chat. Thank you so much for taking the time today. We learned a lot through this conversation, and I'm sure our listeners will too. Thank you so much for joining the show. Yeah. Thanks for having me. I enjoyed it. Awesome. Thanks a lot. Hey, thank you so much for listening to the show. You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com.

Starting point is 01:08:35 You can also write to us at hello at softwaremisadventures.com. We would love to hear from you. Until next time, take care.
