Software Misadventures - The 3 traps of open source funding models | Wes McKinney (pandas, Voltron Data, Posit)
Episode Date: June 25, 2024
From creating one of Python's most influential libraries to co-founding Voltron Data, Wes joins the show to chat about why the cover of the pandas book doesn't feature a panda, open source pitfalls to avoid, the pros and cons of hiring engineers at a non-profit, and more.
Segments:
(00:02:50) Guang's complaint about the pandas book cover
(00:04:38) Quarto and Open Access Publishing
(00:12:00) Convincing Wall Street to Open Source
(00:15:31) Publishing the first python package over Christmas
(00:18:01) Doubling Down on Building pandas
(00:23:23) Personal sacrifices for the sake of impact
(00:26:28) The Evolution of Open-Source
(00:29:19) "Open source development started out as a very privileged activity"
(00:32:40) The Consulting Trap
(00:35:17) The Startup Trap
(00:39:29) The Corporate User Trap
(00:44:21) Avoiding the Startup Trap
(00:46:54) Non-Profit vs. For-Profit
(00:48:09) The Challenges of Hiring Engineers in a Non-Profit Setting
(00:50:08) The Benefits of Remote Work for Open Source Development
(00:52:15) Balancing Open Source and Enterprise Interests
(00:57:25) New Funding Models for Open Source?
(01:00:01) Getting into VC
(01:06:19) The Future of Composable Data Systems
Show Notes:
- online edition of pandas book: https://wesmckinney.com/book/
- the new digital publishing tool that Wes recommends: https://quarto.org/
Stay in touch:
👋 Make Ronak's day by leaving us a review and let us know who we should talk to next! hello@softwaremisadventures.com
Music: Vlad Gluschenko - Forest
License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en
Transcript
Creating open source software, like it's very difficult.
And for me, it's been very emotionally draining
because there's a lot of like,
you have to soldier through like the dark days
of the project where there's not that many people that care
and you have a conviction and a belief
that what you're doing is important
and is gonna have impact.
But that impact is gonna be realized
like far into the future.
Like work that you're doing today,
you're not gonna see the impact of that
or feel recognition or see the value of that work for at least six months, like probably even more than
that. And so it's like, it's very deferred gratification. I think this goes back to like
making open source your vocation. That should be a full-time job if you want to do open source.
Going back to the second trap that you talked about, which is the startup trap.
Can you tell us more about that? Yeah, the startup trap is where you create a company, you raise some venture capital,
and you build a product that is either an explicit commercialization of the open source project,
or you build some kind of a vertical solution that's powered by the open source project.
And so there's a couple of issues that can happen here.
Welcome to the Software Misadventures podcast.
We are your hosts, Ronak and Guang.
As engineers, we are interested in not just the technologies, but the people and the stories behind them.
So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors to chat about their path, lessons they've learned,
and of course, the misadventures along the way.
Sweet.
Yeah, Wes, thanks so much for joining us on the show.
Thanks for having me.
So as the creator of Pandas,
you wrote the book Python for Data Analysis back in 2012.
I really liked how hands-on it was
when I was learning data engineering back in 2015,
this was. So thank you for that. But I would like to file a complaint about the cover of the book.
Okay, so for context, the book was published by O'Reilly and O'Reilly books all feature like
these different animals. Can I just say how sad, you know, I am that the featured animal was not a panda? It wasn't
even a snake. It was like a weasel. Like, what kind of animal was it? Yeah, it's funny. When I was
working on the book, as an O'Reilly author,
you don't get to choose the animal on the cover. So, you know, the injustice. I suggested it. I
was like, just to, you know, say, it would be cool to have a panda on the cover.
I think what they said was, oh, we're saving the panda for like, like something really big.
What?
The panda is awesome.
Which is kind of weird.
That's messed up.
Well, it's funny because I think that the book ended up being way more successful than anybody expected. Because when you go back, like when I originally got the book contract with O'Reilly
was in like November, 2011.
And it was a little bit,
it was definitely experimental at the time.
I don't think anyone had a really clear idea
whether Python was gonna become a big deal
in mainstream data analysis
or what we now call data science.
So I think the fact that the book has been so successful and it's been translated into like
10 languages and has sold, I don't even have the full count, but my guess is like
three, 400,000 copies, like kind of like ballpark, like maybe more when you account for all of the
subsidiary translations and things like that. But I think that's because it's become a reference textbook for a lot of university
courses. And so that creates consistent demand for the book around the globe.
And it's funny, sometimes I get emails from people who live in a country with sanctions and
aren't able to buy it. For example, I've gotten emails from people in Iran and they
say, I pirated your book. I'm sorry. Is there some way that I can pay you? And it's like,
literally, they could not buy the book because of sanctions. But now with the third edition,
the content's freely available online. And given the book is now 12 years old, and I've tried to
update it and keep it relevant
and keep up with the changes that happen in pandas. It reminds me that I have a pending
queue of errata fixes, and I need to get a new printing out. Basically the way it works
is you make a new version, which contains the major edits to the book, and then over time you fix
little things and they'll update things at the printers, so people get
little bugs fixed in the book. Patch releases, I would call them. And was it hard to convince them
to, hey, we're going to just have it free as a PDF for the last edition? It was somewhat
tricky. I think it helped that there was precedent for open access books. R for Data Science is one example.
So when Hadley wrote R for Data Science, they had the book available for free as it was being written.
And they had the stipulation in their contract that we will only do this with O'Reilly if we're able to release it as open access. And I think that helped show that having the open access version
actually doesn't hurt book sales, like print sales as much as you might think.
And I've been, I actually expected that having it available for free would reduce print sales. But
to my surprise, the print sales have been pretty stable. Or maybe they were hurt a little bit and maybe the market got bigger because there's more and more people doing Python.
But I got permission to release the book in one and only one open access format.
So I got to pick whether it would be Jupyter Notebooks on GitHub or a website or whatnot. And so I chose the website because I thought that having the SEO and the
ability to go to wesmckinney.com/book and search the whole book, you know, and get
instant results was a pretty useful feature. And, you know, JJ Allaire helped me
port the book to Quarto, which is an awesome new technical publishing system that I've been
recommending to everyone. And so part of
the reason why the book looks so nice
and so easy to browse and search on the website
is because of Quarto.
I see, I see.
Very cool, very cool.
By the way, quick question on Quarto.
When you say it's a new digital publishing tool,
can you say more about this?
I'm just curious what this is.
I've never heard of it before.
Yeah, so if you go to quarto.org,
it's a language-independent technical publishing
system. Under the hood, at the core of Quarto, you have Pandoc, which
handles transpilation between different document formats.
But Quarto has become a pretty big software project that handles creating books and blogs
and websites. And you can
basically write a book using Jupyter notebooks and then use Quarto to stitch the notebooks
together to create a book-like structure. And then Quarto handles all the orchestration of
rendering the Jupyter notebooks and converting their output into the necessary output
format given your book publisher. So for example, O'Reilly Media uses AsciiDoc and DocBook XML
as their input formats for publishing. And so Quarto knows how to go from a Jupyter notebook
with various tags and Markdown cells and code and everything to those formats. And you can add special annotations within your
Jupyter notebook to handle particular things that have to do with O'Reilly's toolchain.
But my book was written in DocBook XML in 2011, 2012. And so I actually got really good
at writing XML. I have all these Emacs shortcuts for generating XML tags
for DocBook XML. It's not something that I would recommend to everyone, but it's something that I was just
forced by necessity to get good at. But basically what JJ helped me do
with Quarto is write Pandoc filters, which are written in Lua, to convert the book from DocBook
XML into Quarto Markdown, which is Markdown plus some extensions
and customizations for Quarto. And then I can use Quarto to render the book to a PDF or to a website
or, in principle, any output format. So originally, the history with Quarto is that JJ and his
collaborators have created multiple other dynamic web publishing systems.
So there was ColdFusion in the 1990s,
which was one of the original dynamic web page frameworks,
along with CGI and PHP.
And then he and his company created what ultimately became Windows Live Writer.
And then they created R Markdown early on in RStudio,
which turned into Posit.
But R Markdown was basically a technical document publishing framework where you could write
Markdown interlaced with R code, and it would handle all of the rendering and outputting
to different formats.
So Quarto is kind of a reimagining of all those things built on a very modern foundation.
It generates portable binary.
It ships a whole JavaScript runtime,
uses Deno, which is like kind of the fancy
Rust-based Node.js runtime.
But it's very easy to deploy.
And yeah, I think it's a really cool project.
And so I've also been encouraging
a lot of like open source projects
to migrate their project documentation
and websites to use Quarto
because we did that for the Ibis project, for example, and it generated really good results.
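As a rough illustration of the Pandoc filter idea Wes mentions above: a filter receives the document as a tree of typed elements and returns a rewritten tree. Wes's filters were written in Lua; the sketch below is not his code. It swaps in Pandoc's JSON filter interface and plain Python, and the `emph_to_strong` rule, the `SAMPLE` document, and its version number are all hypothetical, chosen only to show the shape of the technique.

```python
import json
import sys


def walk(node, action):
    """Rebuild a Pandoc JSON AST bottom-up, applying `action` to each element.

    In Pandoc's JSON form, an element is a dict with a "t" (type) key and
    usually a "c" (contents) key.
    """
    if isinstance(node, list):
        return [walk(item, action) for item in node]
    if isinstance(node, dict):
        rebuilt = {key: walk(value, action) for key, value in node.items()}
        return action(rebuilt) if "t" in rebuilt else rebuilt
    return node


def emph_to_strong(element):
    """Hypothetical rule: turn emphasized text into strong text."""
    if element["t"] == "Emph":
        return {"t": "Strong", "c": element["c"]}
    return element


def main():
    """Filter entry point: Pandoc sends the AST as JSON on stdin and reads
    the transformed AST from stdout. Not called in this demo."""
    json.dump(walk(json.load(sys.stdin), emph_to_strong), sys.stdout)


# Hand-written sample document (assumed shape, not produced by Pandoc here).
SAMPLE = {
    "pandoc-api-version": [1, 23],
    "meta": {},
    "blocks": [
        {"t": "Para", "c": [
            {"t": "Emph", "c": [{"t": "Str", "c": "pandas"}]},
            {"t": "Space"},
            {"t": "Str", "c": "book"},
        ]}
    ],
}

converted = walk(SAMPLE, emph_to_strong)
print(json.dumps(converted["blocks"], indent=2))
```

A real DocBook-to-Quarto conversion would need many such rules, one per construct that does not map cleanly; the point here is only that each rule is a small function over tree nodes.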
That's what I was thinking, actually. I was just navigating the book on your website. It
looks really good, actually, and super easy to navigate. I was thinking for many teams,
they could use something like this for internal documentation, for example, to make it look this
nice. Yeah. So you can think about
creating internal websites using Quarto. And actually, one of Posit's enterprise
products is called Connect, which is basically a secure publishing system for internal
publications. So it could be documents, Jupyter notebooks, really anything you could
create with Python or R with Quarto can be published dynamically to Connect, and you can set up fine-grained
permissioning. So imagine you had some Quarto document or some set of documentation,
you only want it to be visible to one team inside your company, and you want to set it up to
deploy from a GitHub repository, something like that. That's something that you can do with Connect.
So it's all interconnected.
But if you want to use Quarto to generate a Confluence page
and put it in Confluence if you're an Atlassian customer,
that's something you can do also.
Super cool.
Cool, cool, cool.
So, sorry, coming back.
Yeah, a bit of a tangent.
In summary, I'm
a huge fan. So no, we'll definitely link it in the show notes for people to check out. So a couple of years back,
you wrote this post announcing Ursa Labs, in which you mentioned these three traps for
people working in open source in terms of how they get funding. I thought this was really cool because it kind of ties different parts of your
career sort of together.
I like how you're like, yeah, I have directly experienced
some variant of all these problems.
I'm a big believer of experiential learning.
So I think that, right, that's the only way to really get
understanding of problems.
So I thought that we can kind of go into these
different traps, interweave things together. So the first one is the consulting trap. And I think
that kind of maybe ties back to like pandas. So to kind of get us started, this is like early on
in your career after college. So you worked in finance at a hedge fund and that's where you
started building pandas and eventually made it public.
And then shortly after that, you actually decided to pursue a PhD in stats at Duke.
So you mentioned this yourself, that financial institutions, they're not really charitable to open source.
We're both very curious, like how did you manage to convince them to open source it?
It wasn't easy to convince them. I will say that in the last 15, I guess,
17 years since I first got involved in working in finance in the mid-2000s,
financial firms have come to see a lot more of the value of making things open source.
And so not only AQR, where I worked, but Two Sigma, Bloomberg, Jane Street, these companies have
released a lot of open source software. But to get companies like this that value their
intellectual property so highly to dip their toes in open source was not easy. I think at the time,
it took maybe six months of discussions and convincing. And ultimately, I made the argument
that yes, we'd be giving
away potentially some secret sauce that would help the company's competitors be able
to work with data more easily. But I also talked about the likelihood that Python would
become more widely used. I think at the time, D. E. Shaw, for example, had begun to use Python for certain
things. And so there's a little bit of a cost benefit. So if you release a piece of open source
software, you have a better chance of your thing becoming the main thing. And that creates a lot
of network effects and value within the open source ecosystem. But if you don't open source,
and then somebody else open sources their thing, and then that becomes popular, then you're sort of on an island. And so it's like building bridges and doing trade
with your neighbors versus having a very isolationist mindset. And so there's definitely
pros and cons to the different approaches. If you create something really valuable, maybe you want to
hoard that invention and use it to your maximum benefit, but there's also downsides. And I think
it helped that I was very keen to engage with the open source community. And so
I made the argument that I would use pandas like early pandas as a tool to better engage with the
open source community, use it to recruit people to come work at the company. Maybe if the project became popular,
then people would learn how to use it
and they would want to come work at the company
to be able to have a job
where they could use pandas as part of their jobs.
And thankfully, I think all of that
has basically come true.
And so now AQR can hire new college grads
and they show up on their first day
and they know how to use pandas and Python and they can be productive working for the company pretty much right away,
which is very different from the old way that many financial firms used to operate,
which is they have these very proprietary tool sets, proprietary data analysis tools and systems.
And so new employees would face a pretty significant learning curve to be able to
get up and running.
And there was a lot of debates about licensing.
I think some of the lawyers wanted to use the GPL.
And of course, the Python ecosystem is not very GPL friendly.
And so if you put GPL on something, a lot of Python users, almost as a matter of principle, are not going to touch the library because they're concerned about the viral effect of the GPL. But eventually we agreed on using the new three-clause BSD license
and putting it out there. But I think I initially started having the conversation about open
sourcing it sometime in early to mid-2009 and was only able to really push it through at the end of,
I think the first pandas 0.1 was released on New Year's Eve 2009. So I was ignoring my
family or friends, or I can't remember where I was to get it up on PyPI. I'll tell you the anxiety
of publishing my very first Python package. It was pretty intense. It got easier after that. But
yeah, the first time was hard. Wait, what was that like? So this was like,
literally New Year's Eve, and then you're just like pushing it?
Or how did that go?
Yeah, I mean, if you look at,
if you look at, you know, pypi.org,
oh boy, there's so many like projects
that are not pandas in PyPI,
which is, let's see here.
Pandas 3, my goodness.
I swear the Python package index is full of malware.
But yeah, if you go all the way back to,
no, I had it wrong.
It was released on Christmas, Christmas 2009,
which is even worse in the sense of
neglecting my family.
But I don't
remember where I was at the time. But, you know, I had time over the holidays. And so, given
that the open source side of pandas started out as my side project and side interest, as long as it
continued to be maintained and work well for my job, that was enough. Nice, nice. And I mean, that's
a lot of work to have gone through, right? Six months of discussions and, you know, meetings,
and then really pushing for it.
I think a lot of people would have given up, right?
Especially at that point,
it wasn't clear that this is going to have the kind of impact that it does
today. What gave you the conviction of like, you know,
this is worth me like pushing for?
Yeah. I'm trying to place myself back in that, back in that mindset.
It's been 15 years, but yeah, I felt that there was a lot of potential. I thought the whole set of research tools that we created
in Python was so much better than what I was using prior to that. So I felt like
there was this potential to have a really transformative effect on people's
productivity, or just to make data analysis and data science a lot more accessible by making it open source. And so I clearly had a strong conviction, and it was something that I really wanted to do and see
about. But yeah, I would say that it languished for a little while. Languished is maybe
putting it strongly, but for about a year, because I got busy doing some other things.
I applied to grad school, I started a PhD. And it was only when I
started getting contacted by other companies who wanted my advice on switching to Python
from other things that I realized that this was the time and it's now or never. And I need to
spend all my time on this to help the ecosystem develop into something that people can adopt and
be successful using.
I mean, it wasn't just Pandas.
There were a lot of other things that needed to fall into place to make it all happen. But Pandas was an important part of the solution.
I see.
That must have been cool to get validation, right?
Even after all this time to have the inbound interests.
Yeah.
I mean, my kind of self-deprecating way of looking at it is that,
you know, I was in the right place at the right time. But it was, you know, certainly
more than being in the right place at the right time. I had to take a lot of actions in order
to make it happen. I had to make, you know, personal sacrifices. I sacrificed
my personal life. I took time away from friends and family to work on it. I made significant
career diversions because I believed that it would create more interesting opportunities for me in
the future. I could have continued to work in finance and had a very comfortable and lucrative
career working in quant finance. Some of the people that I worked with in those years
are still involved with the same companies that I collaborated with in those
days.
And so I could have stayed on that path, but I chose to take a risk and put a lot of energy
into it, basically sweat equity, I suppose.
But I was in a very fortunate situation.
I had no student loans, which is, I
think, an underappreciated benefit, in that I was able to take a risk and I
wasn't digging myself into that much of a financial hole. I had some savings. I had lived very
frugally in my first few years of working and had maybe, you know, a year's worth of living
expenses saved up. And so what I told myself at the time was,
okay, I'm going to work on this full time.
I'll find like a little bit of consulting work on the side
to help pay the bills,
but don't do too much consulting
that I'm not able to spend most of my energy
improving pandas.
And then after a year or so,
I can see where I find myself,
whether this makes sense
or like whether I'm getting the kind of return on my time,
like return on investment, that justifies continuing to do this. Right, right. By the way, on that reflection, I love the post that you wrote when you turned
30, just to reflect on things. One sentence that stood out to me was, right, you talked
about how at MIT it was more about being smart, and then in New
York it was about being wealthy, and then also San Francisco. We had a very similar kind
of conversation with Josh Wills about that. And I thought it was quite cool that you were
like, you know, given all that, let me try to figure out what do I actually want, what makes me happy. I thought that was very impactful. Yeah, I think, you know, going through all these different professional
situations and deciding how to spend my time and what to work on, I think there is an
underlying search for meaning, a search for what actually matters to you. Like, do you value recognition or fame, or do you value money?
Do you value comfort?
Like what are the underlying motivations,
the things that will make you feel satisfied,
like be happy with your life?
And I think in retrospect, I went a little too far at times
and made some significant, you know,
personal sacrifices.
I will say, in my twenties, my personal relationships suffered as a result of my at
times maniacal focus on working on pandas and working on this project.
Like I have a bit of an obsessive personality and anyone who knows me well is familiar with,
with that side of me.
Like, oh, Wes has his projects, and
sometimes the projects become an obsessive focus. And so I think
learning to find some balance, and the importance of relationships
and friendships and things like that, I think it was good that I went through all that. I
think it was very helpful personal growth. But I've learned about myself that I'm very motivated by impact and to be able to have impact
in a sustainable way. But I also have to take care of myself. I have to be a happy and
resilient person. If I'm depressed all the time, I can't
bring myself to care enough to start a new project or to drive forward projects through the tough times. Because creating open source software,
it's very difficult. And for me, it's been very emotionally draining because
you have to soldier through the dark days of the project where there's not that many people
that care, and you have a conviction and a belief that what you're doing is important and is going to have impact. But that impact is going to be realized far into the
future. Work that you're doing today, you're not going to see the impact of that or feel
recognition or see the value of that work for at least six months, probably even more than that.
And so it's very deferred gratification. So you have to tell yourself, okay, this is tough.
Like, gosh, the build keeps breaking, and, oh, the release, and, oh, this
Windows build.
And there was a dark time when building stuff on Windows was really hard.
And so every time I would fire up VirtualBox to build Windows binaries, I'd be like, oh,
this really sucks.
Like why?
It's like, why must I go through
this misery? And often it's
silent suffering. Because, you know, you can
always tell someone, like, oh man, it sucks that I had to spend four hours
fixing the Windows build and getting these binaries out so I could release. And people
would often remind me, like, oh, you chose this life. If you wanted it to be more comfortable, or to not be all on your own building, or to not
feel chronically understaffed working on these projects and making them happen, you know,
this was a choice. And I guess it helps to remind yourself that it's like, it's always a choice.
And yeah, if you're not happy with, you know, ultimately happy with what you're doing,
yeah, there's going to be like good days and bad days.
But hopefully you have more good days than bad days.
I don't know.
I think it was Steve Jobs who said if you have a certain number of bad days in a row, or it seems like you're not getting any positive feedback, then you probably need to make changes.
Interesting.
Two follow-ups. So one, do you think obsession is an important ingredient to push projects like these where, like you said, right, it is so hard to have the conviction?
It can be. I mean, for me, that's just my personality. And so I don't know that it's an essential ingredient. It worked for me. I think that an obsessive personality can also lead to unhealthy behavior.
So earlier in my life, I got involved in video game speed running.
And so we were playing the game GoldenEye 007.
And that is a special kind of obsession, to play the same stretch of a video game hundreds of times in a row to try to get the fastest time
and perfect all of the little details in order to set a record or break
your previous personal best. And so I think I fell into those kinds of obsessive patterns,
patterns of self-improvement and efficiency. And yeah, I've been that
way since I was a child.
So not something that I would recommend to everyone. And I don't think it's the only way to do open source successfully.
And particularly now that open source has become a fixture and a strategy for businesses.
So I think the model of like the lone wolf, like obsessive lone wolf hacker working on
their nights and weekends to build a project is more or less going by the wayside. And I think also it's become harder and harder for individuals
to mount successful efforts because we've solved a lot of the easy problems. And so in many cases,
it was like, okay, well, we just need an open source solution, and an individual can
scrape together an open source solution to this problem
in a relatively straightforward way in a reasonable amount of time. But what if you have a problem that is
much more difficult, that needs 50 person years of effort or 100 person years of
effort? An individual can't possibly do 100 person years of work. Even if they are 10 times more productive
than the next person, or they overwork,
or they work 80 hours a week or 100 hours a week, maybe they can muster in one year
the same amount of work output that somebody else might do in three or four years. But you want to
deliver results on the order of single-digit years rather than, you know, 100 years or 25 years or
something like that. So I think as the problems have become
more difficult, it's required a different approach and explicitly rejecting the
lone wolf mindset, which was a feature of the early days of pandas. But I think
there's fewer and fewer projects like that. That being said, you know, we have Polars in
Python, which has been a lone wolf project from Ritchie Vink until recently.
He founded a company and is now hiring people to help him.
So we still do see successful scenarios like that, but it would be disappointing to me if that was the only way to be successful in open source,
is to engage in this objectively unhealthy behavior. And yeah, like I said earlier,
I definitely made a lot of decisions
and worked at the expense of my mental
and physical health in my 20s.
And so I've had to make it a mindful choice
to reject that and to not continue to do that to myself.
Also, I'm getting older and I can't work long hours
like I used to, and I need sleep,
like, and I have other things that I like doing in life.
So anyway, balance is a good thing.
And so to be able to build important open source projects
while also having balance in your life,
I think is something worth striving for.
So I wanna ask this question,
and this is a recurring theme I've seen.
So aspects of what you said I relate to in terms of sometimes being so narrowly focused
on one problem that you neglect everything else at the cost of your personal life at
times.
And then many folks we've spoken to on the podcast, this theme comes up as like early
in the career, yes, super driven, super focused on this one problem, made a lot of progress,
but then it also resulted in self-awareness, which is like, hey, this is not really sustainable.
But that surge in the initial period does result in impact, recognition, or even I would say future opportunities that you weren't thinking about at the time.
At that time, you just wanted to get this thing to work. So when this aspect of balance comes in,
I've seen almost this advice consistently: make sure you have that balance so that you
have some extra energy in your pool to do other projects, or you're behaving well, your personal
life is good. But for people who are starting out, would you say that it's okay to have
that narrow focus, that imbalance in life? It's like,
hey, that's okay if you don't have, let's say for example, student loans to worry about,
you don't have family you're responsible for. Maybe it's okay, go crazy. Well,
I mean, yeah, it's important to point out that things that I did when I was 25, I think,
wouldn't be practical for a lot of people. Like, they have a family to support
or maybe they have student loans to pay.
Like, they have other obligations in their life
that make it hard for them to work
from 7 p.m. to 1 a.m. every day.
And if you have a demanding job,
then spending time on your nights and weekends,
maybe you need to work a second job to make ends meet.
And so I think fundamentally,
like the early story of open source software, I think part of the reason that the open source
world has significant inclusivity and diversity issues is indeed because open source development
is fundamentally a privileged activity or started out as a very privileged activity.
And so I think what's great now is that startups and large companies
have made open source
an essential part of their strategy.
Microsoft, from the Steve Ballmer days,
has transformed itself
into being a very open source friendly company.
And Guido van Rossum works at Microsoft,
working on making CPython faster.
And Microsoft has made enormous contributions
to the open source world.
And out of the major tech companies, like the Magnificent Seven, I would say that Microsoft is
probably the best place to go and be able to work on open source software for a living.
And so that means that to take the software development, yes, you're giving away software,
building software and giving away for free on the internet. But also it allows people to be able to have more balanced lives, to treat it as a job rather than like
something that's coming at the expense of like your friends and family and like your life outside
of your day job. And so I think that's, it's essential. And I, yeah, I think that it would be
better for the volunteer model of open source to more or less go away because
it's not very sustainable. It leads to significant maintainership problems, maintainer burnout,
particularly when somebody is working on a project outside of a day job or some other
responsibilities that they have. And so it's common that you see maintainers,
volunteer maintainers, burn out. So one of the solutions to maintainer burnout is for people to do open source as their job. And so I think Linus Torvalds works on the Linux
kernel, has worked on the Linux kernel as his full time job for a long time. And so yeah, I think,
yeah, I recognize, like, I did the lone wolf thing, like I did a lot of volunteer work in the early days.
Eventually, I arranged to get paid to work on open source. And so that's
made things. I've been continuously paid to work on open source projects in the last eight or nine
years. But that was partly a reaction to the open source model. It's like, this is going to cause me
to be burnt out and miserable. And I need to make this my vocation, my profession. And so I've given
a lot of talks
and I've written a lot about how it is important
for open source to become like a true vocation,
like a job and not something that's like
this privileged activity that people do on their free time.
So, great lead-in to the first trap, the consulting trap.
can you tell us more about that?
Yeah, so the consulting trap is where
you have an open source
project and you find consulting gigs or consulting projects where you work for a company that's using
the open source project and maybe they partly are paying you to fix bugs and customize the project
for their needs. But what can happen is that you end up spending a lot more time working on
the company's internal software
projects. You become more or less a software developer of that company and your work on the
open source project can become incidental or something that you do on the side. Or ideally,
you would spend 50% of your time working on building custom software, building things for
the company, 50% of the time on the open source project, or even more time on the open source project. But it's not uncommon to see the shift and it being 10, 20%
of your time on the open source project and 80, 90% of your time building custom solutions for
the client. So I've seen that happen a number of times. And so it's, yeah, there's good situations
and bad situations. I've seen very productive open source consulting type relationships.
I think it's gotten easier as time has gone on.
But I think nowadays when a company engages a consultant who is an open source maintainer,
they understand that partly what they're doing is paying this person to work on the open
source project because maintaining it is good for them as well.
But it's still a risk.
And I think it's a trap in the sense that some fraction of the time,
you end up being kind of a substitute,
more or less a fungible employee working within that company.
And the work on the open source project is something that's on the back burner.
Like ways to avoid that trap as someone that's getting started doing that.
Would that be just being very clear about setting time
boundaries and how you should allocate your time in the contract? Yeah, I think it's just being
clear about the expectations and the contract and the statement of work and yeah, setting clear
boundaries. I think, yeah, sometimes, yeah, if people go into the contract with, yeah, just the
kind of, if it's sort of hand-wavy, like, yes, yes, improve the open source project, keep fixing bugs and things like that.
It's easy to underestimate how much time that, how much time that really takes.
And so, yeah, so just, yeah, I think setting those boundaries or the expectations that say, if it's your goal to spend 50% of your time on the project on a steady state, that you have that to carve out and you protect that time.
I think this goes back to making open source your vocation.
That should be a full-time job if you want to do open source.
Going back to the second trap that you talked about, which is the startup trap.
Can you tell us more about that?
Yeah, the startup trap is where you create a company, you raise some venture capital,
and you build a product that is either an explicit
commercialization of the open source project, or you build some kind of a vertical solution that's
powered by the open source project. And so there's a couple of issues that can happen here. So one
issue is where you create a conflict between the needs and the business needs of the startup and the open source project and its user base.
And so that would take the form of, I've seen any number of things from license changes to
holding back features, like basically maintaining a private fork of the project and reserving
like pro features or features that you don't want to release to the open source project because
it might undermine your edge in your business. There can also be governance challenges
because you, as a startup, want to be able to move fast.
If your goal is to create a healthy relationship with the contributors that are
outside of the company it does create an implicit negotiation with contributors that are not your colleagues. And so what can sometimes happen is that the company will become like a, you know,
pejorative term would be like a backroom cabal. So they communicate in private, they decide to
make changes, and then they push through changes in the project without
getting the buy-in and convincing the other maintainers. And so the other contributors might feel demotivated because they feel like second class citizens
if they're not working at the startup that is commercializing the open source project.
Another thing that can happen that is also very common is that the investors in the startup
can take operational control of the company as a result of firing
the CEO or the founders losing board control. And that may lead to a shifting of budget
and more or less like developers being laid off or reallocated to work on other parts of the
company that are deemed to be more in line with generating a return on
investment for the investors. And so sometimes you can see like, okay, the company is really
engaged in this project. And then at some point there's a shift of, there's a leadership change,
or there's some other shift in the company status. And then the developers just disappear. And it's
like, "Well, my boss says I have
to work on something else." And so suddenly you're no longer getting paid, effectively, to work
on the open source project. So suddenly getting defunded, that can
definitely happen. And relatedly, I mean, projects can also be dependent on development infrastructure
provided by a company and so that that can create another source of risk that if that suddenly disappears, then yeah.
So anyway, we've seen all of these things
and this is one of the issues
that causes communities to fork.
Like, you know,
a fork like this
happened with Presto,
like the SQL engine.
So there was the fork to PrestoDB and Trino.
And this wasn't a startup issue per se,
but it was provoked by, my understanding,
it was provoked in part by a governance conflict
between Meta, Facebook, and the open source community
of developers working on the project
who did not work at Facebook at the time.
Yeah, that was interesting to see, by the way, to actually see Presto being forked to Trino. I read
the post, I think, at five-year anniversary for Trino. They wrote about some of this historical
context and how Trino came to be. And this was one of the things they highlighted there. Like,
these are the reasons for actually doing a fork. And if you look at things right now, at least I
know at LinkedIn, we used to use Presto very heavily.
But since this fork,
over the last, I want to say,
at least three plus years,
we have been mostly using Trino.
I shouldn't say completely,
but most of it,
like a lot bigger part of our infrastructure
is moving there.
And you see that community shifting
over to Trino as well.
I was following that space for a little while,
so I saw some of this shift at the time.
Okay, the third trap, the corporate user trap.
Can you tell us more about that?
Yes, like the big company trap.
That is similar.
I think what you see there is that it's easier for
developers to shift around or get moved off of a project. And so developers shift in and
out of working on, like, I was just looking at some component in a Microsoft open source project,
and there was a developer who just left Microsoft. And so essentially they just disappeared
from the project. And so I guess this could happen with developers,
a developer working at a startup that's working on an open source project,
but particularly in big companies,
priorities and budgets can change
on a quarterly to annual basis.
And so this can,
and some companies are notorious
for their priorities shifting
or being somewhat flippant.
Especially in this environment, for sure.
Yeah.
And so whenever a project becomes too dependent
on the generosity of a particular big company,
that can also become a source of risk
because you're dependent on having the support
of a particular vice president or senior vice president
who believes that the project is important,
something important for the company to be maintaining and contributing to. But that
could change based on the vicissitudes of the company and its quarterly performance and things
like that. And yeah, then you also see some of the
governance conflicts around how decisions are getting made. Like, there's product managers involved and other corporate apparatchiks. And so, yeah, it's again, open,
like big corporate open source can be done well. I mean, look, I think Microsoft has done,
has done an outstanding job, but we've seen plenty of scenarios where, where things have gone,
things have gone the other way. And I mean, look at, I think if you look at the MySQL, MariaDB,
there was a community fork in part because of bristling or challenges
working with Oracle, I think, right?
Oracle, yeah, Oracle, MySQL.
And so it's a very common story,
and in particular when an open source project is part of some product line
or is related to some profit center of the business.
Ultimately, corporations, in most cases, have a primary obligation to their shareholders.
And so, yeah, that can easily come in conflict with the needs of the open source community.
I think over time, as you mentioned, like this idea of having lone wolves working on an open source project is changing with a bunch of companies doing open source.
And in many cases, I think the successful open source projects you see are not the ones which have only one company behind them,
but the ones which have multiple big companies behind them, because no one company will have dominance or be able to govern the entire project themselves.
It becomes more of a community thing.
And you're not dependent on only one company at that point.
And this is something we see very commonly in many of the cloud offerings that companies
build on top of.
So like the open source products that companies build cloud offerings on top of, where multiple
companies are incentivized to improve that offering, for example.
And that essentially translates to some
of the things they're offering as a cloud.
But it's not necessarily true everywhere.
Like I don't know if you saw this recently in the XZ compression library, there was this
backdoor injected where this person did like social engineering for, I don't know, three
years or something like that.
I might be misremembering that.
Yeah.
Maybe two years.
Yeah.
Yeah, I think the XZ,
liblzma thing, that was, you know, the level of sophistication, it must have
been a state actor, like a whole shop of black hat security hackers creating obfuscated
backdoors into, you know, an important component in the Linux supply chain.
I think one thing, yeah, I guess we didn't really mention
is the at times predatory relationship
between the major cloud vendors and open source projects,
and that's precipitated license changes
and like the anti-AWS licenses,
like source available licenses.
Like, you may do anything you want
with this open source project,
except operate a cloud service in a company with more than $10 billion a year in revenue.
And so there's only like a handful of companies that-
Whose name starts with A.
Yeah.
So I think this has like the corporate part kind of has like a cool tie-in to kind of
your decision of leaving and joining
Posit, but before getting into that, I want to kind of rewind a little
bit to the startup trap.
And so you founded, or, what led to Voltron Data, given sort of these challenges?
Like how did you deal with it when you were starting a company?
Yeah.
So we created Ursa Labs in 2018, which was a not-for-profit
development group that was funded by RStudio, which is now Posit, and Two Sigma and NVIDIA,
Intel, and some other financial firms like Bloomberg. So I wanted to create like a non-profit
industry consortium to fund Arrow development. And that was going great for a couple of years.
And we were seeing significant demand to put a lot more firepower into the Arrow ecosystem
and companies that were interested in having support, like a formal
development relationship with a company behind the Arrow ecosystem. And so it was an interesting challenge
to set up a company to create, pursue a product vision,
but to create guardrails
and to have that startup trap in mind.
How could we build an open source team
that's driving forward progress in the Arrow ecosystem
and some of the peripheral projects
while at the same time having investors
and doing
enterprise product development. And so I think partly it helped that when we created Voltron
Data that we had very clear expectations with our investors that open source was a huge dimension of
how the company would be successful over time, that creating open standards and protocols
and building this
open source composable data stack was an essential aspect of how we would be successful. And so for
people who are not aware, what the company is doing is, we do enterprise support and
open source partnerships for the Arrow ecosystem, but the company also builds an accelerator-native,
like GPU-accelerated, execution engine, which can be incorporated into different data processing systems to essentially enable modular GPU acceleration.
And it's all Arrow-based.
And so it's something that needs to be able to plug into all of these different systems. And so to develop these open source projects and standards and protocols to
make that all work seamlessly is an essential aspect of how that will succeed. So getting that
buy-in from investors, I think, helped us avoid the startup trap. And the company has a team of
20-some developers who are largely working full-time on open source. And it's over a period
of many years. So to be able to invest decades of person years
in the open source ecosystem has been a game changer for Arrow.
In this case, you mentioned this company was not for profit?
No, this was Ursa Labs.
Yeah, so Ursa Labs was functionally like a satellite of RStudio (now Posit).
So we operated independently.
They handled the back office,
like payroll, health insurance for US-based employees, things like that. And so
in 2020, we spun out from RStudio to create Ursa Computing. And we raised
a venture round in August 2020. And then at the beginning of 2021, we joined up with
the leadership from RAPIDS and BlazingSQL to sort of mash everything together.
And we created a new brand identity, Voltron Data. And then we raised more money for
Voltron Data, a couple of rounds: kind of one seed round in 2021, a super seed, seed two, I guess,
as we'd raised a seed for Ursa Computing,
and then a Series A in January 2022.
I see, got it.
Yeah, the reason I was asking is because from a
not-for-profit perspective, like if that is the case,
then it might become harder to hire engineers because
at some point you have to figure out compensation
for people working on this.
And if it's not competitive enough as compared to
other companies, for example, then you don't have the right quality of engineers working on the
problem.
Yeah, that's true. And I think that was indeed a challenge in the
Ursa Labs era, that there were really talented engineers that I was interested
in hiring to work full time in the Arrow ecosystem
and simply because of the economics of Ursa Labs,
like the funding model and what we could afford to pay
in terms of salaries and there was no, you know,
really no equity to offer because it was, you know,
a not-for-profit endeavor.
And so, you know, I think we had a great team,
but to be able to scale up and also to hire people who could easily go work for, you know, the big tech companies or Google and make a lot more money.
So I think that was partly the motivation, not only to have a larger team to be able to put more resources objectively into Arrow development, but also to be able to hire individuals that have a lot of career opportunity.
So, um, I guess historically,
would it be fair to say that one of the cons has been right, uh, compensation since you can't offer
stock, but in terms of pros, in addition to the mission, um, the flexibility, right. In terms of
location, because I feel like a lot of the great people
that have helped me, I think when I was first getting into Kubernetes, like there were like
two people that really helped me out. And then they were just like living in the middle of nowhere
in the States, where I imagine, I guess, at least a few years back, it would have been difficult to
kind of go to like a bigger tech if you want to have that lifestyle. But then I guess that's also changing now
since companies are more open to remote.
Would that be fair to say?
That's, yeah, that's definitely true.
I think COVID definitely helped with changing culture
as far as like hybrid, you know, hybrid and remote.
But yeah, I've been, you know, working in a remote,
remote only capacity for, you know, the last, yeah, six years or so.
And, you know, it has its pros and cons.
But for, you know, for open source development, it's ideal because you can hire people where they are.
I've worked with, like, a lot of people in Europe.
And I think, you know, Europe is really friendly for open source developers because health insurance is separate from employment.
And so if you are in between maybe in between full time jobs and you want to pick up like a contract to do some open source development, that's something that you can do without putting your family's health at risk.
Whereas in the United States, I think there's definitely, there's definitely a psychological burden of losing continuity of healthcare coverage. And that does,
you know, lead people to not make decisions like that. And so having
managed a global workforce, you know, people around the world in different
countries, I've gotten to see the psychological impact that the health insurance question has on people. So I
think open source would be much better off if everyone had, you know, at
least a guaranteed level of basic health care.
Right, right. Interesting.
And just going back a little bit, so about Voltron Data, you mentioned you
were able to avoid
some of the startup trap
when you were being very clear
and with the venture funding.
How did you, so like being very specific.
We've avoided it for now.
Yeah, maybe not forever,
but we're really doing our best
and we wish to be good stewards
in the open source projects
that we're involved in.
And I think by choosing investors that understand that as well,
I think is part of ensuring that that will remain the case.
Right. So being specific about how do you deal with...
Because one of the issues I saw to me that's like, oh yeah, that is very hard,
is how do you balance like which features to, um, uh, to open source versus
what to keep for your enterprise, um, version?
Like, how did you guys go about making those decisions at Voltron Data?
Well, at Voltron Data, I mean, anything related to core Arrow or anything that is like projects
that we want people to, uh, like interfaces, protocols. Like, we've been working a
lot on better database connectivity, like ADBC, which is the Arrow Database Connectivity
API standard, and Flight SQL, which is a wire protocol for databases to offer SQL support.
And then we've, you know, gone and partnered with Snowflake, for example, to integrate that into their drivers and to make Arrow-native connectivity work better for Snowflake users.
And so there, all of the pieces of technology related to that need to be fully open source.
And so there's nothing that's held back. I think the company's main product, Theseus,
it's a GPU-accelerated modular execution engine. I think there's very clear separation
between this system that runs on a rack of, you know, A100 or
H100 GPUs. It requires Kubernetes, it requires basically an enterprise data center type
setup to use. And so it's a pretty clear delineation between software that's involved
with building and operating Theseus and also the types of users, like you need to have
certain types of hardware available to use the system at all. And so I think at least at the
moment, um, it, it's not open source. It may be that it becomes source available or in some
capacity in the future. It's hard for me to predict, but
it's a specialized product for organizations that have very large data sets, like the over 10 terabyte type data sets, where you can get 10x or 50x performance improvement or efficiency improvement by using racks of GPUs to do that processing. Or maybe you've got a data center,
like you've built a sort of infrastructure
for doing LLMs and machine learning,
and you wish to also be able to do your analytics
and feature engineering directly on that hardware
so you can shorten the whole pipeline,
run it on one sort of consistent set of hardware
and get a lot better performance that way.
So in a sense, like the kind of the market or the user base for that type of system is
a lot narrower than say, you know, say PyArrow, which is, you know, a Python library and has,
you know, millions of users and downstream and, you know, tons of downstream projects
that depend on it.
So yeah, so I think ultimately it comes down to a question of like, who is the audience?
Like, who are the potential users? Is it something like a project you intend people to build other open source projects on top of? Or is it like an end solution in and of itself? Those are a lot less sensitive to copyleft licenses like the GPL
because in a sense,
like the development environment
is in itself an end.
So you could build extensions to it,
but people don't really need to depend on the project
in the kind of sense that you would a project
like NumPy or pandas, where
these projects are like essential library dependencies of building something else. And so
if they had GPL licenses, that would constrain use. And, you know, the same
logic applies to closed source software. So
it's like, you know, what aspirations do you have for a piece of software? And so, you know, when we kind of made all of our early decisions on what to build and licensing and things like that with Voltron Data,
ultimately our decisions were about, like, how do we make the composable data stack happen faster,
enable these modular pieces, modularization,
like what open standards are missing?
How do we design those open standards?
How do we build libraries to make it easier for people to use them?
And so that's, you know, so there's, yeah.
So we've been very busy with a lot of things, from Substrait to Arrow to these kind of new Arrow protocol projects, Ibis for Python.
Yeah, our open source footprint of the company is pretty significant.
Nice, nice.
As open source becomes a more critical component of any software business, do you
envision innovations in terms of funding models? So like Patreon, but like a very, like,
I don't know, right, like something that kind of opens up new traps,
but that's a bit different from what we've seen before?
Yeah, I think that's something we didn't really get into, like other funding models people have had for open source. But there's Open Collective, there's Patreon,
there's GitHub Sponsors, there's a new platform called Polar,
which is kind of like a GitHub Sponsors
or Patreon alternative.
And there have been a number of developers
that have been able to successfully support themselves
and get a lot of sponsorship these ways.
It can be hard to get big dollars
to be able to pay a full-time team of developers.
But in a number of cases you have individual project maintainers that are able to
support themselves as individuals, if they have a prominent enough role in the
project. I think partly what they've been doing is monetizing access to themselves,
either creating like exclusive content or like having a private Slack channel
or a private Discord,
where if you're a sponsor or a patron of the project
that you get exclusive access to talk
with the developer about your needs
versus like, you know, if you're on GitHub,
like anyone in the world can open an issue
and bother you at any time of day or night.
It's like getting ahead of the line.
Yeah, so there have been some successful examples
of doing that.
And I think that these models like Open Collective
and like crowdfunding or, you know,
crowdfunding platforms for open source support,
they're definitely very helpful.
And we didn't have them a decade ago.
So it's been a big improvement
for projects that have been able
to make it work.
So a bit of a hard pivot, I guess.
So you recently became a general partner
at Composed Ventures, which does early stage investing in data infra and AI companies.
I feel like throughout your career, like, right, you're like really good at picking up new skills, like, you know, open source, writing a book, building a company.
So I imagine venture capital is now like a new skill that you're building. So I'm kind of curious, generally, what's the approach that you use for learning new skills that
you've developed over time, and how are you applying that to venture?
Yeah, so there are a couple of things there. So, I mean, as we built out the Arrow project
and made it more successful, people started reaching out to
me to get feedback on projects that they were thinking about founding or new projects that
they're working on. And just for getting my advice on technical matters or asking me for favors for
other things. And at some point I started asking them if they, you know, would let me invest in their funding rounds, whether their friends and family round or their seed round or something like that.
And I was never investing a lot of money, but, you know, it was just a way for me to be involved and to have some skin in the game, to help these companies be successful. And,
you know, as time, as time went on and the, and the ecosystem has gotten a lot bigger,
I think there's a couple of things. So partly I wanted to be able to,
so I, some investors got in touch with me and wanted, wanted me to invest some of their money
on, on their behalf and into the types of investments
that I'd made in the past
since I have an interesting network
and can get in touch with companies
maybe before when they're raising
only a small amount of money
before they go to raise like a larger round.
And I also wanted to create more content
and messaging around the super trend of these composable
data systems and the composable data stack, like what we've seen with, you know, modular acceleration,
similar to what we're doing with Voltron Data. But also we're seeing modular
acceleration projects out of Meta and out of Apple and different things there. And so basically what's happened
is that people are building new versions of old products,
but with these really high quality off the shelf,
open source components.
And that's in a sense like what we wanted to happen.
And so the fund gives me a way to invest in those companies,
but also to create more awareness of
this trend that is taking place with all these different companies, which are
building on Arrow, or building on some Arrow offshoot projects like DataFusion, which is a
Rust-based embeddable query engine, a modular query engine, or building on DuckDB or
things like that. Cause we've worked very hard to enable all these pieces to exist and to make them fit
together nicely.
And so it seems like a healthy sort of ecosystem shift that's taking place.
And so this gives me a way to be involved with founders and to, uh, uh, to help companies, um, you know, uh, get off the ground. Um, but
also for, for people to be like aware of like, you know, different people working on different
approaches to solving, you know, old problems with these new kind of, uh, um, open source tools.
I see. How do you like it so far?
I mean, my goal is for it to not become a full-time job.
So it's something that I'm doing part-time.
My full-time engagement is with Posit.
I'm still an advisor at Voltron Data.
I advise a couple of other companies,
LanceDB, Union.ai.
So I have kind of one leg in the startup venture world. And then
one leg is as a software architect at Posit. But yeah,
I've enjoyed it so far. The fund just started in January. And so
I've made the first couple of investments, but
I don't currently have plans to become a full-time investor or to raise a
large fund, but to have a small fund that enables me to write
medium-sized angel checks or super angel type checks, and be helpful to
founders. Yeah, I think it gives me
a meaningful way to be involved. And maybe the
investments will make money, but I'm not doing it as
a way to become, you know... Clearly,
I'm putting capital at risk. And so I hope that the investments
will make as much or more money than buying real estate or investing in the stock market.
But my primary goal is to accelerate innovation in the space and help people succeed.
Nice.
So I did a little incubator a few years back.
Had a terrible idea.
So I feel like I'm qualified to ask this question.
So let's not pick on the worst idea you've heard, but what's the second worst idea that someone's pitched you?
The worst idea that somebody has pitched to me?
The second worst.
Well, I have a hard time remembering, but yeah, it probably wouldn't be appropriate for me to
share. Sorry, I don't want to hurt anybody's feelings.
Sorry about that.
I tried, you know. My bad.
No, that's fine.
I actually want to ask you: you mentioned this new trend, which is composable data stacks.
So I work more on the compute infrastructure side, less on the data infrastructure side.
I just know a little bit about the data space.
And historically, I've seen a lot of these projects being open source, all the way from data storage
like Hadoop, for example, to processing layers like Spark,
and then streaming layers like Flink.
And then you look at data formats,
Protobuf and Thrift and whatnot.
Apache Arrow is another example.
When you see data scientists wrangling data,
they use pandas, NumPy.
So in my mind, data stacks have been composable, but I'm not sure
what you mean by the new trend. So it would be good if you could
describe what you were referring to.
Yeah, so the general idea is
building a system while making use of as many open
standards or protocols as possible
for different layers of the stack.
So, for example, at your storage layer,
projects like Parquet and Iceberg.
So Iceberg is an open standard
for kind of an open source data lake format
that's interoperable across many different execution engines.
Parquet is an open standard file format for analytic data storage. Then there are
execution engines, where the goal ultimately is to be able to hot swap, or
to be able to hand off work, like choose which execution engine to use
based on what will deliver the best performance
or the best efficiency for a certain workload.
And so at the query optimization level
or the user interface level,
if your user interface and your query optimizer
are loosely coupled to the execution engine
and to the storage,
this enables you to make a different decision
about which engine to use,
and kind of other decisions about...
You can also incrementally make improvements to the stack
or incorporate new components
in a way that's less disruptive.
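[Editor's illustration, not from the episode: a minimal Python sketch of the loose coupling described above. Everything in it is hypothetical: the `ExecutionEngine` interface, the two toy engines, and the row-count rule in `choose_engine` are stand-ins invented for this example, not the API of Arrow, DataFusion, DuckDB, or any real system. The point is only that the calling code depends on a shared interface, so the engine can be chosen per workload or replaced without touching the caller.]

```python
from typing import Dict, List, Protocol, Tuple

Row = Dict[str, float]


class ExecutionEngine(Protocol):
    """Hypothetical interface that every engine in this sketch agrees on."""

    name: str

    def sum_column(self, rows: List[Row], column: str) -> float: ...


class LoopEngine:
    """Toy engine using an explicit loop; stands in for one real engine."""

    name = "loop"

    def sum_column(self, rows: List[Row], column: str) -> float:
        total = 0.0
        for row in rows:
            total += row[column]
        return total


class BuiltinEngine:
    """Toy engine using the built-in sum; stands in for a different engine."""

    name = "builtin"

    def sum_column(self, rows: List[Row], column: str) -> float:
        return float(sum(row[column] for row in rows))


def choose_engine(row_count: int, small: ExecutionEngine,
                  large: ExecutionEngine, threshold: int = 1000) -> ExecutionEngine:
    """Made-up selection rule: route by workload size (an assumption)."""
    return small if row_count < threshold else large


def run_sum(rows: List[Row], column: str, small: ExecutionEngine,
            large: ExecutionEngine) -> Tuple[str, float]:
    """The 'front end': it only knows the interface, never a concrete engine."""
    engine = choose_engine(len(rows), small, large)
    return engine.name, engine.sum_column(rows, column)


rows = [{"price": 1.0}, {"price": 2.0}, {"price": 3.0}]
engine_name, total = run_sum(rows, "price", LoopEngine(), BuiltinEngine())
print(engine_name, total)  # prints: loop 6.0
```

[Swapping in a third engine would mean writing one more class with the same `sum_column` method; `run_sum` would not change.]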
And so it's challenging right now
because I think some of these things are still
in their early days, but they're rapidly developing.
And so it's our hope that in the coming years,
building systems like this will be a bit less bleeding edge
and more obvious, and will be considered the best choice for how to build new
data systems.
Makes sense. It sounds like even open source projects have this thing of
lock-in, in a way. It's like, yes, you can change it, but changing is super
expensive. What it sounds like is these modular systems can make it easier for you to swap one out versus the other.
Right. That's right.
Makes sense. Well, Wes, this has been an awesome chat.
Thank you so much for taking the time today. We learned a lot through this conversation,
and I'm sure our listeners will too. Thank you so much for joining the show.
Yeah. Thanks for having me. I enjoyed it.
Awesome. Thanks a lot.
Hey, thank you so much for listening to the show.
You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com.
You can also write to us at hello@softwaremisadventures.com.
We would love to hear from you.
Until next time, take care.