Software Misadventures - Open sourcing LinkedIn's Derived Data Platform | Felix GV (LinkedIn)
Episode Date: November 28, 2023
What's it like to open source an internal project at a big tech company like LinkedIn? When should a company open source a project and what are the benefits and challenges that come along with it? If you want to open source an internal project, how should you go about advocating for it?
Félix is a Principal Staff Engineer at LinkedIn where he works on the data infrastructure team that builds Venice. Venice is a distributed derived data store which LinkedIn open sourced in the fall of 2022. He joins the show to chat about his experiences leading the open source efforts for Venice, as well as his thoughts on balancing leadership with execution, delegating responsibility and fostering a culture of ownership, and growth within a team.
---
Show Notes:
Check out Venice: https://github.com/linkedin/venice
Félix's linkedin: https://github.com/linkedin/venice
---
Stay in Touch:
✉️ Subscribe to our newsletter: https://softwaremisadventures.com
👋 Let us know who we should talk to next! hello@softwaremisadventures.com
---
Segments:
[0:01:36] Introduction
[0:02:32] Career Choices and Job Satisfaction
[0:08:34] Understanding Venice: LinkedIn's Distributed Derived Data Store
[0:22:37] The Journey of Open-Sourcing Venice
[0:26:36] Understanding the Business Perspective of Open Source Systems
[0:30:28] How and when to advocate for open-sourcing an internal project
[0:39:32] Challenges and Strategies in Open Source Project Maintenance
[0:46:40] Balancing Leadership and Execution in Engineering Roles
Transcript
So I think what most people assume to be the benefit of open source is the contributions
you get from the community.
But I think in practice, it takes a very long time before contributions start coming in.
And when the contributions come in, they take effort from your end to review and steer and
so on. For me personally, the way I see it is open source
is an incentive for people to do their best work because it's in the open. It's sort of a
public portfolio. In open source, I think there's a bit more of a built-in incentive to have a
higher level of craft because people are maybe a bit more
self-conscious, right? That their code is out there in the open. The other thing is in order
to make a system be useful across many different organizations, many companies, it requires a
certain level of modularity and versatility and quality that is not necessarily going to be
present in a proprietary system. Welcome to the Software Misadventures podcast.
We are your hosts, Ronak and Guang. As engineers, we are interested in not just the technologies,
but the people and the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors
to chat about their path, lessons they have learned, and of course, the misadventures
along the way.
Hello, everyone. This is Ronak Nathani, and welcome to another episode of the Software
Misadventures podcast. Our guest in this episode is Felix GV. Felix is a principal engineer at
LinkedIn where he works on the data infrastructure team that builds Venice. Venice is a distributed
derived data store which LinkedIn open sourced in the fall of 2022. In this episode, we discuss
what Venice is, how it's used for various applications, and the process of open sourcing it.
We talk about when a company should open source a project and what are the benefits and challenges
that come along with it.
We also discuss when and how engineers should advocate for open sourcing a project.
As a principal engineer, Felix also shares his thoughts on balancing leadership with
execution, delegating responsibility, and fostering a culture of ownership and growth
within the team.
Please enjoy this fun conversation with Felix GV.
Felix, super excited to have you with us today. Welcome to the show.
Hi, thanks for having me.
So we thought we would start with asking you about something that I saw on your LinkedIn profile.
So in the about section, you have this line which says,
"Will consider opportunities to work as an astronaut or to research a cure for type 1 diabetes mellitus.
Don't bother with any other job offers."
What prompted this thing on your LinkedIn profile?
Yeah, I'm pretty happy with my job right now working on Venice at LinkedIn.
I've never really been the type to just job hop for, you know, an extra 10K or whatever, it doesn't really enter
my compass, really. I like to stick to something good when I got it. And there are very few things
that would make me consider doing something else. And I have become diabetic eight or nine years
ago, maybe 10 years ago. And so that has been something that is very
interesting to me to learn more about. And it is a kind of an unsolved problem in a sense. It is
a disease that is treated, but which has no cure. So yeah, if I had a chance to work on that, sure,
I would consider it. And of course, you know, being an astronaut sounds cool, so why not?
But besides that, I'm pretty happy with where I'm at.
And I hope my manager doesn't hear this because maybe they'll get some ideas.
But yeah, anyway, that's roughly the rationale behind it.
So you mentioned that job hopping just for a slightly higher base salary is not something that you consider as much.
So when thinking about a job,
and you have been at LinkedIn for I think about eight, 10 years at this point, since 2014,
so about nine years now, I'm sure there have been times when you've thought about,
hey, maybe trying to do something else. But when you've thought about just working at LinkedIn,
or just finding another job, like, how do you think about job in that case? Like, what aspects
are important to work at a company for you?
For me, I like to be challenged. I like to keep learning. And I've been fortunate to discover the
field of distributed systems and big data and stuff like that, which I think is one of the
fields that has both a lot of depth and breadth. So in my opinion, you can stick around in that
field and do a different job every year, right? And still be doing the same job in a sense,
but there are so many different aspects to it. So these days, for example, I'm very interested in
getting into the nitty gritty details of performance. But before that, there were
other phases like working on scalability,
stability, and of course, all of these things are constantly being mixed together on a day-to-day
basis. So it keeps me always on the edge of my seat, especially with the growth that we need to
deal with. Things are never standing still. So for me, it's been a really interesting journey. And I think I
can relate to people who maybe have the same inclination as me to want to keep learning and
challenge themselves and that they do that via job hopping. But I think for infrastructure in
particular, the projects just operate on a longer timeframe, right? If you're working on
the product side, then I think it is a much more fast-paced thing where you build something,
it's in production for a year and a half, and then you're building the next thing and the old
thing gets killed. And you're just always renewing what's on the site. Whereas for infrastructure,
you put something out, you're still supporting it five years
later.
And so it's a very different mindset, I think.
And if you do infrastructure in terms of like two year stints in a bunch of different places,
then I don't think you get the most out of that field, in my opinion, because you're
barely scratching the surface and you're already bouncing off to something else. So it's sort of kind of like drive-by coding, right? Like, oh,
well, I did something. And before I get to see the gnarly details of how my thing doesn't scale,
I'll be out onto the next adventure. It's fine for some people, but for me, it's just not very
aligned with my own view of the work. I would agree with that.
So you made a post a few weeks ago, sorry, a bit of hard pivot,
about how watching the Smurfs helped your kids be less shy about speaking Spanish.
And then we were doing some LinkedIn stalking.
And so you speak natively, so English, obviously, French, as well as Spanish.
And you decided to speak to your child only in Spanish. Like,
how is that working so far?
Yeah, yeah, it's working great. So my mother is Quebecois.
So I learned French from her and my father is Argentinian. So I learned Spanish from him.
And obviously I learned English because everybody learns English these days.
I feel like I've been talking to you for, you know, five minutes, and I think with that, you know, we're
already friends, so I feel like I can make the joke. So when you speak French,
people won't be able to understand you in France, and then when you speak Spanish with the Argentine
accent, people won't be able to understand you in Latin America. Is that what you're telling me?
Yeah, basically, yeah. Yeah, I mean, the people from France don't understand the Quebecois accent,
although we understand them. So it's very much a one-way street. But the Argentine accent has
some little quirks, but I think it's probably closer, I would think, to the rest of the Latin
American Spanish accents.
Sorry, maybe that was more commentary on my bad Spanish.
Because when I was there for two months, man, the first few days was rough.
I was like, what is this?
Well, from my perspective, the odd one out in Spanish are the people from Spain.
Because their S's are like this.
And it's very different.
And they're the minority, population-wise.
So yeah, we speak various languages at home, and so yeah, it's working out great.
So doing a hard pivot back to the world of data infrastructure and distributed systems,
so you mentioned Venice there. Can you tell us more about what Venice is and what problem does it actually solve?
Yeah, sure. So Venice is what we call a derived data store. So the derived data is what we define
as data that's been processed out of other data, right? So there's a bunch of different kinds of
derived data. I did a talk on what is derived data at QCon in London.
So you could check that out if you want to learn more.
One of the kinds of derived data is ML feature data, right?
So this could be like embeddings, or essentially it could be floating point values of one kind or another that are used as input to various machine learning
models, right? So this can be the output of some machine learning training jobs. And Venice is
particularly focused on catering to that segment where the AI engineer is going to produce a bunch
of data. Maybe they'll refresh their entire corpus every
day or a few times a day, or they'll be doing it in streaming fashion. And all of that gets
pumped into Venice at a very high throughput of writes. So that is one of the defining
characteristics. It supports very high throughput of ingestion. And then the data is going to be read by online applications that do ML inference, basically.
Right?
So these are sort of the two activities of the ML engineer or the ML workflow: there's training and then there's inference.
And Venice sort of sits in the middle, right?
It bridges the gap between the offline environment where the training
is happening and the online environment where the inference is happening. That's sort of where
Venice sits in our ecosystem at LinkedIn. So this question would come across and it will show
how much I don't know about the field of machine learning. So I've heard this term these days called a vector database.
Is Venice pretty much a vector database of sorts? Is that a fair description?
Yes and no. So we started working on Venice, I think, way before vector databases were
part of the vernacular. I would say there is some limited support for the types of operations
vector databases do. So in Venice, there are a few different APIs. You can interact with it as
if it were a key value store. That's what most people do. But there are also some more advanced
operations where you can push down some vector math into the store, into the backend.
So things like doing a dot product or a cosine similarity, these types of vector math operations
can be pushed down and so they're performed more efficiently because they're co-located
with the data, right?
So that I think is a bit closer to the concept of the vector database.
But nowadays, the vector databases that are coming out
have way more functionality that they do as well.
So yeah, Venice doesn't have as wide a feature set
as some of the other vector databases.
It might go there in the future.
I don't know.
But yeah, hopefully that helps place things
in context to one another.
For sure.
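To make the pushdown idea concrete, here is a minimal Python sketch. It is not Venice's actual API: `ToyVectorStore`, its method names, and the `member:1` key are all hypothetical. The only point is the contrast between a plain key-value `get`, which ships the whole vector back to the caller, and a pushed-down dot product or cosine similarity, which is computed next to the data and returns a single number.

```python
import math
from typing import Dict, List, Sequence


class ToyVectorStore:
    """Hypothetical in-memory stand-in for a key-value store of embeddings."""

    def __init__(self) -> None:
        self._rows: Dict[str, List[float]] = {}

    def put(self, key: str, vector: Sequence[float]) -> None:
        self._rows[key] = list(vector)

    def get(self, key: str) -> List[float]:
        # Plain key-value read: the whole vector travels back to the caller.
        return list(self._rows[key])

    def dot_product(self, key: str, query: Sequence[float]) -> float:
        # Pushed-down operation: computed where the data lives, returns one float.
        stored = self._rows[key]
        if len(stored) != len(query):
            raise ValueError("dimension mismatch")
        return sum(a * b for a, b in zip(stored, query))

    def cosine_similarity(self, key: str, query: Sequence[float]) -> float:
        stored = self._rows[key]
        norm = math.sqrt(sum(a * a for a in stored)) * math.sqrt(sum(b * b for b in query))
        if norm == 0.0:
            raise ValueError("cosine similarity is undefined for a zero vector")
        return self.dot_product(key, query) / norm


store = ToyVectorStore()
store.put("member:1", [1.0, 2.0, 3.0])

print(store.get("member:1"))                                  # full vector
print(store.dot_product("member:1", [4.0, 5.0, 6.0]))         # single number
print(store.cosine_similarity("member:1", [2.0, 4.0, 6.0]))   # single number
```

In a real distributed store the saving would come from sending one float over the network instead of a whole embedding; this toy version only shows the shape of the two call styles.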
So, Venice is an open source system
and we want to talk about the entire journey
of getting to open source Venice.
Before we go there,
LinkedIn also had a distributed data store
called Voldemort before this.
So did Venice just replace Voldemort in our case?
Yeah, so when I joined LinkedIn, I joined the Voldemort team.
Voldemort has kind of an interesting history in and of itself.
If you want like the two-minute version, essentially, at first there was only Oracle, but Oracle didn't scale. So then the myth,
which may be true or not, I'm not sure, is that Jay Kreps read the Dynamo paper from Amazon
and decided to build an open source version of it on the Caltrain. And so he hacked that,
you know, as a side project, and then it became kind of a thing of its own.
And then at that point, there were two
choices, just Oracle and Voldemort. If you needed scale, you would use Voldemort, but the feature
set was quite limited, kind of a key value store. And then everything started getting built on top
of Voldemort. You had search, graph, OLAP, essentially cubing, and then a bunch like
counter services. And of course the output of ML jobs
was one of those things. So all of that was getting pumped into Voldemort, and then
over time we started specializing more. So the graph stuff went into another system specialized
for that, the search stuff went into Galene, and the OLAP stuff went into Pinot. The
counter stuff as well went into Pinot. And then what was left after everybody had chipped a part
of it was what we called Voldemort read-only, which was the act of batch pushing massive
data sets every day that were immutable. What we saw at the time was that stream processing was
up and coming, right? And we thought that was the next big thing. But when you did something with
your stream processing, the output of that was like a Kafka topic or something like that. And
then that was not really amenable to bringing it into the online experience, right?
And so we saw that the engineers were very productive with Voldemort, where they could
do their offline jobs, do very complex things with, you know, at the time it was like Pig
and Hive, but then later on Spark and other things.
And they could do whatever they wanted there and very easily push the data into Voldemort
and have it be servable online.
And so with Venice,
we wanted to kind of bridge that gap
for streams as well, right?
So essentially we wanted to do
what Voldemort did for offline.
We wanted Venice to do that for streams
and also do it for offline as well, right?
So essentially kind of a
different scope. And so that's how we got started on Venice. And then eventually we were running
both systems in parallel for a few years. And eventually we decommissioned Voldemort in 2018,
just in the nick of time for the GDPR deadline, basically. So that was the forcing function to get Venice to full scale.
Yeah.
So that's kind of the brief history
of data infra at LinkedIn.
One quick follow-up.
So when you mentioned
kind of merging the real-time
and the offline sort of
like feature engineering, right?
So getting your features in.
I remember when I was working on this
at a small startup,
so we used Redis,
which I guess is also a key-value store.
And then I remember the challenge part
being that it was kind of this catch-up game, right?
Where the offline stuff takes a very long time to run.
So then, you know,
it's almost like a Lambda architecture thing
where basically the job
takes like 10 hours, and then you're basically doing the real-time stuff, like
the streaming, for like the nine hours or so to kind of bridge the gap so that the data is not stale.
Is that a lot of the use cases when you say kind of merging the two, or are they
just different types of features?
Yeah, so Lambda architecture is one of the workloads we support
in Venice. In the traditional or maybe I should say original presentation of the Lambda architecture,
it was presented as like two completely different systems, right? You had your batch job pushing
into a batch friendly store, like maybe Voldemort read only or something else.
And then you had a stream processing job pushing into like more of an online kind of store that
can be mutated one row at a time, which was sometimes referred to as the speed layer.
And then the online application needed to read both and it needed to do some kind of reconciliation of the two, right?
Like if I get the record only in stream, then I return that.
But if I get the record in both, then maybe I need to check like some timestamp or whatever and decide, right?
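The read-time reconciliation just described can be sketched in a few lines of Python. This is an illustration under stated assumptions, not code from any real system: the two stores are plain dictionaries, `Record` and `read_with_reconciliation` are hypothetical names, and "check some timestamp and decide" is assumed to mean that the newer record wins.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class Record:
    value: str
    timestamp: int  # assumed logical write time; larger means newer


def read_with_reconciliation(
    key: str,
    batch_store: Dict[str, Record],
    speed_store: Dict[str, Record],
) -> Optional[Record]:
    """Query both layers and reconcile in the online path, at read time."""
    batch = batch_store.get(key)
    speed = speed_store.get(key)
    if batch is None:
        return speed  # only in the stream layer (or in neither)
    if speed is None:
        return batch  # only in the batch layer
    # Present in both: assume the record with the newer timestamp wins.
    return speed if speed.timestamp >= batch.timestamp else batch


batch_store = {"a": Record("batch-a", 100), "b": Record("batch-b", 100)}
speed_store = {"b": Record("stream-b", 150), "c": Record("stream-c", 120)}

for key in ("a", "b", "c", "d"):
    print(key, read_with_reconciliation(key, batch_store, speed_store))
```

Every read touches both stores, so in a real deployment the request would be as slow and as fragile as the worse of the two.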
So that was the original Lambda architecture. And when designing Venice, we wanted to cater to that, but we thought this original design of
Lambda architecture was not that great because there were lots of moving parts, especially
in the online application. Because you needed to interact with two systems, you were bound to suffer
from the weakest link in the chain, right? So your latency was going to be the
slowest of both and your reliability was going to be the worst of both. So with Venice, we are pushing
the merging part upstream, right? Instead of happening in the online path at read time, it's happening in the ingestion path at write time. So Venice is a single database
that will ingest batch input and ingest streaming input, and it orchestrates the merging of these two
so that when the read comes in, it's just a single request that reads the outcome, essentially,
of this merging. So that's kind of the philosophy
we went with.
I see. And then the trade-off there is that even though there might be some
latency or staleness to the data, the performance is much better.
Yeah, yeah, exactly.
So there is staleness kind of built into the architecture. Everything is eventual, but we
thought that was not a big deal
because the output of the batch processing
and the output of the stream processing
are anyway stale in any case, right?
Yeah, that makes sense.
So it doesn't really matter that the last mile,
like between the stream processor
and the online key value store,
if that part is strongly consistent,
it doesn't really matter
because everything that
came before that is stale anyway. So we can kind of do the design trade-off of extending the
staleness end-to-end into the architecture. And by doing that, we achieve much better ingestion
throughput and we didn't really lose anything from the end-to-end standpoint.
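The alternative of merging at write time can be sketched like this. It is a toy Python model, not how Venice is actually implemented: `WriteTimeMergingStore` and its methods are hypothetical names, and the merge rule is assumed to be "the write with the newer timestamp wins". What it shows is that batch and stream input are reconciled in the ingestion path, so a read is a single lookup.

```python
from typing import Dict, Iterable, Optional, Tuple


class WriteTimeMergingStore:
    """Hypothetical single store that merges batch and stream input on ingestion."""

    def __init__(self) -> None:
        self._rows: Dict[str, Tuple[int, str]] = {}  # key -> (timestamp, value)

    def _merge(self, key: str, value: str, timestamp: int) -> None:
        # Assumed merge rule: keep whichever write carries the newer timestamp.
        current = self._rows.get(key)
        if current is None or timestamp >= current[0]:
            self._rows[key] = (timestamp, value)

    def ingest_batch(self, rows: Iterable[Tuple[str, str]], timestamp: int) -> None:
        for key, value in rows:
            self._merge(key, value, timestamp)

    def ingest_stream_event(self, key: str, value: str, timestamp: int) -> None:
        self._merge(key, value, timestamp)

    def get(self, key: str) -> Optional[str]:
        # Read path: one lookup, no reconciliation across systems.
        row = self._rows.get(key)
        return None if row is None else row[1]


store = WriteTimeMergingStore()
store.ingest_stream_event("b", "stream-b", timestamp=150)
store.ingest_batch([("a", "batch-a"), ("b", "batch-b")], timestamp=100)
store.ingest_stream_event("c", "stream-c", timestamp=120)

for key in ("a", "b", "c", "d"):
    print(key, store.get(key))
```

The cost moves to the write side, where each ingested row pays for the comparison, which fits a design that accepts eventual consistency in exchange for a simple, fast read.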
Cool. And you mentioned that the idea of building Venice was a few years ago. Can you place us where in time that was?
Yeah. It originally started as far back as 2014,
the year that I joined. We were looking for a solution to this kind of unlocking the power of the stream processor output.
And we actually had a prototype of something that was doing that on top of Voldemort, but it was very janky and not very robust.
And so we started thinking, like, what would a first class system for this look like?
And then we got an intern and we got that person
working on Venice. Actually, the first commit in the Venice code base is from that intern called
Clement. And then we got the project approved, you know, beyond kind of just internship scope.
And then unfortunately, we got sidetracked after that for a few years. We had competing priorities, so Venice was shelved for a while as we needed to make some big investment inside Voldemort. Then we had a fully staffed team that started working full-time on it, and we worked on it throughout
2016, and by the end of 2016 we landed in production, so about one year of full-team, full-time
development. At that point we supported only the batch functionality, so we didn't have stream
yet, but then we got started immediately on stream ingestion as well.
We had that in production about half a year later.
Around the same time, we stopped the self-service onboarding of new data sets onto Voldemort,
instead redirecting them to Venice.
And then, so at that point, we're like in mid-2017. And then right then started the GDPR craze, right?
Where everything had to be rethought.
And Voldemort didn't have any access control, no authentication, no authorization.
So it wasn't going to cut it.
And we had built those security features from the beginning in Venice, anticipating that, you know, for a next
gen system, that was a must have, right? So, so we kind of took the opportunity of all the migrations
related to GDPR that were happening anyway, to say, well, you know, it's time to retire Voldemort.
And so that was a pretty massive migration. We migrated maybe 500 data sets in the span of two or three quarters.
So Venice went from like zero to one pretty quickly.
And then that brings us to mid-2018, where finally we could kind of shed all of the operational burden of Voldemort, focus completely on Venice.
And then after that, the scope and scale of Venice
has continued to increase year after year.
So, yeah.
And you open sourced Venice sometime, I think, last year,
I think around September 2022.
So was it always...
2021, okay.
So was it always a plan to open source Venice?
No, actually, you're right.
Sorry about that.
2022.
No worries.
We'll actually link your talk that you gave at, I think it was at Strange Loop.
Yeah, I think we'll link your talk in the show notes so that people can watch that talk.
I think it was a good one.
So yeah, getting to the open source like, was there always a plan to open
source Venice eventually? Yeah, actually, there was. So the Voldemort team was developing Voldemort
in the open. And so for that team, it felt natural that the replacement of Voldemort should
be open source as well. And we already had kind of the habit of separating the open source part of the code from
the proprietary part of the code, like this kind of decoupling that is critical to making the
project open sourceable was already kind of in our DNA in a sense. So we started Venice with that
structure from the beginning, which I think helped a lot. So yeah, it was the intent from the beginning.
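The decoupling described here, keeping the open source part of the code separate from the proprietary part, is commonly done by having the core depend only on an interface. The Python sketch below illustrates that general pattern and is not taken from the Venice code base; `AccessController`, `AllowListAccessController`, and `StoreServer` are hypothetical names, and access control is used only as an example of something a company might implement against an internal system.

```python
from abc import ABC, abstractmethod
from typing import Dict, Optional, Set


class AccessController(ABC):
    """Interface the open source core depends on (hypothetical)."""

    @abstractmethod
    def is_allowed(self, principal: str, store_name: str) -> bool:
        ...


class AllowListAccessController(AccessController):
    """Simple default that could ship in the open source repository."""

    def __init__(self, allowed: Dict[str, Set[str]]) -> None:
        self._allowed = allowed  # store name -> principals allowed to read it

    def is_allowed(self, principal: str, store_name: str) -> bool:
        return principal in self._allowed.get(store_name, set())


class StoreServer:
    """Core logic: knows the interface, never a concrete internal system."""

    def __init__(self, access_controller: AccessController) -> None:
        self._access_controller = access_controller
        self._data: Dict[str, Dict[str, str]] = {}

    def write(self, store_name: str, key: str, value: str) -> None:
        self._data.setdefault(store_name, {})[key] = value

    def read(self, principal: str, store_name: str, key: str) -> Optional[str]:
        if not self._access_controller.is_allowed(principal, store_name):
            raise PermissionError(f"{principal} may not read {store_name}")
        return self._data.get(store_name, {}).get(key)


server = StoreServer(AllowListAccessController({"features": {"ranking-service"}}))
server.write("features", "member:1", "0.42")
print(server.read("ranking-service", "features", "member:1"))
```

Under this layout, a company-internal implementation of `AccessController` would live in a private module and be injected at startup, so the public repository never needs to import it.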
And what triggered us to open source it in this case?
Like what made you realize that, okay, at this point, it's good enough that now is the
time to open source it?
So although we started with the right structure in place, over the years, unfortunately, we did let a little bit of tech debt slip in and we did become, well, I'm hesitant to say tightly coupled, but we still wanted to do it, right?
So we were working hard to try to avoid making it harder to open source, but we didn't really have the necessary effort going on to unwind these pieces of tech debt, right, that were kind of holding
it back from open sourcing. And that is why it took many, many years to open source, because
if it weren't for that, I think we could have open sourced it many years ago. But
then what happened is there were some opportunities which were coming from internal needs, right, to refactor some parts of the system.
And I think those are always great opportunities to rethink not just a very specific need in hand, but the bigger picture as well, right?
So whenever you rewrite a component, you should think, well, you know,
what else is wrong with that component? Can we kill two birds with one stone, right?
So as part of that, we improved the reliability and the performance of our
replication architecture, the cross-region replication. And that was one of the pieces
that was tied up to some internal stuff. So we got rid of that.
And then after that, what was left that was still internal was extremely minimal, right?
So at that point, it became kind of realistic to say,
well, look, we can put one person working for a few months and then it'll be done, right?
And fortunately, LinkedIn has a strong culture of
open source, especially in the infra teams. So there was support for doing it. But the support
was there, I think, because we were so close to begin with, right? Like if we had said, well,
look, it's just going to take half the team for a year, then that would not have flown, right?
It is a lot of work on a continuous basis to keep the project in a shape that is open sourceable.
Like for you guys, what was the draw to open source it?
Was it like the publicity you get out of it?
Was it like the community around it that you can like get features and get help from the community?
Like what was the draw to make it open sourceable in the first place?
So I think what most people assume to be the benefit of open source is the contributions
you get from the community.
But I think in practice, it takes a very long time before contributions start coming in. And when the
contributions come in, they take effort from your end to review and steer and so on. Plus the
contributions may not be in the direction you're interested in anyway, right? So contribution-wise,
I don't think that's the main draw, right? For me personally, the way I see it is open source is an incentive for people to do their best work because it's in the open.
It's sort of a public portfolio. Otherwise, people could be cutting corners because all that matters is the deadline or shipping, etc.
In open source, I think there's a bit more of a built-in incentive to have a higher level of craft because people are maybe a bit more self-conscious, right, that their code is out there in the open. The other thing is, in order to make a system be useful across many
different organizations, many companies, it requires a certain level of modularity and
versatility and quality that is not necessarily going to be present in a proprietary system. And so that's another angle where quality
gets uplifted or has the potential at least to be uplifted. That's super interesting. I would never
have guessed that. But that makes a lot of sense. We were talking to Nathan Marz, and he mentioned
that they have this system where they randomly pair people for coding together, and
to me, I was like, oh shit, yeah, that would definitely keep me on my toes in terms of,
you know, keeping my code in check. I think, yeah, here it makes a lot of sense that if you're
building in public, the power of git blame is nice.
Nice. And then from the angle of the business, you know, well, quality is nice, but is it going
to be the main incentive business wise?
I don't know.
And if we don't get much from contribution, at least not for a long time, then what is
there for the business, right?
So for the business, I think the incentive is mostly around, like, talent brand. You know, I was talking to a director or something
like that in the Hadoop world, the offline world, and he was saying the best tool in my toolkit to hire
people is that we're the company that open sourced Kafka. And they're not even working on Kafka, right?
They're sort of fairly far off from Kafka. They're doing the offline stuff.
But still, just because in another pocket of the company that that's a project that came out,
it encourages people to say, oh, well, this is a place where, you know, people go do their best
work, right? So yeah, in terms of recruitment and retention, I think there's a benefit.
Although retention is kind of a tricky one because sometimes open source projects end up causing kind of a brain drain.
But I think if the company positions itself well with the community and doesn't kind of antagonize and so on, then it can serve as a positive on retention as well.
It has that potential at least.
So as you mentioned, there's a cost to open sourcing a project,
not just from starting it,
like making sure it has either no or loose coupling with internal systems,
but also making sure when a company puts its name to an open source project,
the quality is really high, the craft is really good.
So it takes extra effort beyond just making sure, hey, this thing works, it's good quality, it's production ready, but you want to make sure it shines when it's open source to the public.
So there is extra work that goes in there.
And there has to be a business incentive for a team to spend time and effort to do it. And this is a question that many engineers ask either themselves or their managers,
like, hey, we should open source a system.
And more often than not, they get a no.
And for valid reasons, which may or may not always make sense,
but from a business perspective, it does.
So for engineers to kind of think about this business perspective, have you thought about
what's a good way for engineers to, one, think about when they should pitch to open source a system
and how they should go about pitching it to their managers? Yeah. So like you said, we want to put
out good work, right? We don't want to put out some shoddy half-baked thing. So you have to kind of take a hard look at the project and really kind of be honest with
yourself and, you know, is this really some first class work or are we just kind of riding
along with it because it's tech debt and it's too costly to unwind, but not that great to
share either, right?
So going to open source,
yes, I think there are lots of good reasons for doing it. But there should be a question,
is this particular system within the company first class in any way compared to what's out there?
Or is it the first of its kind to be open sourced or not, right? And if the answer is no, and there are other open source alternatives that are better,
then it may be the case that it's actually better to just scrap the internal project
and adopt some open source thing instead.
Maybe, right?
And that's always a tough decision for the people involved in the project, but it's a
decision that needs to be made sometimes.
So yeah, essentially it comes down, in my opinion, to would this contribute to the state
of the art in the industry and along some dimension, maybe not along all dimensions,
but is it like the best in terms of security?
Is it the best in terms of performance?
Is it the best in terms of anything, right?
Or at least is it close to being the best, right? Like maybe it's not the
best in any one thing, but the mix of all these various dimensions we care about is sort of high
enough in a way that maybe no other system strikes that same trade-off, right?
I'm assuming that becomes a conversation with the team to say, this is exactly why we should place the system in the open source.
And this is the work that it would take.
And these are the potential benefits you would get out of it.
So before actually going into pitching this,
it makes sense for engineers to kind of do this homework or research themselves
to figure out, is it even worth open sourcing in the first place?
And whether it has, whether it at least meets the bar for a project that should be open
source and then think about the business incentives.
In my experience, I haven't seen the conversation happen in this way.
Like a lot of the work needs to come from the team. I think it's like a lot of upfront work
needs to come from the intrinsic desire to do this.
And that's like pre-work, right?
That's like before you even consider,
like it's not like you flip the switch
of making it open source
and everybody's habits will change overnight, right?
If the habits to do it are not there to begin with, then it doesn't look too good to go there.
So like I was alluding to earlier, I think you have to be very close to pulling the trigger before it even makes sense to have that conversation in a sense, right?
If you're like many person years away from being able to open source the thing, then it's not even worth having
the conversation. But if you're very close and it's really just a matter of, you know, tidying
up a little bit of documentation and changing some access rights on the GitHub or whatever, then,
then of course, then we have the conversation like, well, we have this opportunity right here.
You know, we think it's first class in these dimensions.
And the team is interested in doing this different way of working, right?
Which in some ways is going to be higher effort, but the team is motivated to do it.
So then at that point, at that point, the rest becomes a bit of a formality.
Although there are lots of formalities to jump through.
But they are kind of formalities, I would say.
Did you have to convince your teammates to pursue this together in that?
A little bit.
On a day-to-day basis, it always comes up in the sense of we need to build this new thing, this new module or whatever. Are we going to build some
abstraction around like this proprietary thing we're integrating with or do a direct integration,
right? And of course, there tends to always be a bias towards the simplest path, the path of least
resistance, which is an okay bias to have for the most part, right? But you have to kind of drill down into the team,
this habit of, well,
is this the better approach in terms of craft, right?
And if the team is not interested in it,
then it's probably not worth forcing it, right?
But if the team is consistently making the choice
to go in that direction,
then at some point it builds momentum
and it sort of becomes self-reinforcing.
How do you go about building that culture within the team?
So there are two parts here, right?
One, there is a strong leader within the team, both on the technical side, as well as support
from the manager to put in this kind of work and create this culture.
And then the second aspect of maintaining that culture over the years.
How do you go about doing that?
Lots of code reviews, I guess.
Because like, as you said, if the team's not interested, having those conversations seem
forceful or to a point exhausting, to be honest. So at one point, a person might get to a place
like, you know, why even bother? But let's say they want to change that. Do you have any thoughts
around how one could go
about changing that?
Well, I think one part of it is picking your battles, right? So a team, you know,
usually a team doesn't work on a single project, right? Like it's maybe a single umbrella
where everything is kind of coherent and related together. But at the, you know, concrete level,
they're working on maybe 10 or 20 different Git repositories, right? And so sometimes
it's just a matter of making the right initial decision that kind of railroads the following
decisions in a certain way, right? So for example, you say, well, this particular piece,
we know it will need to be tightly coupled to a bunch of internal things.
So should we build it from the get-go
as a proprietary thing rather than, you know,
build half of it inside the open source
and then have like a very large surface area
of abstractions to kind of hook it all together. And then if you end
up ripping the two apart, then the half that remains in open source doesn't even make sense
on its own anyway, right? So it's a matter of trying to find where the integration point should
be, right? So for example, like we have a system within the Venice team at LinkedIn, which handles our self-service onboarding for our internal users.
Well, that thing is proprietary and it's tied up with a bunch of other proprietary things.
And the only interface it has with Venice is like the sort of admin interface that can be used by either by a human operator or by this self-service console, right?
But if we had tried to pull parts of the self-service console into the open source,
then we would have a much larger surface area of internal things to integrate with via abstractions. So a lot of it is about choosing scope
and then following decisions can be easier.
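The integration point described above can be pictured with a minimal sketch. Everything named here is hypothetical and only for illustration (AdminApi, InMemoryAdmin and SelfServiceConsole are not Venice's actual classes): the open source side exposes one narrow admin interface, and the proprietary console keeps its internal concerns to itself and talks only to that interface.

```python
from abc import ABC, abstractmethod


class AdminApi(ABC):
    """Hypothetical narrow admin interface exposed by the open source project."""

    @abstractmethod
    def create_store(self, name: str, owner: str) -> None: ...

    @abstractmethod
    def list_stores(self) -> list[str]: ...


class InMemoryAdmin(AdminApi):
    """Stand-in for the open source implementation; keeps stores in a dict."""

    def __init__(self) -> None:
        self._stores: dict[str, str] = {}

    def create_store(self, name: str, owner: str) -> None:
        if name in self._stores:
            raise ValueError(f"store {name!r} already exists")
        self._stores[name] = owner

    def list_stores(self) -> list[str]:
        return sorted(self._stores)


class SelfServiceConsole:
    """Hypothetical proprietary tool. Internal-only policy lives here, and the
    only thing it knows about the open source side is AdminApi."""

    def __init__(self, admin: AdminApi, approved_teams: set[str]) -> None:
        self._admin = admin
        self._approved_teams = approved_teams  # assumed internal-only rule

    def onboard(self, team: str, store_name: str) -> bool:
        # Internal checks happen before anything crosses the boundary.
        if team not in self._approved_teams:
            return False
        self._admin.create_store(store_name, owner=team)
        return True


if __name__ == "__main__":
    admin = InMemoryAdmin()
    console = SelfServiceConsole(admin, approved_teams={"feed"})
    print(console.onboard("feed", "member-features"))
    print(admin.list_stores())
```

In this sketch a human operator and the console would both go through the same AdminApi, so nothing about the console has to be abstracted into the open source side.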
I like that though.
It's asking yourself the question of,
oh, right, would this project be open sourceable?
Or like, just
thinking about whether that would be an option,
and that kind of changes
how you think about the decision boundaries
and how you want to scope the project.
It's cool.
So once you open source Venice,
maintaining an open source project
is different from doing your day job
on projects which are internal to the company.
So what does it look like
to maintain an open source project?
So in one sense, it's the same. And in another sense, it's different, right? So
when you're writing code, it doesn't really matter whether you're pushing that to a private
Git repo or a public one, right? You're just writing code. And either way, you have a code review process,
and it's basically the same.
The difference happens at the boundaries, right?
Like, which is sort of what we were talking about earlier.
The integration points at the edges of the project
is where the decision-making is different.
Then the other aspect, which is kind of non-coding related but
more about the rest of the processes, is like, how do you make the design process open? How do you make
the, let's say, release certification process more transparent, and so on? And that part is a journey that we're
still going through in Venice, and I think we're not as far along as some other projects that are
more mature in the open source space. So although we want to go in the direction of opening up these developmental processes,
we've only partially opened them up so far. And that, I think, is another thing that requires ongoing effort,
like kind of encouraging the team to work more in the open.
But then that brings up sort of similar concerns in the sense that,
well,
if you're writing a document about some new functionality, like one part of that's going to be the justification for it, then maybe the justification is some internal use case.
Well, now, if the document is public, you kind of have to abstract away the justification, saying, well, we think that some people are interested in this category of workloads, but then you avoid saying that it's this particular team or that particular product that is requesting it.
And there may be some very specific product details that, you know, you don't
want to leak through that channel. And maybe in the most extreme case, it might even end up being
like that, that you have two documents instead of one, and you have sort of the
internal counterpart to the open source document. And so it's a bit of extra overhead.
So, so yeah, that's still kind of a transition we're going through. I don't know if we'll go all
the way. I hope we will. But yeah, we'll see. We'll see where that takes us.
One aspect of having an open source project be more successful is also the community around it.
So adoption matters. Like internally, there's one system to use. And if people want to write
to a distributed data store, like, well, Venice is the choice.
In the open source world, a project becomes popular when it's adopted by different companies or different teams.
And eventually they also want to contribute to it.
So have you been thinking about or spending time on building a community around it?
Yeah, yeah.
So we have an open source Slack that people can join. It's all linked from the main page, venicedb.org.
So people can join that. And we have some people trickling in that are asking questions and so on.
There's also this thing, I don't know if you've heard of it, called Linen. Linen is some service
that allows you to import your whole Slack in a way that is publicly accessible and
searchable. And it skirts around the fact that free Slack instances have a retention limit on
the messages. So you can go there and search all past conversations of the Venice Slack instance.
And there's some good kind of troubleshooting and just general advice lingering in there. So yeah, that's one
aspect. We try to do a community sync-up every two weeks. We haven't been doing it exactly every
two weeks because, you know, sometimes the people on the Slack tell us, oh well, we won't be available
that week or whatever, so then we just skip it. But it's there if some people want to have a more
kind of direct interaction. Interestingly, the community members that we got so far
are mostly in Central Europe. So time zone wise, it's a little bit of a challenge,
but we settled on a schedule where it's like 8 a.m. Pacific, 11 a.m. Eastern time, and 5 p.m. Central European time.
A bit of a stretch for the Pacific and the Europeans, but at least we can get everybody
together and hash things out. So yeah, that's what we've been doing so far.
And one of the things we want to chat about was, like, you're a principal engineer at
LinkedIn. And for folks who might not know, but being a principal engineer is being on the
individual contributor track, which is equivalent to being a director if you are on the managerial
track. So at this point, working on Venice over the years, how do you think open sourcing Venice played a role in
just your personal brand within the community? Because as you mentioned, you're showing off
your best work in a way. And at this point, you've been talking about Venice in the community,
giving talks at conferences. So how has that impacted just your personal brand?
So the honest answer is I'm not sure.
Maybe some external observers might have an opinion.
So before joining LinkedIn and towards the beginning of my time at LinkedIn, I was doing
the conference circuit like a lot more as an attendee, not as a speaker.
And I was really into that and always sort of keeping my finger on the pulse of everything
that's coming out. And then after many years at LinkedIn, I, you know, I was so busy with the internal stuff that I kind of lost track of things in the rest of the industry for a while.
And now I'm sort of reemerging out of that and trying to get to grips with the 10 million systems that came out in the last few years and all that
stuff. So I'm going back to the conference circuit and it's been interesting so far.
Yeah, it is nice to share the work and speak to people about it. Although I don't know if that
answers your question really.
No, it definitely helps. It definitely puts a perspective as to
at least what it looks like from your side. And I understand an external observer might
be able to share more about what it looks like from a brand perspective.
Okay, before we close off, we're getting towards the end. I mentioned that you're a principal
engineer at LinkedIn. And a lot of things would keep you busy. They would involve like a bunch of meetings,
working on Venice, mentoring other people. And there's an aspect of as you grow as an engineer,
you spend more time on leadership and relatively you spend less time on execution.
What does that look like for you and how do you usually make trade-offs or balance that?
So there's been
a few different phases to my career at LinkedIn. At first, I started off as an entry-level engineer
and I was doing a lot of hands-on work and then gradually started being involved into more
meetings and such, and more of the leadership side, as you
were saying. And then that went to the kind of far extreme of that spectrum, where for a few
years I was literally in meetings all day, like from morning to evening. And there were
times where I would look back on, like, my contribution history, and I did, like, let's say, a one-line config change in the whole quarter.
Like that's all I did, technically speaking.
Right.
Everything else was sucked up by meetings, maybe some documents. ... a little while, but after two or three years of that, I felt like I was starting to lose grip
on the project in terms of, like, what was really happening on the ground. And I could see it also,
like, I never stopped being on call throughout this, right? And I could see that my on-call shifts were
getting, like, harder. I just wasn't as familiar with things anymore. And then at that
point, I think everybody in that situation gets to some kind of fork in the road, right? Like
some people say, well, I should just get off the on-call rotation and focus on the
thoughtware, right, the documentation and people talking and so on, or you let the pendulum
kind of swing back the other way, right? So for me, I felt like I was missing the technical
work, the hands-on work, and I wanted to get back into that. And interestingly, that sort of
coincides with when the pandemic happened.
And for a while, the pandemic was sort of business as usual, like we're doing just as
many meetings as before and so on, except it was remote.
But then at some point, I think things shifted where maybe because some interactions moved
to Slack or whatever instead of face-to-face meetings.
But the upshot is the meeting workload started diminishing a little bit.
Not a lot, but it was still like a large part of my time.
But it was starting to diminish.
And it also made me realize I can do my work without face-to-face meetings.
I can do it remotely,
right? And then around that time, for family reasons, we decided that it was best for us to
move back home to Quebec, Canada, on the East Coast. And I was fortunate to be granted
the opportunity to do that while staying in the same team and company. And so now that I was on
the East Coast with the rest of my team on the West Coast, I had this large chunk of time in
the morning without any meetings. So then I could really reshape my work in a direction that I had been wanting to go, but I couldn't really go in earnest. And so I essentially
took the opportunity of this move to make it more formalized that I will now focus a bit more on the
hands-on work and I'll still be available for meetings in what is, you know, my afternoon and the Pacific time morning.
But essentially, it's sort of like half of my day now instead of my whole day.
And I have been coding a lot more than before, which I like, and getting to grips with the project again.
And so, yeah, that's been kind of my own personal journey.
And yeah, it's been working out.
Well, there's definitely something I would take from that experience.
Personally, I need to do that too.
And I moved to the East Coast as well.
And some of the morning time is bliss.
Like that's been helping.
The other question that I have on similar lines is when you stop doing
some of the meetings you were going to, someone else has to take up that responsibility. Like
usually as a lead engineer on the project, you spend a lot of time either telling people how
to use a system or how not to use a system, or have conversations about how it should be
integrated with X, Y, or Z, or the new features that need to come in. If you do less of that, someone else has to do more of it. So when you went through
that process of formalizing some of this, what did that look like?
Yeah, so I've definitely
had to develop the habit of delegating a lot more. And I used to be in the loop for everything, right? And I liked it this way for a
while. But now I have to make a conscious effort to get out of the loop, essentially, because
my time can easily become a bottleneck scheduling-wise, and I don't want to slow down
the team, right? So whereas before,
when somebody throws a meeting invitation at me that conflicts with something else,
I would like,
my default answer in the past would have been,
well, you know, I can't make it at that time,
but this looks important.
So let's find another slot.
And here's maybe two or three choices
you could pick from,
or I would kind of go out of my way to make it happen some other time. And now my default answer
is, well, I can't make it to that slot, but that's okay, right? There's this or that other person.
So instead of saying like, here are two or three other slots that you could get me, here are two
or three other people you could get info about this thing you're interested in, right?
And I think that has given more space also for the rest of the team to grow.
And I definitely lean a lot on the other leads of the team.
This is definitely not like a single lead team.
We have a lot of very good people working in the team that I lean on. So it's kind
of a shift in the default response. And then sometimes, not very often, but sometimes I would
override that default and say, well, no, this is actually something I do want to be in the loop
with. But then that will be a very intentional decision, right? For a very narrow set of things. And concretely, it doesn't happen
that often, but every once in a while. Essentially, if I think there's a good probability that with
the right people in the meeting, the right decision will be taken or one of the right
decisions, because oftentimes there is not a single right decision. But if one of the right decisions is taken, then it's fine.
But if I suspect that there's a chance that I might have some grievance
with whatever decision is going to be taken,
and then it ends up needing to be redone,
then that's a waste of time for everyone, right?
But that doesn't happen very often.
And in some cases, it's the other people that insist,
well, no, actually, we would
prefer Felix to be there. So they will try to reschedule. But I guess that doesn't happen that
often either. But every once in a while, it does. It's sort of like picking your battles, I guess.
I thought you were going to say, that's not my problem, but you took the more humble way. I like
it. I like it. So I have this question. It's not fully framed in my head. So if it falls off, we can cut it out.
But as you mentioned, there are multiple leads on the team, and they're focused on different parts of the project.
Oftentimes, when this happens on a big project like this, so let's say you presented at Strange Loop.
This was around the time LinkedIn open sourced Venice.
What does that look like, where you got the opportunity to present? Was it you saying, I want
to do this, or this should be a strategy to open source in a way? Was there someone else who was
interested in also doing that maybe, or presenting Venice somewhere else? Like what does that dynamic
look like within the team?
Yeah, good question. I guess I've been doing most of the kind of shopping around for conferences, I guess, within the team.
But it's not always me that does the conferences, right?
So, for example, the next one we're doing is at QCon San Francisco, and it's going to be presented by one of my teammates called Gaudier.
He's going to be co-presenting actually with another guy called Alex from the performance team.
So it's going to be a talk focused on the performance aspect of Venice,
which is, like I said earlier, it's a subject I'm very interested in.
But I figured, you know, since the talk is in San Francisco and there's like the whole team there, I felt it's kind of silly for me to fly, go nab the talk, you know,
while there's a bunch of very
competent people that could be doing it instead. And yeah, I also encourage, you know, other team
members to submit to the call for presenters. We don't always get taken, of course. But yeah,
I encourage others to do it as well. So yeah, I try not to hog the spotlight all for myself. There are many
people working behind Venice and I want to make sure they get recognition. And also, you know,
conferences are fun, but at the end of the day, I don't want to be on the road all the time. Like
I've got a family here and so on that I want to spend time with. So, you know, maybe like one
conference a year or something
is what I feel is the right balance for me at this moment. And if there are more opportunities
than that, then I would probably redirect them to my teammates.
In this case, like, does your team
as a group decide like which conferences you want to go to or what kind of topics you want to give
like present on? Like, is there some sort of an internal discussion or coordination or this is just individuals coming
up with ideas and then going with it?
It's a bit of both, I guess. Like in some cases, so we
present also at meetups, not just at conferences. So in some cases it's like,
this meetup is coming up we are told we could get a slot in it, but, you know, we have to come up with a subject
and then it becomes kind of a team exercise to brainstorm what subject are we going to
do?
Who's up for doing it?
Obviously, it's work to prepare that, right?
For most conferences, it's a bit more of a formal process, right?
You have to kind of decide upfront
what your subject is going to be
and you pitch that to the organizers
and there's the CFP process and so on.
And then there are some conferences like QCon
where it's not really a call for presenters.
It's more like the track leads
are handpicking who they want.
So it's more of a, you know, you got to know the right people and they might ask you and stuff like that.
So, yeah, it's kind of a mix depending on the venue.
Felix, what's your favorite story of, what is the parentheses, software misadventure?
Hmm.
What's my favorite misadventure?
At some point, I was learning Mandarin without knowing it,
because some of my teammates were talking in Mandarin among themselves. And then one of my
teammates asked me, like, to ship something, and I gave him some feedback on the pull request. And
then he went to someone else who was sitting, like, right in front of me and talked in Mandarin
about it. And he basically asked the other person, hey, can you ship it for me, because Felix didn't want to, or something. And then I don't know
exactly how that unwound itself, but, like, I sort of gave them a look or something, and
then the other person was like, well, you know, if you got that feedback, you gotta go address it. I mean, I don't
know if he might have given the ship it otherwise, but yeah, I mean, that was kind of a funny moment.
I thought you put on, like, the live Google Translate audio and then, like,
pointed over and then the direct translation was, Felix won't ship it for me.
Nice, nice, nice.
Cool, cool, cool.
I'll take it.
Pretty good.
Thanks.
No, that's a good one.
I want to learn more languages at this point,
which may be a personal goal.
So Felix, this has been an awesome conversation.
Thanks for sharing so much about Venice
and your personal journey.
Before we bring this to a close,
is there anything else you would like to share with our listeners?
Not really.
Thanks for having me.
I just want to say again,
the website for Venice is venicedb.org.
So check it out if you're interested.
We got our Slack there, our blog, all our resources.
So that's all.
Thanks a lot for having me.
We'll link to everything about Venice
as well as you, Felix, in our show notes,
where people can find Venice
as well as learn more about yourself.
And again, thank you so much for joining us today.
This was a lot of fun for us
and I'm sure it'll be a lot of fun for our listeners too.
All right.
Well, thanks Felix.
Have a nice day.
Hey, thank you so much for listening to the show. You can subscribe
wherever you get your podcasts and learn more about us at softwaremisadventures.com. You can
also write to us at hello@softwaremisadventures.com. We would love to hear from you.
Until next time, take care.