CppCast - Envoy Proxy
Episode Date: April 30, 2020

Rob and Jason are joined by Matt Klein from Lyft. They first discuss an update to Microsoft's Guidelines Support Library with changes to span. Then they talk to Matt Klein, who discusses Envoy Proxy and how it's used in cloud-native applications.

News: Tweet re: SPMD Lambdas; CppCon 2020 Call for Submissions; GSL 3.0.0 Release

Links: Envoy Proxy; Envoy Proxy on GitHub

Sponsors: PVS-Studio. Write #cppcast in the message field on the download page and get a one-month license. Read the article "Checking the GCC 10 Compiler with PVS-Studio" covering 10 heroically found errors despite the great number of macros in the GCC code. Use code JetBrainsForCppCast during checkout at JetBrains.com for a 25% discount.
Transcript
Episode 245 of CppCast. This episode is sponsored by JetBrains. They've got CLion, an intelligent IDE,
and ReSharper C++,
a smart extension for Visual Studio.
Exclusively for CppCast, JetBrains
is offering a 25% discount on yearly
individual licenses on both
of these C++ tools, which applies
to new purchases and renewals alike.
Use the coupon code JetBrainsForCppCast during checkout
at JetBrains.com to take advantage of this deal.
In this episode, we discuss an update to Microsoft's Guidelines Support Library.
Then we talk to Matt Klein from Lyft.
Matt talks to us about Envoy Proxy and how it's used in cloud-native applications. Welcome to episode 245 of CppCast, the first podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
I'm all right, Rob. How are you doing?
I'm doing fine. I did some work over the past few days on my deck,
which I took apart like a year ago today, and now we finally have it stained,
so it's pretty much all done now.
Took it apart a year ago.
I know. It's been a long time, but it's pretty much done now.
It only took me like three years to finish my basement once I started. Yeah, some of these
projects can be pretty, uh, time-intensive. Anything you want to share?
No, at the moment, largely just waiting to see, you know, what conferences are gonna go through, really. Okay, well, on that note, at the top of our episode, I'd like to read a piece of feedback.
This week, we got an email from Jeff Trull.
He says, a pretty credible person on Twitter is making a fairly
strong claim about Lambda Overhead, and I wanted to bring it to your attention in case
you might want to discuss it on your next show. Either way, all the best to the two of you.
See you at the next conference, whenever that may be. And yeah, so he sent the
tweet, which was from Richard, saying, I implemented more SPMD control flow using macros versus the
original CppSPMD lambdas. I think I've got all the key stuff working. And I guess, yeah, he's finding some performance differences. Did you take a look at this at all, Jason?
I did, and it has absolutely no details that I can dig into in any way at all. It's a screenshot of some macro calls, right? And nowhere else in the thread is there a link to the C++ lambda version for there to be a comparison. But macros
are macros. They're not function calls. And lambdas are function calls. So the idea that
the compiler is going to do something different with them is virtually guaranteed. I wish that
I had had more details so that I could have actually played with it to see if it depended
on compiler optimization level or what the differences were.
Yeah, it'd be nice if the author of these tweets would write an actual blog post to do a comparison instead of just a tweet.
Okay.
Well, we'd love to hear your thoughts about the show.
You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com.
And don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Matt Klein. Matt is a software engineer at Lyft
and the creator of Envoy. He has been working on operating systems, virtualization, distributed
systems, networking, and making systems easy to operate for nearly 20 years across a variety of
companies. Some highlights include leading the development of Twitter's L7 edge proxy and working
on high-performance computing and networking in Amazon's EC2. Matt, welcome to the show. Thank you so much for having me.
So I'm kind of curious, have you been actually directly involved in operating system development
then? Yeah, I actually started my career back at Microsoft, and I originally, this is dating myself a lot, but I worked on Windows Mobile, which was pre-iPhone, pre-Windows phone.
And that was based on Windows CE.
So I actually started my career doing operating system stuff around Windows CE.
And then from there, I actually worked on HD DVD, which was the competitor to Blu-ray.
So I'm also dating myself.
This is pre-streaming.
And HD DVD failed as a standard, but I did operating system and embedded stuff for HD DVD.
And then from there, I worked on Windows NT.
I did a bunch of stuff in my early career working on concurrency.
So I built a user mode threading system for Windows NT back in the Windows 7 timeframe.
And then since then, I've mostly switched to Linux.
So when I was at Amazon, I did a fair amount of stuff on Zen, kind of low-level networking systems, hypervisor operating system type stuff.
And then since then, I've actually progressively moved up the stack.
So lately, I've been working more on Layer 7 networking systems.
So Rob actually used to do development on Windows Mobile.
And so I'm curious, is there anything that you two need to work out?
Are there any apologies that need to be made or anything?
You know, this is so long ago now that I have these vague memories of COM
and just being super scarred by that.
But, you know, we're now going back like 15, almost 20 years.
So I think I've put most of that out of my mind.
I was scarred by having to write my own COM objects in C++.
I originally came from Visual Basic, where everything's a COM object.
And then like, wait a minute, this takes so much work in C++.
It's funny how I actually haven't thought about this in a very long time,
but now it's all coming back.
You know, you have to make this object, and then there's this GUID,
and you have to, like, put it in the registry somewhere,
and there's, you know, 17 layers of indirection
until you get your thing created.
It's extremely horrifying.
I just had someone, I asked a question
on Twitter, I don't even remember what the question was now, but
someone said, oh, I was dealing with DLL
nonsense. And they're like, well, maybe
you need to run regsvr32. And I'm like,
whoa, no.
No, I haven't needed to run
that. This is
a blast from the past, wow.
Fortunately, that was not
actually the answer.
Right.
Yeah.
Yeah.
All right, Matt.
Well, we got a couple of news articles to discuss.
Feel free to comment on any of these.
Sure.
We'll start talking more about Envoy Proxy.
Sounds good.
So this first one is CppCon 2020, Call for Speakers.
And I think we mentioned just last week that Meeting C++ is on, and CppCon 2020 is also starting to make preparations. You know, there's always a chance that if the situation is not great come September, this conference will not actually happen, but you need to kind of plan for it and hope that it will be able to happen, and make the call for speakers and everything. Right, Jason?
Yep, June 5th. June 5th is the call for speakers deadline.
Yep. Awesome. Right. And then that will be sent out July 13th. And the actual planned
date for the conference is September 13th to 18th. And I am giving a class at this conference, although I don't know if it's
actually been officially announced now that I say that out loud. Pre-con or post-con?
Pre was what I submitted. They don't have any of the 2020 news up yet. So I know that John did
mention it when I was on CPP chat the other week. So it is at least mostly official,
even if it's not on the website yet.
Uh, but they don't have any call for, you know, people can't register anything yet. Anyhow.
Right.
Been thinking through the probabilities with something like this. Let's see, the possibilities: the world returns to normal, we all forget this happened, and we have a full conference by September.
Um,
the conference is allowed to happen, but few people show up,
relatively speaking, because they're nervous, or it's moved online and or canceled. That's what
I'm thinking about for the next few conferences that I have up. No idea what's going to happen
with any of them. Yeah, it seems super hard for me to believe that there'll be many conferences this year.
Yeah.
Yeah, I mean, at least in the C++ world, basically everything over the summer months up until like August was canceled.
Or moved online.
Or moved online, but yeah.
Yeah, that's been the same in the cloud-native space.
And, you know, I just personally can't believe even that the fall-winter conferences are going to happen.
It seems not very likely that we're going to be able to get 5,000-10,000 people to come to one place,
which is sad, but that just seems like the reality that we're in right now.
C++ conferences, the biggest C++ conference brings about 1,500 people or something now, Rob?
Yeah.
So we're still relatively small fish, but...
Still a big crowd.
Yeah.
Okay, next article we have is the GSL 3.0 release.
This is on Microsoft's Visual C++ blog.
We haven't talked about GSL in a long time, Jason, I feel like.
But yeah, the C++ core guidelines support library is still out there, still being updated.
A lot of the changes in this one seem to be about span to bring it in line with changes to span in C++ 20.
Yes.
You have something you wanted to add to that, Jason?
Oh, it's just this, you know, like, three-year-long argument about whether or not size should be signed or unsigned in standard containers.
Unsigned won. And so, you know, the GSL, Microsoft's implementation, started out with size being signed, which caused people who were actually trying to use it to have to do casts everywhere to avoid warnings. But now they got in line with the C++20 span, which has an unsigned size_t for its size value. But for people who want the signed representation, we've got ssize now coming throughout the standard, so it's there for you if you want it. The reason I laughed is because the only comment on here is, oh my goodness, the bliss, changing from signed to unsigned is going to save us so much code. It's the only response to the release announcement.
It's funny. Okay, well, Matt, could we start off by you giving us a kind of overview of what exactly Envoy Proxy is?
Sure, of course.
And I'm assuming, I should assume for the listeners of this podcast, that people have very little to no context, right?
So I should start from scratch, basically.
Assume that your listener understands C++ and nothing else.
I'm sure there are some out there who know what it is
and maybe are users, but the vast majority
are probably not familiar with it.
Yep, sounds good.
So Envoy at a super high level is a network proxy.
It's an extensible network proxy.
So the projects that it would be most similar to
that people would likely have heard of
would be projects like HAProxy and NGINX.
Those are also obviously Layer 7 network proxies.
Envoy was created by me.
I started working on it about five years ago at Lyft.
It was open sourced about three and a half years ago.
It's become very popular in what we call the
cloud native space right now. So, you know, those are applications like Lyft, like, you know, Slack,
you know, things like that, that, you know, grew up essentially running in the cloud. So these are applications that are built on microservices.
They're highly scalable, so they scale up and scale down. And so there's a whole number of
patterns that have been built up around making these microservice architectures work, right?
So for those of your listeners that are not super
familiar, you know, that's the context by which Envoy came from. So for me, sorry.
If you don't mind, sorry, before we move on, in the seven-layer burrito model, what is layer seven?
Layer seven is the application layer. But, you know, but even there, it gets extremely confusing, right?
Because the layers that most people would know about would be layer three, which is IP, right?
Okay.
And then layer four is TCP or UDP.
So those are the layers that most people will be familiar with at the operating system layer.
Okay.
And then where it gets a little hazy and a little confusing
is that you have protocols like TLS, you know, transport layer security.
Some people call that somewhere between layer five and layer six, right?
And then you have protocols like HTTP, right?
Is HTTP layer seven or is it below layer seven?
Because then you have application stuff on top.
You know, I think there's probably technical definitions of where all of these sit.
I think for my purposes, those technical definitions don't matter that much, right?
In the sense that we have, you know, TCP, IP, UDP. On top of that, we have very
common application layer transport protocols like HTTP. And then on top of that, people are building
microservices, right? So, you know, they're building services where their APIs might be
developed using REST. They might be using IDL systems like Protobuf. And, right, so, you know,
they're layering on top of these protocols. And at the end of the day, they end up building
applications like Lyft. And, you know, for most of these modern, you know, quote, cloud native
applications, there's like a reference architecture where, you know, you have
your phone typically, there's an app that runs on your phone, you're going to talk from your phone
via some protocol to a set of edge load balancers that sit at the edge of your application. And then
those load balancers, you know, might terminate something like api.lyft.com, and then they're
going to fan out that traffic to a whole set
of backend services. So, you know, services like Lyft or services like Facebook, they're typically
composed of hundreds, if not thousands of different backend services, right? So what we see in this
space where, again, it's these systems are scaling up and scaling down. They're highly lossy, right?
It's like there's failures happening all the time.
There are very difficult problems in this space, primarily around networking.
So how do I find all the back ends that I actually have to talk to?
How do I load balance between them?
And then the general topic of observability being things like,
how do I get my stats from the system?
How do I get my logs? How do I get my metrics?
It's like, how do I build my dashboards? How do I operate this system?
So for a very long time, people have been using these great projects like HAProxy and like NGINX.
And again, these are fantastic proxies.
These were proxies, though,
that were built in a slightly different time.
They were built in a time
when architectures were not as dynamic
as they are now, right?
You know, there weren't virtual machines
coming up and coming down,
and now we have systems like Kubernetes, right,
where we've moved away from virtual machines to now we're thinking mostly around containers.
And these containers come up and come down, right? So we have a lot of failure, we have a lot of
auto scaling, we have, you know, very sophisticated load balancing that actually has to happen.
So that's the background context. So for history, I worked at Twitter prior to Lyft.
And at Twitter, I built Twitter's front proxy.
And that's the gateway that basically accepts all traffic
and then it fans it out to Twitter's backend system.
And that was also written in C++.
And that system was never open sourced.
And amazingly, that system to this day, as far as I know, is still serving all of Twitter's
traffic.
And that was first deployed in the run up to the 2014 World Cup.
So in the 2013-2014 timeframe.
So when I was at Twitter, I gained a lot of experience of building these edge systems.
I also had a lot of experience with the way Twitter did what we call service-to-service
traffic.
So within these systems, there's two main types of traffic.
There's the edge traffic, so there's the internet
traffic, right, that comes in from your web browser or your phone. And then when all these services
talk to each other, you know, it's also obviously networking, but it's what we would call service
to service traffic. So this is like your user service calling your payment service or something
like that, right? But you have many of the same problems. You know,
you have what we call service discovery. You have to figure out where your services are,
what their IP addresses are, things like that. You have to do load balancing. You have to do stats.
You might be doing security, RBAC, et cetera, et cetera, et cetera. And, you know, what we found
at Twitter is we had these two separate systems, right?
It's like we had this edge system that was doing edge networking,
and then we had a separate set of libraries that were doing service-to-service networking.
And one of the other trends that we've seen within the industry over the last, I'd say, five to ten years
is we've moved from a world in which most systems are built in a single language.
So if you look back 10, 15 years, you'd have a lot of C++, you'd have a lot
of Java, to now, you look at a lot of these modern applications, and just the trends are
either because of what people do, or because we've acquired companies, you know, there's
applications that are now written in six, seven, eight different languages, right?
It's like you see Go, C++, Java, JavaScript, like Rust.
I mean, the list goes on and on and on.
And a major problem that we've seen over the last five or ten years is that you have all these common concerns, right?
Again, you have service discovery, load balancing, observability.
So you really have two ways that you can solve this.
You can build a library that runs in every single language, or you can build a proxy
that we would call it a sidecar proxy or a client-side proxy that tries to wrap a bunch
of these use cases into one piece of code,
and then we don't have to implement it over and over and over again in each language.
So Envoy was really born out of a couple of realizations.
One of them is that Lyft is having all the same problems that Twitter had in terms of microservice architecture, having trouble around networking, around observability.
But also this realization that we live in this polyglot world now where people program their microservices in lots of different languages, but we have all these common concerns. So what if we could build, you know, one proxy, right, one piece of code that's highly
performance, you know, and very extensible? And what if we could use this in all the places? What
if we could use it on the edge? What if we could use it as a sidecar or a client side proxy? You
know, wouldn't that be great from an operational standpoint? Wouldn't that be great from a developer efficiency standpoint?
So that's where we set out.
And we started Envoy five years ago with the goal of building a single proxy.
And I can go into how it's different from the systems that people might know about.
But we built one code base that we can use in all these different places,
and then it works in the Polyglot architecture,
and it allows us to handle
both edge load balancing, service discovery,
client-side load balancing, service discovery, et cetera.
So I've just been talking a lot.
So I will stop there
and just see if that gives you some good background context
before I go into the details a bit more. Yeah, definitely. So I guess one of the questions I
have to start off with is you're talking about all these different languages, you know, polyglot
world. So does that mean you have to have, you know, multiple bindings for Envoy proxy,
if that's all written in C++? Right. So there's a multi-layer answer to that question. Historically, the answer
is no, there were no bindings. We would effectively build thin clients, right? So at Lyft, we had like
an Envoy client written in Python, which is a couple of hundred lines, similar for Go, PHP, etc.
And these are thin clients that knew how to reach Envoy on a particular port, but it would work like a normal network proxy.
It's like you would connect to Envoy on localhost, 8080, and then Envoy would go and do its thing. So from the perspective of the application,
you would connect to this proxy on localhost.
The proxy would do this magic.
It would find all the backends.
It would do the load balancing.
It would send the request.
It would get the response.
And the application would have a transparent network,
meaning it would have an abstracted network
where the application wouldn't know about the larger network.
It would just know logically,
I need to reach this other service.
I'm going to let Envoy do that.
And in the last few years, at least in the cloud-native space,
there's this very popular buzz phrase,
which we call service mesh.
And what we're talking about, basically,
is the service mesh pattern.
So the service mesh pattern is that you have an application,
you use some type of sidecar element,
whether that be a library or a proxy,
and that proxy is going to abstract the network
and it's going to do a bunch of this plumbing for you.
Now, the reason that I say that it's complicated,
and we can talk about this later,
is that in the last year,
we've been building what we're calling Envoy Mobile.
And Envoy Mobile is Envoy embedded directly as a library
into an iOS and an Android application.
And we have shipped that now,
starting to ship that to production at Lyft.
So the reason that I said that it's complicated
is that Envoy started its history
as a true separate process network proxy,
very similar to HAProxy or Nginx.
But now Envoy is continuing its life
also as an embedded library.
But again, 90 to 99% of the code is the same.
And that's, I think, what makes it so powerful
is that we can build this common code base,
which has a lot of contributors,
a lot of eyes on it,
and we can make sure that we get it right once,
and then we can use this code base in a variety of different places.
Okay, I find myself definitely slowly catching up here.
Sorry.
I don't work in a corporate environment,
so I never have to deal with proxies.
I don't do any kind of distributed
really work these days. So again, I'm not, I'm like, the words that you use are words that I've
heard before. The last proxy that I actually configured and used was a SOCKS proxy, which sounds
different. I mean, it's different, yes, but it's all solving similar problems.
And there's a joke in the distributed systems world, which is that any problem can be solved with another layer of proxies.
So from a distributed system standpoint, proxy is a very popular pattern, mostly because it allows a separation of concerns.
So it, you know, kind of like the microservice architecture,
theoretically allows a separation of concerns.
It allows teams to operate independently across some API.
A proxy also, you know, can,
firewall is probably not the best word, but it can be a bulkhead, essentially, between a bunch of fairly sophisticated functionality, where one side doesn't necessarily need to be aware of what's happening on the other side. Envoy is doing a huge amount of functionality, but the entity that's calling through Envoy
doesn't necessarily know all the things
that Envoy is doing on its behalf.
So I might, whatever, have an app
that needs to talk to a thing
on the other side of the proxy.
I say, hey proxy, I need to perform this action
and the proxy is going to say,
okay, you know, whatever, transaction server number 13, that's the one that's going to take that workload. That is just one use case.
And I think, I think Envoy has become extremely popular in a short period of time for a couple
of different reasons. But one of the reasons that Envoy has become extremely popular is it's very, very extensible.
So though many modern applications use HTTP,
Envoy at this point supports Redis,
it supports Redis Cluster,
it supports Postgres,
it supports MySQL. I mean, it supports all these different protocols.
So you can think of Envoy at its core
as a layer three,
layer four proxy. And by layer three, layer four proxy, I mean that in the IP sense, right? The IP/TCP
sense. And Envoy at its core, you know, it has a bunch of basic functionality for finding backends, doing load balancing, building filters that operate on bytes, right?
So it's like bytes come in, bytes can get filtered, bytes get written out to some backend.
Then on top of that, we've built all of these filters and extensions that do things like offer security, offer RBAC, offer rate limiting, build protocol support for things like HTTP, Redis, Postgres, MySQL.
And then on top of that, there's more extensibility.
So within the HTTP subsystem, we have a further set of what we call filter chains, where you can build filters that operate not on bytes, but on HTTP concepts like headers and body and things like that so that you can then build HTTP-focused rate limiting
or RBAC or anything else.
And at this point, we have a very rich set of extensions around Envoy,
and we're currently taking that even to the next level
where we have extensions now that you can build using WASM. So, I mean, this is just absolutely mind-blowing. The future of Envoy extensibility is actually going to be through WASM. So this is already implemented. It's already in use in a project called Istio, which uses Envoy. Envoy internally loads the V8 WASM runtime, and then it runs an extension model where now you can compile your extensions in Rust or C++ or TypeScript or whatever language compiles down to WASM. And then Envoy runs that code, you know, in a WASM VM, so it's sandboxed, and we can get some element of code safety there.
And it's a pretty fantastic model.
So there's just so much to unpack here,
but it's a pretty exciting time and a pretty exciting system.
If you don't mind, I want to try to just latch on one little thing here.
So in previous lives, I had done a fair amount of database work.
And when you said that it can act as a proxy to Postgres, I'm like, what would that do?
What does that look like?
What does that mean to me?
Yeah.
So for most of the database stuff that we have today, and I would include Kafka there also. So for things like
MySQL, Kafka, Postgres, most of what is doing today is around stats. And we also have this
for MongoDB. So actually, we support, I'm trying to think in terms of pseudo databases like Kafka,
MongoDB, DynamoDB, Postgres, MySQL, there might be others.
What most people are doing with this right now is actually protocol parsing and stats.
Because what you find in a lot of these managed systems, or a lot of these high availability,
you know, quote, cloud native systems, is, again, you have this problem where people
are talking to their databases in seven different
languages. The client libraries are of variable quality. They have different types of logging,
different types of stats. Or maybe you're running your MongoDB or your Postgres or your MySQL.
You're not actually running that database. You're using a cloud service, right? And maybe the cloud
service doesn't give you all the observability to
understand what's going on. So for a lot of the database protocols, people just want consistent
stats from the clients on, you know, what operations are being performed, what is the
latency of those operations. And the ability to get this consistently across your clients who may be talking to these databases in different languages is extremely powerful, right?
Because now, you know, you don't necessarily have to think about, oh, like, was there a difference or a bug in the Python driver for that database versus the Go one, right? It's like you can have some level of confidence that the stats
and the information that you're getting from the proxy is consistent across. And from a real-time
operation standpoint, that's super powerful. So it's almost like a Wireshark window onto
what your whole thing is doing. It is exactly a Wireshark window. And in fact, Envoy has some filters that perform tapping of various types and can actually produce PCAPs.
And it's even more powerful than a raw TCP dump, right?
Because in a lot of modern architectures, you know, you're doing what we
call zero trust networking. So you're doing TLS between every hop, so all the data is actually
encrypted. So that's great, potentially, from a security perspective, it's horrendous from an
operations and a debugging perspective, right? It's like, you know, if you're debugging an issue,
and you're trying to figure out what's going on,
it's like if you can't see the traffic,
you know, it's very difficult.
And not to mention that modern protocols
are typically not text-based anymore.
They're all typically binary.
So, you know, even with Wireshark
or a sophisticated plugin,
you know, you have to have the right protocol parser
and like a bunch of other stuff.
With Envoy already doing all the protocol parsing, Envoy can, if told to,
can filter and match and spit out data, you know, on various types of parameters. So, you know,
I like to think of Envoy as a, you know, like as a network operating system. It's a platform by which you can plug in,
you can almost think of them as programs, right?
You can plug in extensibility features that allow you to operate modern networks.
Okay.
So performance is absolutely critical.
Performance is important, yeah.
And that's why I think it's funny. It obviously won't be to your listeners, but I think a lot of people in the cloud-native space or in the modern application space have moved beyond C++ now. So, you know, Go is obviously very popular.
People are still using Java.
Obviously, there's Node.js.
There's lots of other higher-layer languages.
So I think in the ecosystem that I typically operate in,
there's a lot of skepticism around C++, right?
I mean, there's a lot of people that say,
oh, it's like, why did you write this thing in C++?
And, you know, look, like,
if I were starting Envoy today, would I have written Envoy in C++ versus something like Rust?
That's actually probably debatable. Like, I might have considered using Rust.
But when I started Envoy five years ago, you know, I tend to be a very late adopter. So I've worked in C++ my entire career,
you know, I'm very, very well aware of the robustness of the ecosystem. So for me at the
time, five years ago, C++ was kind of a no brainer, right? I mean, it was the only platform at that
time that I felt could deliver the performance,
and performance not just in throughput, but also in tail latency, right?
It's like, we don't want garbage collection, we want, you know, a very consistent tail latency,
because again, from an operations perspective, if you're looking at your proxy to give you these observability stats,
and you're trying to get stats at P99
for your application calls,
and these application calls might be one millisecond
or even measured in microseconds.
If your proxy is having garbage collection pauses
and is having this behavior that is hard to understand,
it's like if you can't trust the thing
that's doing the measuring,
that's not a very good thing to build on.
So that's why C++ was chosen at the time.
Okay. Makes sense.
I feel like I have an idea what you do now.
It's fun, actually, to talk about this with a set of people or a set of listeners that are not very steeped in this, just because,
you know, it's like, I talk about this all the time, but I typically talk about it with people,
you know, who are a little more familiar with building these types of applications.
So it's a fun challenge for me to, you know, figure out how to speak about it in a more ground-up way.
Right. Sure. Yeah, I mean, I've worked with people who specialize in, you know,
making hardware that can do live network sniffing. Like, it's considered expensive,
I guess. High-performance, critical kinds of things.
Well, and that's why, you know, you talked about perf. And perf is very important, but I think from Envoy's perspective, as a project,
we're not crazy about perf to the extent that we don't sit and do a huge amount of micro-optimization.
There are certainly hot paths that people have heavily optimized,
and we write micro-benchmarks and all of those things,
and we have a set of very good engineers.
But a lot of what we do is we're always trying to balance performance
with developer velocity and developer productivity,
and also, frankly, security, right? Because Envoy is used as an edge proxy, and, you know, we live in a very scary world now,
security-wise. And it's a really interesting thing these days just to figure out what is the right
balance of perf, developer velocity, security, etc. But, you know, again, that's where being able
to build on a lot of the tooling that we have in the native code ecosystem, you know, it gives us a
lot of confidence, right? So it's like, we're a big user of Clang, we use all the fuzzers, we use all
the sanitizers. So just like being able to build
on this 20 or 30 year history of all of this tooling, it's not that other languages don't
have these things, but there's so much effort put into the C and C++ ecosystem in this area,
you know, that it's a lot of stuff that we can build on, which is great.
I want to interrupt the discussion for just a moment
to bring you a word from our sponsor, PVS Studio.
The company behind the PVS Studio Static Code Analyzer,
which has proven itself in the search for errors, typos,
and potential vulnerabilities.
The tool supports the analysis of C, C++, C#, and Java code.
The PVS Studio Analyzer is not only about diagnostic rules,
but also about
integration with such systems as SonarQube, PlatformIO, Azure DevOps, Travis CI, CircleCI,
GitLab CI/CD, Jenkins, Visual Studio, and more. However, the issue still remains:
what can the analyzer do as compared to compilers? Therefore, the PVS Studio team
occasionally checks compilers and writes notes about errors found in them.
Recently, another article of this type was posted about checking the GCC 10 compiler.
You can check out the link in the description of the podcast.
Also, follow the link to the PVS Studio download page.
When requesting a license, write the hashtag CppCast and receive a trial license not for one week, but for a full month.
So you said you started on Envoy about five years ago.
What version of C++ is it built in?
Have you kept up to date with changes in the language?
Yeah, we started on C++11 and then migrated to 14.
We're still technically on 14,
though I think we're in the process of moving to 17.
We use Abseil from Google. So,
you know, we have access to most of the interesting standard library features beyond 14.
So, you know, it's like there hasn't been necessarily a super compelling reason to move
forward. I am personally pretty interested in coroutines from 20
but you know
it's not something
that I've personally been able to play with
even if you adopted bleeding edge compilers
today you still wouldn't
really be there
and that's where too there's always
you know doing an open
source system like Envoy
we can push the edge but we still have to be cognizant of where people run Envoy, what compilers they have available.
So it's not like we can do whatever we want.
And in fact, now that we are doing Envoy Mobile and we actually have to work with the Android compiler and the iOS Xcode compiler,
it limits us a little more
just because those compilers and toolchains
tend to be a little bit far behind.
But I think my experience is that
by using a library set like Abseil,
we get most of the benefit
without having to be super bleeding edge.
And from a project perspective, I tend to take the mindset from C++ that I actually
like C++ a lot, but I also think that C++ is a language where it's easy to do the wrong
thing, right? So it's like from an Envoy coding standard perspective,
we tend to use, I would say, a very simplified version of C++, right?
It's like we tend to try to shy away from templates
unless there's a very good reason to do so.
And, you know, so it's like we attempt to keep things relatively simple
just so that it's easy to grok and not get too far into the weeds.
And that has served us well so far.
I guess, could you tell us about some of the unique C++ challenges that you faced when developing Envoy?
I'm trying to think.
From a language perspective, there hasn't been a ton. I've been doing C++ development for a super long time. We now have a ton of very experienced C++ developers and people that come from very rich C++ ecosystems.
So I wouldn't say that there's been any particular challenge.
What I will say is that I think build, build in CI, to be honest, has probably been our biggest challenge.
We were a very early adopter of Bazel.
That is Google's build system.
And I'm not going to lie,
as we probably adopted too early in the beginning,
it was very painful.
I will say that now having used Bazel for years and seeing Bazel develop,
it's very hard for me to actually imagine
doing a C++ project without Bazel.
Like when Bazel works, it works really well. And again, we're lucky because we have people
that work at Google. We have a good relationship with the Bazel team. So we're not really on our
own. It's like if we have a problem, we have people that are helping us. So I still, you know, I still think Bazel is probably a bit too complicated, but one place it really shines is third-party dependencies and the way that it handles its library resolution and things like that.
From projects in the past, I have all these horrific memories
of sitting there with CMake and reordering the link library's order
for an hour trying to get it to link properly
because the order's not right or whatever.
You just don't have that problem with Bazel.
Like Bazel just figures it out.
So from that perspective, I think Bazel has been good.
We've definitely run into some edge cases
just around having the right tooling to do code coverage
and like have code coverage merging
and differences between GCC and Clang
and all of the typical stuff.
So, yeah.
Is Bazel one of the ones that automatically does distributed builds as well?
Yes, and that is really, really amazing.
So for Envoy CI,
we actually use a Google-hosted service called RBE,
which is Remote Build Execution.
And we run our primary CI on Azure Pipelines.
And Bazel farms out our build
to 50 or 100 backend computers,
compiles all the binaries, links them, runs the tests.
And that's kind of what I'm saying,
is that when it works,
it is absolutely mind-blowingly magic.
Does it also do object caching?
Yep.
It does caching.
I mean, and so this is basically Google's internal build system that they've been slowly,
effectively open sourcing.
And it's still early days, but there have been times where I can access our RBE cluster from my local laptop.
So there are times when if I'm doing a big build or a big CI run
and I don't want to hammer my laptop and do like 45 minutes of compiling and testing,
I just point it at our RBE cluster, and literally it just goes off,
runs all the builds, does all the caching, minimizes network transfer, runs the tests, and gives me back the results.
It is absolute magic.
So on the topic of open sourcing, we haven't mentioned this yet.
Envoy is open source.
Yes, Envoy is open source.
It was open sourced about three and a half years ago.
It is in a foundation called the CNCF, the Cloud Native Computing Foundation.
It's the same foundation that hosts Kubernetes, which is a project that many people will likely have heard of.
And yeah, we have had a lot of success in a very short period of time.
It's been a fantastic run.
Envoy is now used by
pretty much all of the major cloud
providers who are offering products.
It's used by
many major internet companies
that you would have heard of.
It's used by products
that have built
on top of Envoy.
So we've built a really incredible ecosystem.
It's been quite amazing.
I wanted to ask more about the Wasm support you mentioned earlier.
We've talked about WebAssembly a couple times, but the thing you mentioned about doing plugins
via Wasm
sounded interesting.
Could you talk a little more about that?
Yeah, so Envoy has used static extensions for quite some time.
And again, it's a fairly stereotypical C++-type extension system
where we allow extensions to be bound into the binary
and they
auto-register. And that has served us very well. The downside of that system, obviously, is that
Envoy, you know, has to be compiled with the extensions that people want to use. That's
downside one. Downside two is that we don't really have a stable ABI, right? So it's like you kind of have
to compile your extension with the version of Envoy that you're going to use. You know, third,
there's no extension safety, so there's no sandboxing, right? So there's a number of
different problems. The biggest one, though, is probably the static compilation, because what we've seen in the industry is that Envoy has become so popular so fast, and it's now offered by a lot of cloud vendors, that many cloud vendors want to offer a service, which we would call something like bring your own Envoy container, but you want to have a way, right,
where given the stock Envoy container,
we can still run extensions and we can run them safely.
So now we've got this fundamental problem.
We've got this security problem.
We've got this ABI problem, et cetera.
So Google, you know, has been dealing with this problem
for quite some time and they thought,
well, you know, it's like, we're obviously talking about
Wasm in the browser, but there's, in the last year or two, you know, there's been a lot of
discussion of server-side Wasm, and Google thought, well, let's build Wasm into Envoy.
And, you know, we're still early days, but it's a pretty fantastic solution to pretty much every one of the problems that I listed.
If Envoy can host a V8 runtime, we can have a stable ABI that is well-versioned, right? So extensions can be compiled out of tree against the Envoy WASM ABI.
And these are similar APIs to what Envoy supports native extensions today.
So things like, you know, here's your data, here's your headers, like things like that.
But we can do it in a stable way
across the WASM application boundary.
But better yet, the code runs in a sandbox.
You can write, quote, safe C++, right?
Which won't escape that sandbox.
You can write your extension in Rust.
You can write it in TypeScript or in TinyGo
or like whatever you want.
So it's an incredibly powerful way to solve lots of different problems. And we're hoping to have Wasm support
upstream. It's actually, the MVP is implemented. We're hoping to have it upstream in Envoy
probably within the next month or two. And then my hope is that in the next couple of years,
really, we're going to move away from extensions being written in what I would call native C++,
just because it doesn't make a lot of sense anymore, because it doesn't have those properties
of safety, of stable APIs, etc. And I would expect an entire ecosystem, you know, kind of like Docker Hub, right, to form
for Envoy extensions, where people can have extensions that we can test out of band, we can
sign, like we can do all these things. And then when you run Envoy, you know, in on prem or in a
cloud environment, whether it be an edge proxy, or a service mesh proxy, or things like that,
now we can go to this marketplace of extensions,
and those extensions might be around protocols or observability or security or things like that.
So it's a very exciting time, and I think we're at very early days, right?
I mean, I would call this alpha or pre-alpha, but I believe that this is the
future for sure. Cool. Very cool. Okay. Jason, do you have any other questions you want to ask?
So I was a little bit curious more about the history of this project. So it's owned by Lyft,
but did you start it before you started at Lyft? No, I started it right when I started at Lyft.
So I started at Lyft just about five years ago.
So it'll be five years and a couple of days.
And I started on it very shortly after I joined.
So it sounds like you're fortunate enough
to be paid full-time to work on an open-source project.
Yeah, so I split my time.
It's not quite full-time. I lead my team at Lyft.
And then I spend about 50% to 60% of my time working with the industry.
And so, yes, I am very fortunate.
It would be a whole other show to talk about the trials and tribulations of open source.
But it's a double-edged sword, right?
This has been the success of Envoy is a once-in-a-lifetime type thing.
And it's been a truly fantastic experience.
It's also been very tiring.
So, yeah.
Okay.
Well, Matt, it's been great having you on the show today. Thank you for
filling our listeners in on the
cloud-native world. Thank you so
much, and I would say to folks that
if you're interested in learning more about
Envoy, we've got our website
at envoyproxy.io.
We are on GitHub.
Very welcoming community, always
looking for people to help out. Very cool.
Thank you. Thank you so much. Thanks so much for listening in as we chat about C++. We'd love to
hear what you think of the podcast. Please let us know if we're discussing the stuff you're interested
in, or if you have a suggestion for a topic, we'd love to hear about that too. You can email all
your thoughts to feedback@cppcast.com. We'd also appreciate if you can like CppCast on
Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at
Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through
Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast.
And of course, you can find all that info and the show notes on the podcast website
at cppcast.com.
Theme music
for this episode was provided by podcast