CppCast - Envoy Proxy

Episode Date: April 30, 2020

Rob and Jason are joined by Matt Klein from Lyft. They first discuss an update to Microsoft's Guidelines Support Library with changes to span. Then they talk to Matt Klein, who discusses Envoy Proxy and how it's used in cloud-native applications.

News:
- Tweet re: SPMD Lambdas
- CppCon 2020 Call for Submissions
- GSL 3.0.0 Release

Links:
- Envoy Proxy
- Envoy Proxy on GitHub

Sponsors:
- PVS-Studio. Write #cppcast in the message field on the download page and get one month license. Read the article "Checking the GCC 10 Compiler with PVS-Studio" covering 10 heroically found errors despite the great number of macros in the GCC code.
- Use code JetBrainsForCppCast during checkout at JetBrains.com for a 25% discount.

Transcript
Starting point is 00:00:00 Episode 245 of CppCast with guest Matt Klein. This episode is sponsored by JetBrains. They've got CLion, an intelligent IDE, and ReSharper C++, a smart extension for Visual Studio. Exclusively for CppCast, JetBrains is offering a 25% discount on yearly individual licenses on both of these C++ tools, which applies to new purchases and renewals alike.
Starting point is 00:00:40 Use the coupon code JetBrainsForCppCast during checkout at JetBrains.com to take advantage of this deal. In this episode, we discuss an update to Microsoft's Guidelines Support Library. Then we talk to Matt Klein from Lyft. Matt talks to us about Envoy Proxy and how it's used in cloud-native applications. Welcome to episode 245 of CppCast, the first podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner. Jason, how are you doing today?
Starting point is 00:01:50 I'm all right, Rob. How are you doing? I'm doing fine. I did some work over the past few days on my deck, which I took apart like a year ago today, and now we finally have it stained, so it's pretty much all done now. Took it apart a year ago. I know. It's been a long time, but it's pretty much done now. It only took me like three years to finish my basement once I started. Yeah, some of these projects can be pretty, uh, time-intensive. Anything you want to share? No, at the moment, largely just
Starting point is 00:02:19 waiting to see, you know, what conferences are going to go through, really. Okay, well, on that note, at the top of our episode, I'd like to read a piece of feedback. This week, we got an email from Jeff Trull. He says, a pretty credible person on Twitter is making a fairly strong claim about lambda overhead, and I wanted to bring it to your attention in case you might want to discuss it on your next show. Either way, all the best to the two of you. See you at the next conference, whenever that may be. And yeah, so he sent the tweet, which was from Richard, saying, I implemented more SPMD control flow using macros versus the original CppSPMD lambdas. I think I've got all the key stuff working. And I guess, yeah, he's finding some performance
Starting point is 00:03:06 differences. Did you take a look at this at all, Jason? I did, and it has absolutely no details that I can dig into in any way at all. It's a screenshot of some macro calls, right? And nowhere else in any of the thread is there a link to the lambda version for there to be a comparison. But macros are macros. They're not function calls. And lambdas are function calls. So the idea that the compiler is going to do something different with them is virtually guaranteed. I wish that I had had more details so that I could have actually played with it to see if it depended on compiler optimization level or what the differences were. Yeah, it'd be nice if the author of these tweets would write an actual blog post to do a comparison instead of just a tweet.
Starting point is 00:03:55 Okay. Well, we'd love to hear your thoughts about the show. You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com. And don't forget to leave us a review on iTunes or subscribe on YouTube. Joining us today is Matt Klein. Matt is a software engineer at Lyft and the creator of Envoy. He has been working on operating systems, virtualization, distributed systems, networking, and making systems easy to operate for nearly 20 years across a variety of companies. Some highlights include leading the development of Twitter's L7 edge proxy and working on high-performance computing and networking in Amazon's EC2. Matt, welcome to the show. Thank you so much for having me.
Starting point is 00:04:30 So I'm kind of curious, have you been actually directly involved in operating system development then? Yeah, I actually started my career back at Microsoft, and I originally, this is dating myself a lot, but I worked on Windows Mobile, which was pre-iPhone, pre-Windows Phone. And that was based on Windows CE. So I actually started my career doing operating system stuff around Windows CE. And then from there, I actually worked on HD DVD, which was the competitor to Blu-ray. So I'm also dating myself. This is pre-streaming. And HD DVD failed as a standard, but I did operating system and embedded stuff for HD DVD.
Starting point is 00:05:19 And then from there, I worked on Windows NT. I spent a bunch of time early in my career working on concurrency. So I built a user-mode threading system for Windows NT back in the Windows 7 timeframe. And then since then, I've mostly switched to Linux. So when I was at Amazon, I did a fair amount of stuff on Xen, kind of low-level networking systems, hypervisor operating system type stuff. And then since then, I've actually progressively moved up the stack. So lately, I've been working more on Layer 7 networking systems. So Rob actually used to do development on Windows Mobile.
Starting point is 00:06:04 And so I'm curious, is there anything that you two need to work out? Are there any apologies that need to be made or anything? You know, this is so long ago now that I have these vague memories of COM and just being super scarred by that. But, you know, we're now going back like 15, almost 20 years. So I think I've put most of that out of my mind. I was scarred by having to write my own COM objects in C++. I originally came from Visual Basic, where everything's a COM object.
Starting point is 00:06:36 And then like, wait a minute, this takes so much work in C++. It's funny how I actually haven't thought about this in a very long time, but now it's all coming back. You know, you have to make this object, and then there's this GUID, and you have to, like, put it in the registry somewhere, and there's, you know, 17 layers of indirection until you get your thing created. It's extremely horrifying.
Starting point is 00:07:04 I just had someone, I asked a question on Twitter, I don't even remember what the question was now, but someone said, oh, I was dealing with DLL nonsense. And they're like, well, maybe you need to run regsvr32. And I'm like, whoa, no. No, I haven't needed to run that. This is
Starting point is 00:07:20 a blast from the past, wow. Fortunately, that was not actually the answer. Right. Yeah. Yeah. All right, Matt. Well, we got a couple of news articles to discuss.
Starting point is 00:07:31 Feel free to comment on any of these. Sure. We'll start talking more about Envoy Proxy. Sounds good. So this first one is CppCon 2020 Call for Speakers. And I think we mentioned just last week that Meeting C++ is on, and CppCon 2020 is also starting to make preparations. You know, there's always a chance that if the situation is not great come September, this conference will not actually happen, but you need
Starting point is 00:08:00 to kind of plan for it and and hope that it will be able to happen and make the call for speakers and everything. Right, Jason? Yep. June 5th. June 5th. June 5th is the call for speakers deadline. Yep. Awesome. Right. And then that will be sent out July 13th. And the actual planned date for the conference is September 13th to 18th. And I am giving a class at this conference, although I don't know if it's actually been officially announced now that I say that out loud. Pre-con or post-con? Pre was what I submitted. They don't have any of the 2020 news up yet. So I know that John did mention it when I was on CPP chat the other week. So it is at least mostly official, even if it's not on the website yet.
Starting point is 00:08:47 Uh, but they don't have any call for, you know, people can't register anything yet. Anyhow. Right. Been thinking through the probabilities with something like this, either,
Starting point is 00:08:54 uh, let's see possibilities. Uh, the world returns to normal. We all forget this happened and we have a full conference by September. Um, the conference is allowed to happen, but few people show up, relatively speaking, because they're nervous, or it's moved online and or canceled. That's what
Starting point is 00:09:16 I'm thinking about for the next few conferences that I have up. No idea what's going to happen with any of them. Yeah, it seems super hard for me to believe that there'll be many conferences this year. Yeah. Yeah, I mean, at least in the C++ world, basically everything over the summer months up until like August was canceled. Or moved online. Or moved online, but yeah. Yeah, that's been the same in the cloud-native space. And, you know, I just personally can't believe even that the fall-winter conferences are going to happen.
Starting point is 00:09:46 It seems not very likely that we're going to be able to get 5,000-10,000 people to come to one place, which is sad, but that just seems like the reality that we're in right now. C++ conferences, the biggest C++ conference brings about 1,500 people or something now, Rob? Yeah. So we're still relatively small fish, but... Still a big crowd. Yeah. Okay, next article we have is the GSL 3.0 release.
Starting point is 00:10:15 This is on Microsoft's Visual C++ blog. We haven't talked about GSL in a long time, Jason, I feel like. But yeah, the C++ Core Guidelines Support Library is still out there, still being updated. A lot of the changes in this one seem to be about span, to bring it in line with changes to span in C++20. Yes. You have something you wanted to add to that, Jason? Oh, it's just this, you know, like three-year-long argument about whether or not size should be signed or unsigned in standard containers. Unsigned won. And so, you know, the GSL, Microsoft's implementation, started out with size being
signed, which meant the people actually trying to use it had to cast everywhere to avoid warnings. But now they got in line with the C++20 span, which has an unsigned size_t for its size value. But for people who want the signed representation, we've got ssize now coming throughout the standard, so it's there for you if you want it. The reason I laughed is because the only comment on here is, oh, my goodness, the bliss, changing signed to unsigned is going to save us so much code. It's the only response to the release announcement. It's funny. Okay, well, Matt, could we start off by you giving us a kind of overview of what exactly Envoy Proxy is? Sure, of course.
Starting point is 00:11:49 And I'm assuming, I should assume for the listeners of this podcast, that people have very little to no context, right? So I should start from scratch, basically. Assume that your listener understands C++ and nothing else. I'm sure there are some out there who know what it is and maybe are users, but the vast majority are probably not familiar with it. Yep, sounds good. So Envoy at a super high level is a network proxy.
Starting point is 00:12:16 It's an extensible network proxy. So the projects that it would be most similar to that people would likely have heard of would be projects like HAProxy and GeneX. Those are also obviously Layer 7 network proxies. Envoy was created by me. I started working on it about five years ago at Lyft. It was open sourced about three and a half years ago.
Starting point is 00:12:43 It's become very popular in what we call the cloud native space right now. So, you know, those are applications like Lyft, like, you know, Slack, you know, things like that, that, you know, grew up essentially running in the cloud. So these are applications that are built on microservices. They're highly scalable, so they scale up and scale down. And so there's a whole number of patterns that have been built up around making these microservice architectures work, right? So for those of your listeners that are not super familiar, you know, that's the context by which Envoy came from. So for me, sorry. If you don't mind, sorry, before we move on, in the seven-layer burrito model, what is layer seven?
Starting point is 00:13:37 Layer seven is the application layer. But, you know, but even there, it gets extremely confusing, right? Because the layers that most people would know about would be layer three, which is IP, right? Okay. And then layer four is TCP or UDP. So those are the layers that most people will be familiar with at the operating system layer. Okay. And then where it gets a little hazy and a little confusing is that you have protocols like TLS, you know, transport layer security.
Starting point is 00:14:12 Some people call that somewhere between layer five and layer six, right? And then you have protocols like HTTP, right? Is HTTP layer seven or is it below layer seven? Because then you have application stuff on top. You know, I think there's probably technical definitions of where all of these sit. I think for my purposes, those technical definitions don't matter that much, right? In the sense that we have, you know, TCP, IP, UDP. On top of that, we have very common application layer transport protocols like HTTP. And then on top of that, people are building
Starting point is 00:14:54 microservices, right? So, you know, they're building services where their APIs might be developed using REST. They might be using IDL systems like Protobuf. And, right, so, you know, they're layering on top of these protocols. And at the end of the day, they end up building applications like Lyft. And, you know, for most of these modern, you know, quote, cloud native applications, there's like a reference architecture where, you know, you have your phone typically, there's an app that runs on your phone, you're going to talk from your phone via some protocol to a set of edge load balancers that sit at the edge of your application. And then those load balancers, you know, might terminate something like api.lyft.com, and then they're
Starting point is 00:15:43 going to fan out that traffic to a whole set of backend services. So, you know, services like Lyft or services like Facebook, they're typically composed of hundreds, if not thousands of different backend services, right? So what we see in this space where, again, it's these systems are scaling up and scaling down. They're highly lossy, right? It's like there's failures happening all the time. There are very difficult problems in this space, primarily around networking. So how do I find all the back ends that I actually have to talk to? How do I load balance between them?
Starting point is 00:16:19 And then the general topic of observability being things like, how do I get my stats from the system? How do I get my logs? How do I get my metrics? It's like, how do I build my dashboards? How do I operate this system? So for a very long time, people have been using these great projects like HAProxy and like NGINX. And again, these are fantastic proxies. These were proxies, though, that were built in a slightly different time.
Starting point is 00:16:51 They were built in a time when architectures were not as dynamic as they are now, right? You know, there weren't virtual machines coming up and coming down, and now we have systems like Kubernetes, right, where we've moved away from virtual machines to now we're thinking mostly around containers. And these containers come up and come down, right? So we have a lot of failure, we have a lot of
Starting point is 00:17:14 auto scaling, we have, you know, very sophisticated load balancing that actually has to happen. So that's the background context. So for history, I worked at Twitter prior to Lyft. And at Twitter, I built Twitter's front proxy. And that's the gateway that basically accepts all traffic and then it fans it out to Twitter's backend system. And that was also written in C++. And that system was never open sourced. And amazingly, that system to this day, as far as I know, is still serving all of Twitter's
Starting point is 00:17:56 traffic. And that was first deployed in the run up to the 2014 World Cup. So in the 2013-2014 timeframe. So when I was at Twitter, I gained a lot of experience of building these edge systems. I also had a lot of experience with the way Twitter did what we call service-to-service traffic. So within these systems, there's two main types of traffic. There's the edge traffic, so there's the internet
Starting point is 00:18:26 traffic, right, that comes in from your web browser or your phone. And then when all these services talk to each other, you know, it's also obviously networking, but it's what we would call service to service traffic. So this is like your user service calling your payment service or something like that, right? But you have many of the same problems. You know, you have what we call service discovery. You have to figure out where your services are, what their IP addresses are, things like that. You have to do load balancing. You have to do stats. You might be doing security, RBAC, et cetera, et cetera, et cetera. And, you know, what we found at Twitter is we had these two separate systems, right?
Starting point is 00:19:05 It's like we had this edge system that was doing edge networking, and then we had a separate set of libraries that were doing service-to-service networking. And one of the other trends that we've seen within the industry over the last, I'd say, five to ten years is we've moved from a world in which most systems are built in a single language. So if you look back 10, 15 years, you'd have a lot of C++, you'd have a lot of Java, to now, you look at a lot of these modern applications, and just the trends are either because of what people do, or because we've acquired companies, you know, there's applications that are now written in six, seven, eight different languages, right?
Starting point is 00:19:44 It's like you see Go, C++, Java, JavaScript, like Rust. I mean, the list goes on and on and on. And a major problem that we've seen over the last five or ten years is that you have all these common concerns, right? Again, you have service discovery, load balancing, observability. So you really have two ways that you can solve this. You can build a library that runs in every single language, or you can build a proxy that we would call it a sidecar proxy or a client-side proxy that tries to wrap a bunch of these use cases into one piece of code,
Starting point is 00:20:27 and then we don't have to implement it over and over and over again in each language. So Envoy was really born out of a couple of realizations. One of them is that Lyft is having all the same problems that Twitter had in terms of microservice architecture, having trouble around networking, around observability. But also this realization that we live in this polyglot world now where people program their microservices in lots of different languages, but we have all these common concerns. So what if we could build, you know, one proxy, right, one piece of code that's highly performance, you know, and very extensible? And what if we could use this in all the places? What if we could use it on the edge? What if we could use it as a sidecar or a client side proxy? You know, wouldn't that be great from an operational standpoint? Wouldn't that be great from a developer efficiency standpoint? So that's where we set out.
Starting point is 00:21:29 And we started Envoy five years ago with the goal of building a single proxy. And I can go into how it's different from the systems that people might know about. But we built one code base that we can use in all these different places, and then it works in the Polyglot architecture, and it allows us to handle both edge load balancing, service discovery, client-side load balancing, service discovery, et cetera. So I've just been talking a lot.
Starting point is 00:22:00 So I will stop there and just see if that gives you some good background context before I go into the details a bit more. Yeah, definitely. So I guess one of the questions I have to start off with is, you're talking about all these different languages, you know, a polyglot world. So does that mean you have to have, you know, multiple bindings for Envoy Proxy, if that's all written in C++? Right, so there's a multi-layer answer to that question. Historically, the answer is no, there were no bindings. We would effectively build thin clients, right? So at Lyft, we had like an Envoy client written in Python, which is a couple of hundred lines, similar for Go, PHP, etc.
Starting point is 00:22:47 And these are thin clients that knew how to reach Envoy on a particular port, but it would work like a normal network proxy. It's like you would connect to Envoy on localhost, 8080, and then Envoy would go and do its thing. So from the perspective of the application, you would connect to this proxy on localhost. The proxy would do this magic. It would find all the backends. It would do the load balancing. It would send the request. It would get the response.
Starting point is 00:23:18 And the application would have a transparent network, meaning it would have an abstracted network where the application wouldn't know about the larger network. It would just know logically, I need to reach this other service. I'm going to let Envoy do that. And in the last few years, at least in the cloud-native space, there's this very popular buzz phrase,
Starting point is 00:23:44 which we call service mesh. And what we're talking about, basically, is the service mesh pattern. So the service mesh pattern is that you have an application, you use some type of sidecar element, whether that be a library or a proxy, and that proxy is going to abstract the network and it's going to do a bunch of this plumbing for you.
Starting point is 00:24:05 Now, the reason that I say that it's complicated, and we can talk about this later, is that in the last year, we've been building what we're calling Envoy Mobile. And Envoy Mobile is Envoy embedded directly as a library into an iOS and an Android application. And we have shipped that now, starting to ship that to production at Lyft.
Starting point is 00:24:28 So the reason that I said that it's complicated is that Envoy started its history as a true separate process network proxy, very similar to HAProxy or Nginx. But now Envoy is continuing its life also as an embedded library. But again, 90 to 99% of the code is the same. And that's, I think, what makes it so powerful
Starting point is 00:24:56 is that we can build this common code base, which has a lot of contributors, a lot of eyes on it, and we can make sure that we get it right once, and then we can use this code base in a variety of different places. Okay, I find myself definitely slowly catching up here. Sorry. I don't work in a corporate environment,
Starting point is 00:25:20 so I never have to deal with proxies. I don't really do any kind of distributed work these days. So again, I'm not, I'm like, the words that you use are words that I've heard before. The last proxy that I actually configured and used was a SOCKS proxy, which sounds different. I mean, it's different, yes, but it's all solving similar problems. And there's a joke in the distributed systems world, which is that any problem can be solved with another layer of proxies. So from a distributed system standpoint, a proxy is a very popular pattern, mostly because it allows a separation of concerns. So it, you know, kind of like the microservice architecture,
Starting point is 00:26:11 theoretically allows a separation of concerns. It allows teams to operate independently across some API. A proxy also, you know, can, firewall is probably not the best word, but it can be a bulkhead, essentially, between a bunch of fairly sophisticated functionality where one side doesn't necessarily need to be aware of what's happening on the other side. Envoy is doing a huge amount of functionality but the entity that's calling through Envoy doesn't necessarily know all the things that Envoy is doing on its behalf. So I might, whatever, have an app that needs to talk to a thing
Starting point is 00:26:56 on the other side of the proxy. I say, hey proxy, I need to perform this action and the proxy is going to say, okay, whatever, transaction server number 13, that's the one that is going to say, okay, you know, whatever transaction server number 13, that's the one that's going to take that workload. That is, that is just one use case. And I think, I think Envoy has become extremely popular in a short period of time for a couple of different reasons. But one of the reasons that Envoy has become extremely popular is it's very, very extensible. So though many modern applications use HTTP,
Starting point is 00:27:33 Envoy at this point supports Redis, it supports Redis Cluster, it supports Postgres, it supports MySQL. I mean, it supports all these different protocols. So you can think of Envoy at its core as a layer three, layer four proxy. And by layer three, layer four proxy, I mean that in the IP sense, right? The IP/TCP sense. And Envoy at its core, you know, it has a bunch of basic functionality for finding backends, doing load balancing, building filters that operate on bytes, right?
Starting point is 00:28:08 So it's like bytes come in, bytes can get filtered, bytes get written out to some backend. Then on top of that, we've built all of these filters and extensions that do things like offer security, offer RBAC, offer rate limiting, build protocol support for things like HTTP, Redis, Postgres, MySQL. And then on top of that, there's more extensibility. So within the HTTP subsystem, we have a further set of what we call filter chains, where you can build filters that operate not on bytes, but on HTTP concepts like headers and body and things like that so that you can then build HTTP-focused rate limiting or RBAC or anything else. And at this point, we have a very rich set of extensions around Envoy, and we're currently taking that even to the next level where we have extensions now that you can build using wasm so i mean this is just
Starting point is 00:29:06 absolutely mind-blowing where the future of envoy extensibility is actually going to be through wasm so this is already implemented uh it's already uh in use in a project called istio which uses envoy um but envoy internally loads the v8 WASM runtime, and then it runs an extension model where now you can compile your extensions in Rust or C++ or TypeScript or whatever language compiles down to WASM. And then Envoy runs that code, you know, in a WASM VM, so it's sandboxed, and we can get some element of code safety there. And it's a pretty fantastic model. So there's just so much to unpack here, but it's a pretty exciting time and a pretty exciting system.
Starting point is 00:30:00 If you don't mind, I want to try to just latch on one little thing here. So in previous lives, I had done a fair amount of database work. And when you said that it can act as a proxy to Postgres, I'm like, what would that do? What does that look like? What does that mean to me? Yeah. So for most of the database stuff that we have today, and I would include Kafka there also. So for things like MySQL, Kafka, Postgres, most of what it is doing today is around stats. And we also have this
Starting point is 00:30:32 for MongoDB. So actually, we support, I'm trying to think in terms of pseudo databases like Kafka, MongoDB, DynamoDB, Postgres, MySQL, there might be others. What most people are doing with this right now is actually protocol parsing and stats. Because what you find in a lot of these managed systems, or a lot of these high availability, you know, quote, cloud native systems, is, again, you have this problem where people are talking to their databases in seven different languages. The client libraries are of variable quality. They have different types of logging, different types of stats. Or maybe you're running your MongoDB or your Postgres or your MySQL.
Starting point is 00:31:17 You're not actually running that database. You're using a cloud service, right? And maybe the cloud service doesn't give you all the observability to understand what's going on. So for a lot of the database protocols, people just want consistent stats from the clients on, you know, what operations are being performed, what is the latency of those operations. And the ability to get this consistently across your clients who may be talking to these databases in different languages is extremely powerful, right? Because now, you know, you don't necessarily have to think about, oh, like, was there a difference or a bug in the Python driver for that database versus the Go one, right? It's like you can have some level of confidence that the stats and the information that you're getting from the proxy is consistent across. And from a real-time operation standpoint, that's super powerful. So it's almost like a Wireshark window onto
Starting point is 00:32:18 what your whole thing is doing. It is exactly a Wireshark window. And in fact, Envoy has some filters that perform tapping of various types and can actually produce PCAPs. And it's even more powerful than a raw TCP dump, right? Because in a lot of modern architectures, you know, you're doing what we call zero trust networking. So you're doing TLS between every hop, so all the data is actually encrypted. So that's great, potentially, from a security perspective, it's horrendous from an operations and a debugging perspective, right? It's like, you know, if you're debugging an issue, and you're trying to figure out what's going on, it's like if you can't see the traffic,
Starting point is 00:33:08 you know, it's very difficult. And not to mention that modern protocols are typically not text-based anymore. They're all typically binary. So, you know, even with Wireshark or a sophisticated plugin, you know, you have to have the right protocol parser and like a bunch of other stuff.
Starting point is 00:33:31 With Envoy already doing all the protocol parsing, Envoy can, if told to, can filter and match and spit out data, you know, on various types of parameters. So, you know, I like to think of Envoy as a, you know, like as a network operating system. It's a platform by which you can plug in, you can almost think of them as programs, right? You can plug in extensibility features that allow you to operate modern networks. Okay. So performance is absolutely critical. Performance is important, yeah.
Starting point is 00:34:02 And that's why I think it's funny. It obviously won't be to your listeners, but I think a lot of people in the cloud-native space or in the modern application space have moved beyond C++ now. So, you know, Go is obviously very popular. People are still using Java. Obviously, there's Node.js. There's lots of other higher-layer languages. So I think in the ecosystem that I typically operate in, there's a lot of skepticism around C++, right? I mean, there's a lot of people that say, oh, it's like, why did you write this thing in C++?
Starting point is 00:34:43 And, you know, look, like, if I were starting Envoy today, would I have written Envoy in C++ versus something like Rust? No, that's actually probably debatable. Like, I might have considered using Rust. But when I started Envoy five years ago, you know, I tend to be a very late adopter. So I've worked in C++ my entire career, you know, I'm very, very well aware of the robustness of the ecosystem. So for me at the time, five years ago, C++ was kind of a no brainer, right? I mean, it was the only platform at that time that I felt could deliver the performance, and performance not just in throughput, but also in tail latency, right?
Starting point is 00:35:29 It's like, we don't want garbage collection, we want, you know, a very consistent tail latency, because again, from an operations perspective, if you're looking at your proxy to give you these observability stats, and you're trying to get stats at P99 for your application calls, and these application calls might be one millisecond or even measured in microseconds. If your proxy is having garbage collection pauses and is having this behavior that is hard to understand,
Starting point is 00:35:59 it's like if you can't trust the thing that's doing the measuring, that's not a very good thing to build on. So that's why C++ was chosen at the time. Okay. Makes sense. I feel like I have an idea what you do now. It's fun, actually, to talk about this with a set of people or a set of listeners that are not very steeped in this, just because, you know, it's like, I talk about this all the time, but I typically talk about it with people,
Starting point is 00:36:32 you know, who are a little more familiar with building these types of applications. So it's a, it's a fun challenge for me to, you know, to, to, to figure out how to, how to speak about it in a, in a, a more ground up way. Right. Sure. Yeah. I mean, I've worked with people who specialize in, you know, to figure out how to speak about it in a more ground-up way. Right. Sure. Yeah, I mean, I've worked with people who specialize in, you know, making hardware that can do live network sniffing. Like, it's considered expensive, I guess. High-performance, critical kinds of things. Well, and that's why, you know, you talked about perf. And perf is very important, but I think from Envoy's perspective is that from a project, we're not crazy about perf to the extent that we don't sit and do a huge amount of micro-optimization.
Starting point is 00:37:23 There are certainly hot paths that people have heavily optimized, and we write micro-benchmarks and all of those things, and we have a set of very good engineers. But a lot of what we do is we're always trying to balance performance with developer velocity and developer productivity, and also, frankly, security, right? right because envoy is used as an edge proxy and um you know we we live in a very scary world now security wise and it's a really interesting thing these days just to figure out what is the right balance of perf developer velocity security etc um but you know again that's where being able
Starting point is 00:38:09 to build on a lot of the tooling that we have in the native code ecosystem you know it gives us a lot of confidence right so it's like we're a big user of clang we use all the fuzzers we use all the sanitizers um so just like being able to build on this 20 or 30 year history of all of this tooling, it's not that other languages don't have these things, but there's so much effort put into the C and C++ ecosystem in this area, you know, that it's a lot of stuff that we can build on, which is great. I want to interrupt the discussion for just a moment to bring you a word from our sponsor, PVS Studio.
Starting point is 00:38:49 The company behind the PVS Studio Static Code Analyzer, which has proven itself in the search for errors, typos, and potential vulnerabilities. The tool supports the analysis of C, C++, C Sharp, and Java code. The PVS Studio Analyzer is not only about diagnostic rules, but also about integration with such systems as SonarCube, Platform.io, Azure DevOps, Travis CI, CircleCI, GitLab CI, CD, Jenkins, Visual Studio, and more. However, the issue still remains,
Starting point is 00:39:17 what can the analyzer do as compared to compilers? Therefore, the PVS-Studio team occasionally checks compilers and writes notes about errors found in them. Recently, another article of this type was posted about checking the GCC 10 compiler. You can check out the link in the description of the podcast. Also, follow the link to the PVS-Studio download page. When requesting a license, write the hashtag #cppcast and receive a trial license not for one week, but for a full month. So you said you started on Envoy about five years ago. What version of C++ is it built in?
Starting point is 00:39:49 Have you kept up to date with changes in the language? Yeah, we started on C++ 11 and then migrated to 14. We're still technically on 14, though I think we're in the process of moving to 17. We use AppSeil from Google. So, you know, we have access to most of the interesting standard library features beyond 14. So, you know, it's like there hasn't been necessarily a super compelling reason to move forward. I am personally pretty interested in coroutines from 20
Starting point is 00:40:25 but, you know, it's not something that I've personally been able to play with. Even if you adopted bleeding-edge compilers today, you still wouldn't really be there. And that's where, too, there's always, you know, doing an open
Starting point is 00:40:41 source system like Envoy, we can push the edge, but we still have to be cognizant of where people run Envoy, what compilers they have available. So it's not like we can do whatever we want. And in fact, now that we are doing Envoy Mobile and we actually have to work with the Android compiler and the iOS Xcode compiler, it limits us a little more, just because those compilers and toolchains tend to be a little bit far behind. But I think my experience is that
Starting point is 00:41:17 by using a library set like AppSeal, we get most of the benefit without having to be super bleeding edge. And from a project perspective, I tend to take the mindset from C++ that I actually like C++ a lot, but I also think that C++ is a language where it's easy to do the wrong thing, right? So it's like from an Envoy coding standard perspective, we tend to use, I would say, a very simplified version of C++, right? It's like we tend to try to shy away from templates
Starting point is 00:41:56 unless there's a very good reason to do so. And, you know, so it's like we attempt to keep things relatively simple just so that it's easy to grok and not get too far into the weeds. And that has served us well so far. I guess, could you tell us about some of the unique C++ challenges that you faced when developing Envoy? I'm trying to think. From a language perspective, there hasn't been a ton. I've been doing C++ development for a super long time. We now have a ton of very experienced C++ developers and people that come from very rich C++ ecosystems. So I wouldn't say that there's been any particular challenge.
Starting point is 00:42:50 What I will say is that I think build, build in CI, to be honest, has probably been our biggest challenge. We were a very early adopter of Bazel. That is Google's build system. And I'm not going to lie, as we probably adopted too early in the beginning, it was very painful. I will say that now having used Bazel for years and seeing Bazel develop, it's very hard for me to actually imagine
Starting point is 00:43:24 doing a C++ project without Bazel. Like when Bazel works, it works really well. And again, we're lucky because we have people that work at Google. We have a good relationship with the Bazel team. So we're not really on our own. It's like if we have a problem, we have people that are helping us. So I still, you know, I still think Bazel is probably a bit too complicatedparty dependencies and the way that it handles its library resolution and things like that. From projects in the past, I have all these horrific memories of sitting there with CMake and reordering the link library's order for an hour trying to get it to link properly because the order's not right or whatever.
Starting point is 00:44:25 You just don't have that problem with Bazel. Like Bazel just figures it out. So from that perspective, I think Bazel has been good. We've definitely run into some edge cases just around having the right tooling to do code coverage and like have code coverage merging and differences between GCC and Clang and all of the typical stuff.
Starting point is 00:44:49 So, yeah. Is Bazel one of the ones that automatically does distributed builds as well? Yes, and that is really, really amazing. So for Envoy CI, we actually use a Google-hosted service called RBE, which is Remote Build Execution. And we run our primary CI on Azure Pipelines. And Bazel farms out our build
Starting point is 00:45:12 to 50 or 100 backend computers, compiles all the binaries, links them, runs the tests. And that's kind of what I'm saying, is that when it works, it is absolutely mind-blowingly magic. Does it also do object caching? Yep. It does caching.
Starting point is 00:45:29 I mean, and so this is basically Google's internal build system that they've been slowly, effectively open sourcing. And it's still early days, but there have been times where I can access our RBE cluster from my local laptop. So there are times when if I'm doing a big build or a big CI run and I don't want to hammer my laptop and do like 45 minutes of compiling and testing, I just point it at our RBE cluster, and literally it just goes off, runs all the builds, does all the caching, minimizes network transfer, runs the tests, and gives me back the results. It is absolute magic.
Starting point is 00:46:11 So on the topic of open sourcing, we haven't mentioned this yet. Envoy is open source. Yes, Envoy is open source. It was open sourced about three and a half years ago. It is in a foundation called the CNCF, the Cloud Native Computing Foundation. It's the same foundation that hosts Kubernetes, which is a project that many people will likely have heard of. And yeah, we have had a lot of success in a very short period of time. It's been a fantastic run.
Starting point is 00:46:46 Envoy is now used by pretty much all of the major cloud providers who are offering products. It's used by many major internet companies that you would have heard of. It's used by products that have built
Starting point is 00:47:02 on top of Envoy. So we've built a really incredible ecosystem. It's been quite amazing. I wanted to ask more about the Wasm support you mentioned earlier. We've talked about WebAssembly a couple times, but the thing you mentioned about doing plugins via Wasm sounded interesting. Could you talk a little more about that?
Starting point is 00:47:27 Yeah, so Envoy has used static extensions for quite some time. And again, it's a fairly stereotypical C++-type extension system where we allow extensions to be bound into the binary and they auto-register. And that has served us very well. The downside of that system, obviously, is that Envoy, you know, has to be compiled with the extensions that people want to use. That's downside one. Downside two is that we don't really have a stable ABI, right? So it's like you kind of have to compile your extension with the version of Envoy that you're going to use. You know, third,
Starting point is 00:48:11 there's no extension safety, so there's no sandboxing, right? So there's a number of different problems. The biggest one, though, is probably the static compilation, because what we've seen in the industry is that Envoy has become so popular so fast, and it's now offered by a lot of cloud vendors, that many cloud vendors want to offer a service, which we would call something like bring your own Envoy container, but you want to have a way, right, where given the stock Envoy container, we can still run extensions and we can run them safely. So now we've got this fundamental problem. We've got this security problem. We've got this ABI problem, et cetera. So Google, you know, has been dealing with this problem
Starting point is 00:48:59 for quite some time and they thought, well, you know, it's like we're obviously talking about Wasm in the browser, but there's, in the last thought, well, you know, it's like, we're obviously talking about Wasm in the browser, but there's, in the last year or two, you know, there's been a lot of discussion of server-side Wasm, and Google thought, well, let's build Wasm into Envoy. And, you know, we're still early days, but it's a pretty fantastic solution to pretty much every one of the problems that I listed. If Envoy can host a V8 runtime, we can have a stable ABI that is well-versioned, right? So extensions can be compiled out of tree against the Envoy WASM ABI. And these are similar APIs to what Envoy supports native extensions today. So things like, you know, here's your data, here's your headers, like things like that.
Starting point is 00:49:46 But we can do it in a stable way across the Wasm application boundary. But better yet, the code runs in a sandbox. You can write, quote, safe C++, right? Which won't escape that sandbox. You can write your extension in Rust. You can write it in TypeScript or in TinyGo or, like, whatever you want.
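As a flavor of what crossing that sandbox boundary looks like, here is a deliberately simplified sketch. This is not the actual proxy-wasm ABI or SDK; the host function below is a made-up stand-in, and a real ABI is versioned and passes handles and lengths in a similar flat style:

```cpp
#include <cstddef>
#include <cstdint>

// The host (Envoy) calls exported hooks across the Wasm boundary; the
// extension only sees a narrow C-style interface and runs in the sandbox.

// Imported from the host, resolved when the module is loaded
// (hypothetical name, standing in for a real host API).
extern "C" void host_add_response_header(const char* key, size_t key_len,
                                         const char* val, size_t val_len);

// Exported to the host: invoked once the request headers are decoded.
// Returning 0 means "continue the filter chain".
extern "C" uint32_t on_request_headers(uint32_t /*num_headers*/) {
  const char key[] = "x-seen-by";
  const char val[] = "wasm-filter";
  host_add_response_header(key, sizeof(key) - 1, val, sizeof(val) - 1);
  return 0;  // continue
}
```

Compiled to Wasm, a module like this runs in any Envoy that speaks the same ABI version, regardless of how that Envoy binary was built, which is the "bring your own Envoy container" property he describes.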
Starting point is 00:50:06 So it's an incredibly powerful way to solve lots of different problems. And we're hoping to have Wasm support upstream. It's actually, the MVP is implemented. We're hoping to have it upstream in Envoy probably within the next month or two. And then my hope is that in the next couple of years, really, we're going to move away from extensions being written in what I would call native C++, just because it doesn't make a lot of sense anymore, because it doesn't have those properties of safety, of stable APIs, etc. And I would expect an entire ecosystem, you know, kind of like Docker Hub, right, to form for Envoy extensions, where people can have extensions that we can test out of ban, we can sign, like we can do all these things. And then when you run Envoy, you know, in on prem or in a
Starting point is 00:50:59 cloud environment, whether it be an edge proxy, or a service mesh proxy, or things like that, now we can go to this marketplace of extensions, and those extensions might be around protocols or observability or security or things like that. So it's a very exciting time, and I think we're at very early days, right? I mean, I would call this alpha or pre-alpha, but I believe that this is the future for sure. Cool. Very cool. Okay. Jason, do you have any other questions you want to ask? So I was a little bit curious more about the history of this project. So it's owned by Lyft, but did you start it before you started at Lyft? No, I started it right when I started at Lyft.
Starting point is 00:51:45 So I started at Lyft just about five years ago. So it'll be five years in a couple of days. And I started on it very shortly after I joined. So it sounds like you're fortunate enough to be paid full-time to work on an open-source project. Yeah, so I split my time. It's not quite full-time. I lead my team at Lyft. And then I spend about 50% to 60% of my time working with the industry.
Starting point is 00:52:32 And so, yes, I am very fortunate. It would be a whole other show to talk about the trials and tribulations of open source. But it's a double-edged sword, right? The success of Envoy has been a once-in-a-lifetime type thing. And it's been a truly fantastic experience. It's also been very tiring. So, yeah. Okay.
Starting point is 00:53:06 Well, Matt, it's been great having you on the show today. Thank you for filling our listeners in on the cloud-native world. Thank you so much, and I would say to folks that if you're interested in learning more about Envoy, we've got our website at envoyproxy.io. We are on GitHub.
Starting point is 00:53:22 Very welcoming community, always looking for people to help out very cool thank you thank you so much thanks so much for listening in as we chat about c++ we'd love to hear what you think of the podcast please let us know if we're discussing the stuff you're interested in or if you have a suggestion for a topic we'd love to hear about that too you can email all your thoughts to feedback at cppcast.com we We'd also appreciate if you can like CppCast on Facebook and follow CppCast on Twitter. You can also follow me at Rob W. Irving and Jason at Lefticus on Twitter. We'd also like to thank all our patrons who help support the show through
Starting point is 00:53:56 Patreon. If you'd like to support us on Patreon, you can do so at patreon.com slash cppcast. And of course, you can find all that info and the show notes on the podcast website at cppcast dot com. Theme music for this episode was provided by podcast
