CppCast - Distributed Computing

Episode Date: April 28, 2016

Rob and Jason are joined by Elena Sagalaeva from Microsoft's Bing Ads team to discuss distributed computing with C++. Elena Sagalaeva is a Russian-born C++ developer who has worked professionally since 2000. She was primarily a game developer, working both for various studios and as an indie developer. She graduated from the industry while being a tech lead at the head of a small dev team. Elena currently lives in the U.S. with her family and works at Microsoft in Bing Ads. Her current interests focus on large-scale distributed systems and the development of the C++ language. She has a popular blog on C++ in Russian, and she is the author of the famed C++ Lands map.

News
Introducing the C++ Core Guidelines
Red Hat at the ISO C++ Standards Meeting
pybind11: Seamless operability between C++11 and Python

Elena Sagalaeva
Elena Sagalaeva's Blog
@alenacpp

Links
Nexus Wireless Silent Mouse
C++11 Lands Map

Transcript
Starting point is 00:00:00 This episode of CppCast is sponsored by JetBrains, maker of excellent C++ developer tools including CLion, ReSharper for C++, and AppCode. Start your free evaluation today at jetbrains.com slash cppcast dash cpp. CppCast is also sponsored by CppCon, the annual week-long face-to-face gathering for the entire C++ community. Get your ticket now during early bird registration until July 1st. Episode 55 of CppCast with guest Elena Sagalaeva, recorded April 28th, 2016. In this episode, we talk about the C++ Core Guidelines and Python bindings. Then we talk to Elena Sagalaeva from Microsoft's Bing Ads team. Elena tells us about distributed computing with C++. Welcome to episode 55 of CppCast, the only podcast for C++ developers by C++ developers. I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Starting point is 00:01:35 Jason, how are you doing today? All right, Rob. How about you? Doing pretty good. I invested in a new mouse. Whenever I edit the show, I know it picks up a lot of my mouse clicks, and it was really bothering me. I was trying to filter them out, but it just didn't work.
Starting point is 00:01:52 I guess my mouse was extremely loud and it just wouldn't filter out at all. So I got this new mouse and it's supposed to be silent and I can't hear anything. I'm clicking away right now, furiously. I wish that I was a fly on the wall when you were at Best Buy,
Starting point is 00:02:08 sitting there, click, click, click, click, click, click, click, trying to pick out the mouse. There's a website that I looked at. It's called endpcnoise.com, and they recommended this mouse as being silent. Fascinating. I had no idea. I think the brand was Nexus,
Starting point is 00:02:23 and they have four or five different mice, and this is a nice wireless one. I really can't hear it. I can still feel it. I get the tactile response of clicking, but you really can't hear it at all, which is great. That's impressive. One less thing to filter out at the end of the show. It won't annoy the listeners anymore. I'm sure it was just as annoying as your old pen clicks.
Starting point is 00:02:47 I have no idea what you're talking about. Okay. Well, at the top of our episode, I'd like to read a piece of feedback. Last week, we talked briefly about Runtime Compiled C++, that article that was just released for free.
Starting point is 00:03:04 And the author of that library, Doug Binks, must be a listener of the show, because he reached out to us over Twitter and said he'd be willing to come on the podcast. So we're going to have him on in a few weeks. That'd be awesome. Yeah. So we'd love to hear your thoughts about the show as well. You can always reach out to us on Facebook or Twitter, or email us at feedback at cppcast.com. And don't forget to leave us reviews on iTunes as well. Joining us today is Elena Sagalaeva.
Starting point is 00:03:31 Elena is a Russian-born C++ developer who has worked professionally since 2000. She was primarily a game developer, working both for various studios and as an indie developer. She graduated from the industry while being a tech lead at the head of a small dev team. Elena currently lives in the U.S. with her family and works at Microsoft in Bing Ads. Her current interests focus on large-scale distributed systems and the development of the C++ language. She has a popular blog on C++ in Russian, and she is the author of the famed C++ Lands map. Elena, welcome to the show. Hello.
Starting point is 00:04:03 Let's spend a second talking about the C++ Lands map. How did that come about? Well, that was my idea, but I cannot draw. Fortunately, I have a friend who's an artist, quite a good one, and he agreed to help me with that. But he doesn't know C++ at all. So when we were making that map, he kept laughing, "I have no idea what I'm doing."
Starting point is 00:04:27 But he put all my thoughts on paper very well. And the map became popular, not only among the Russian community, but all over the world. People are still reaching out to me, still asking me to add more features: C++14, C++17, etc. So it's quite popular. It's a very beautiful map. I mean, it looks like something that you would see in, you know, some fantasy novel; something like Lord of the Rings would have it at the front of the book. Yeah, it's that quality. Yeah, it does actually, I think, look like the foldouts that you get in the extended releases of Lord of the Rings. Yeah, it's very nice.
Starting point is 00:05:05 So, are you planning an update for C++17? No, I don't think so, because my friend Arty Jim is very busy. He owns a business now, so he probably won't have time for it anymore. That's cool. Too bad. But we will put a link to this in the show notes. It is very cool to check out. Nice work. Thanks. Okay, so we had a couple news items to go over. Elena, feel free to jump in
Starting point is 00:05:36 and comment on any of these. The first one is an article from Kate Gregory, a past guest on the show, and she's talking about the C++ core guidelines, and this is on visualstudiomagazine.com. And obviously it's something we talked a little bit about, but we still really need to go more in depth on the C++ core guidelines.
Starting point is 00:05:56 This is kind of just her introduction of that to the Visual Studio Magazine readership. Yeah. It's a good article. It is a good article. I'm not sure who would be the best person to get on the show to go into this content in more depth. Herb Sutter?
Starting point is 00:06:13 Yeah, he would be great. Or Bjarne. I've reached out to Herb before. He's always busy. I was there at CppCon when Stroustrup announced the C++ Core Guidelines, and he had a very long presentation about it. And people around me were whispering
Starting point is 00:06:30 that they're trying to make Rust out of C++, because with these guidelines, C++ starts resembling Rust. Yeah. Okay, so the next article comes from the Red Hat developer blog. This is from Torvald Riegel, one of the Red Hat C++ developers, and he was talking about the proposals that he was involved in at the latest meeting in Jacksonville. I had not heard too much about this std::synchronic proposal before.
Starting point is 00:07:03 Jason, did you read into this one a bit? I read, well, his description of it, and I had not heard of it either, but it does look interesting. Yeah. It's basically a way to block on an atomic until it has changed, right? Yeah, and he's comparing it to Linux futexes, which is a feature I'm not familiar with.
Starting point is 00:07:22 Maybe you are? I may be familiar with them, because I'm a little bit familiar with spin locks and what can happen in the Linux kernel, but no, I mean, not by that name. Okay. Elena, have you been following the C++17 development at all?
Starting point is 00:07:39 A little bit. I like to see more and more concurrency features being voted in. That's nice. That will make cross-platform development easier. Yeah, definitely. So I did have one question about this, and that's the fact that they need a special proposal for floating-point atomics.
Starting point is 00:07:59 And I guess I don't know enough about CPU architecture to know why that would be so different from integer atomics. Anyone have any input? I have no idea. Yeah, I don't know. We'll have to find out a little bit more information about that one. All right. Yeah.
Starting point is 00:08:18 This last item is pybind11, which is a new lightweight header-only library for binding your C++ to your Python. And the author is describing it kind of as Boost.Python without Boost. Yeah. So yeah, if you aren't already using Boost and you want to bind to some Python, and you don't want to bring in all of the associated Boost libraries, then this might be a good alternative to Boost.Python. I think it basically has the same features, but it's built for C++11
Starting point is 00:08:50 and doesn't require the rest of Boost, which is nice. It might be useful for game developers, because it's a common situation when you have a game engine written in C++ and you script it in Python or Lua or UnrealScript, and you don't want to bring the whole of Boost with you, so you can use some small library like this one to script your game engine. But I briefly looked into it,
Starting point is 00:09:20 and it looks like you need to define your interfaces twice. First in C++ and then to expose it for Python. Right? It's not auto-generated. It's not auto-generated. You do have to tell it what functions you want exposed. Yeah. I think it should be auto-generated
Starting point is 00:09:40 just to eliminate this boring work. But other than that, it could be useful. Yeah, it looks very clean. I need to check it out myself. Yeah, and one of the nice things about this one is it does run on all the major compilers: Clang, GCC, Visual Studio 2015,
Starting point is 00:09:57 and the Intel C++ compiler, so that's very nice. Wow. Yeah, you don't usually see the Intel one listed there. Awesome. Yeah. Okay, so Elena, let's start talking about distributed computing. Can you tell us a little bit about distributed computing and the types of problems that are involved in it? So I'm working for Bing Ads, and for most of my years there I've used mostly C++, sometimes C Sharp. I work with code which runs on thousands of servers and serves tens of thousands of requests per second,
Starting point is 00:10:35 and because of that, we want our code to be fast, and C++ is very good for that. We want our 99th percentile latencies to be low, and again C++ is very good for that because it doesn't have garbage collection, for example, and garbage collection can affect your 99th percentile latency. We also want low-level control over our code. For example, we want our data to fit well in memory. That way we can use cache-aware algorithms.
Starting point is 00:11:13 And again, it's very good for our latencies. So you've mentioned this concept of 99th percentile latency. What exactly does that mean? I'm not familiar with that. That's the slowest 1% of requests.
Starting point is 00:11:27 You have your distributed system, a cluster, right? And you probably want to measure your latencies there and split it by percentile. You'll get 100 of those. And the 1%, the slowest 1%,
Starting point is 00:11:44 your tail latency, is a very important metric. You might think, why 1%? Who cares? But to process a user's query, you really need more than one request. You might need 100. And that means that most of your queries will be affected by the 99th percentile latency. Interesting. So what are some of the specific benefits of using C++ with distributed computing? Because you don't have garbage collection, you don't have these nasty latency spikes.
Starting point is 00:12:24 And I see a lot of folks using other languages like Java or C Sharp, and they always fight with the language, they always fight with the framework to tweak the garbage collection, to make the effect of garbage collection on latencies lower. It doesn't apply to all distributed systems.
Starting point is 00:12:46 Sometimes you're kind of fine with your tail latencies being affected by garbage collection. But most of the time, you'd prefer to use something else, to use, for example, C++ or Rust, which don't have garbage collection at all. So does this not wanting garbage collection,
Starting point is 00:13:08 I guess, if you will, how does that affect the way you program in C++ for distributed computing? Are you sensitive to creating things on the heap? Does that matter to you? Do you aim for the stack, or do you just not really care? Is just the fact that you don't have garbage collection what is important?
Starting point is 00:13:25 Well, you should be very careful with allocating memory. You don't want to allocate all the time. You might want to use arena allocation, you know, where you allocate a huge chunk of memory, then allocate smaller parts out of it, and then deallocate all of it at once. That way you save on allocation costs and memory fragmentation. That's actually a very popular approach used in game development as well.
Starting point is 00:13:56 Or you might want to use something like Boost.Pool, where you pre-allocate your objects and then just use them out of the pool. Okay. I'd like to interrupt the discussion for just a moment to bring you a word from our sponsors.
Starting point is 00:14:18 It provides on-the-fly code analysis, quick fixes, powerful search and navigation, smart code completion, automated refactorings, a wide variety of code generation options, and a host of other features to help increase your everyday productivity. Code refactorings for C++ help change your code safely, while context actions let you switch between alternative syntax constructs and serve as shortcuts to code generation actions. With ReSharper C++, you can instantly jump to any file, type, or type member in solution. You can search for usages of any code and get a clear view of all found usages with grouping and preview options. Visit jb.gg slash cppcast-rcpp to learn more and download your free 30-day evaluation.
Starting point is 00:15:09 Are there any newer features in C++11 or C++14 that have been particularly important to distributed computing? First of all, the language became much better overall, in my opinion. So we're using a lot of features of C++11. But the most important ones for me are move semantics, of course, because you don't need to copy; you can move. And memory fences.
Starting point is 00:15:34 Because we work with lock-free stuff a lot: lock-free algorithms, lock-free data structures. Thanks to memory fences, developers can develop their lock-free stuff more easily.
Starting point is 00:15:50 I do not write lock-free, low-level algorithms or data structures myself. I just use them as they are written by somebody else. But I see more and more open-source projects
Starting point is 00:16:06 using C++11 memory fences, and they look good. And I have more choice now. There were a couple of videos from last year's C++Now about lock-free data structure development, and it's a mind warp. It messes with your head. True.
Starting point is 00:16:27 That's why I prefer to use somebody else's libraries and not write it myself. Right. Is template metaprogramming a big part of distributed computing? Didn't Facebook do several CppCon talks about how much they use template metaprogramming? I'm assuming they have a lot of distributed computing scenarios.
Starting point is 00:16:50 And it's important to kind of reduce the power requirements of all those servers that are running your code. I can't say it's a very important part. Some teams use it, some don't. I use it from time to time. We use variadic templates, but not too much, because, first of all, your code becomes less readable because of it, and it's harder to debug.
Starting point is 00:17:18 But sometimes it's a good solution. What kind of inter-process communication do you guys use for talking between your server processes? Okay, I can't speak on behalf of Microsoft today, so I'll talk in general terms.
Starting point is 00:17:36 But I prefer RPC, remote procedure calls, mostly because I specialized in distributed computing way back at university, and that's what I started with: RPC. But there are plenty of other options, like MPI and OpenMP.
Starting point is 00:17:56 Okay. We talked a little bit about in your bio how you've done game development in the past. How has that experience kind of shaped you as you move from game development into distributed computing with Microsoft? Performance is important both in games and in distributed systems. I've already talked to you about
Starting point is 00:18:19 arena allocation, which is used in games. For example, when you start a new level, you can allocate a huge chunk of memory for your objects: static objects, enemies, etc. When they've been removed from the scene, like killed or something like that, you just do not draw them, but they're still in memory. And when you deallocate your level,
Starting point is 00:18:43 you deallocate everything, and that's why you are not dealing with memory leaks and you don't have memory fragmentation, so it's a very useful approach. And that's used in distributed systems as well. Performance
Starting point is 00:18:58 measurement, performance measuring tools: I first encountered them when I was working on games, and it's very important to measure before you optimize, because you never know what actually slows you down. And being cache-friendly is also important, both in game development and in distributed systems. Do you have any specific advice for how you can keep cache friendliness in mind
Starting point is 00:19:27 while you're programming in C++? I don't think you need to keep it in mind all the time. It's only needed when you need to optimize really aggressively. But let me give you an example, again from the game dev world. There is one approach which is widely used. When you have your game objects, it's usually a vector, a vector of game objects, and you probably have a flag there saying whether the object is visible or not, whether you should render it or not. And when you loop through your objects,
Starting point is 00:20:09 you check this flag, and if it's inside the object, you also load the data around it from your object into the cache, which is not very useful to you. So it makes sense to put all the flags in a separate vector and loop through that, and it makes you faster. So do you loop through both vectors simultaneously, or do you just loop through the first vector, the one with the flags? No, no, no.
Starting point is 00:20:31 You loop through the first vector, but you know the index. And if your object is visible, you can do something with your object. Ah, so it's a one-to-one index. You take the index from the first vector and then use that to actually do something. Yes, exactly.
Starting point is 00:20:44 Okay. Hmm. I've not done that myself. So what are some of the other languages that are popular in distributed computing? I think you mentioned Rust being one. It's not actually popular. I know that Dropbox started using it, and
Starting point is 00:21:01 it was a very risky move, because not many people are using it, but it looks like it worked well for them overall. I was complaining about garbage collection, but in distributed systems plenty of people use Java, Scala, C Sharp, and Go. Go is a very good one, because it's very easy to learn, and it's made by Google and used by Google,
Starting point is 00:21:32 so it has a very good reputation. Erlang got a lot of attention after WhatsApp. Is Go garbage collected? Go? Yes, it's garbage collected. I think all the ones you just listed are, aside from Rust, right? Yes. So are there any techniques in distributed computing?
Starting point is 00:21:54 You mentioned that you do use C Sharp sometimes. Are there techniques to mitigate the effects of the garbage collector? There are some techniques. I've read articles about them. But when I really care about my latencies, I use C++. There are plenty of tasks where you don't need to be that fast,
Starting point is 00:22:19 and C Sharp is fine for them. Okay. So I don't apply those techniques myself. Right. So do you have any particular favorite features of C++?
Starting point is 00:22:39 My favorite features of C++11 are smart pointers: shared_ptr and unique_ptr. I think they make C++ a much safer language. Is there anything else that you would like to talk about for distributed computing or the work that you've done? Let's talk about complexity, managing complexity, which is important both in game development and in distributed systems. It's very easy to make your program very complex in C++. You can use template metaprogramming for it, or something else.
Starting point is 00:23:08 You can use template metaprogramming for it or something else. So it's very important to manage your complexity to keep things as simple as possible because if you do not manage your complexity, complexity starts managing you.
Starting point is 00:23:23 Your program starts falling apart, and you just cannot ship. And a lot of people forget about it. They start a project, they get very excited. They get very excited about new C++ features. And then it's already too late to do anything to stop the project from failing. So what do you do to manage complexity? Do you have any process in place? Not a process, but I just think about
Starting point is 00:23:51 how I can make my code simpler than that. Do I really need a factory of visitors here, or can I just write a simple function which does pretty much the same? That makes sense. Yeah, I think it's important to think about keeping your code as simple as possible. I agree with you. All right, so in your bio, you said you graduated from the gaming industry.
Starting point is 00:24:18 I thought that was a funny way of putting it. So that was one phase of your life, and you've moved on from there? Yeah, I moved on. Well, honestly, I just got a better offer from Microsoft. I had two offers at the time, one from a game dev company and one from Microsoft, and the Microsoft one was better. And I knew both areas, because I had specialized in distributed computing at university. So a lot of people like to say that the game industry is this cutthroat, 90-hour-a-week kind of world. Was that your experience?
Starting point is 00:24:52 No, it's not always true. I was very picky about which companies to work for, and I couldn't afford to work crazy hours, because I have a family. So you can find good companies which treat their employees well. Okay.
Starting point is 00:25:11 You said you worked as an indie developer. Is it mostly the AAA game studios that have that reputation, as far as you know? No, no, no. Indie developers, small developers. Ah, yeah, yeah. That's true. I know that mostly about AAA. I never worked for such a company.
Starting point is 00:25:29 I've heard stories, but I never worked there. So you did C++ development in your game development. Is that correct? Yes, mostly C++. And when did you leave the game industry? It was 2010. Since you left in 2010, were the developers of the game industry
Starting point is 00:25:51 starting to adopt C++11 features yet? I know most compilers had auto, for instance, and I believe lambdas were in most compilers. No, I don't remember them adopting it yet. And I was working on the Nintendo Wii, which had this funny compiler called CodeWarrior. And actually, it wasn't very good with existing stuff like templates. So I really doubt that.
Starting point is 00:26:21 Wow, okay. I never considered how much the compiler might limit what features of C++ you could use with those kinds of tools. I know. Well, 2010 was just too early. C++11 was largely adopted later, but it's still not adopted by a lot of companies, because people are very cautious about new stuff. For example, I know a company where the keyword auto is prohibited, because they think that it makes code unreadable and hard to debug.
Starting point is 00:27:01 That's interesting. That's actually why I'm quite skeptical about the C++ Core Guidelines, because I don't know how many people are going to use them. Right? That's new syntax; that's something you need to use all the time. And here's what we've got:
Starting point is 00:27:23 in C++, you need to make an additional effort to make your code safer, right? And in Rust, you need to make an additional effort to make your code unsafe. And the Rust approach sounds better to me. Right. Yeah, unfortunately, to get C++ there, we would just have to eliminate the features that the Core Guidelines are trying to prevent you from using, I guess. Yeah. Okay.
Starting point is 00:27:47 Well, Elena, it's been great having you. Where can people find you online if they want to look at your blog or follow you on Twitter? Twitter is the best. Okay. And what's your handle? I'll put it in the show notes as well, but for people listening? It's Alena CPP. It's A-L-E-N-A CPP. Okay. Well, it's been great having you on the show.
Starting point is 00:28:12 Thank you. Thanks for joining us. Thanks. Bye. Thanks so much for listening as we chat about C++. I'd love to hear what you think of the podcast. Please let me know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic. I'd love to hear that also. You can email all your thoughts to feedback at cppcast.com. Thank you.
