CppCast - Distributed Computing
Episode Date: April 28, 2016
Rob and Jason are joined by Elena Sagalaeva from Microsoft's Bing Ads team to discuss Distributed Computing with C++. Elena Sagalaeva is a Russian-born professional C++ developer since 2000. She was primarily a game developer working both for various studios and as an indie developer. She graduated from the industry while being a tech lead at the head of a small dev team. Elena currently lives in the U.S. with her family and works at Microsoft in Bing Ads. Her current interests focus on large scale distributed systems and the development of the C++ language. She has a popular blog on C++ in Russian and she is the author of the famed C++ Lands map.
News
Introducing the C++ Core Guidelines
Red Hat at the ISO C++ Standards Meeting
pybind11: Seamless operability between C++11 and Python
Elena Sagalaeva
Elena Sagalaeva's Blog
@alenacpp
Links
Nexus Wireless Silent Mouse
C++11 Lands Map
Transcript
This episode of CppCast is sponsored by JetBrains, maker of excellent C++ developer tools including
CLion, ReSharper for C++, and AppCode. Start your free evaluation today at jetbrains.com
slash cppcast dash cpp. CppCast is also sponsored by CppCon, the annual week-long
face-to-face gathering for the entire C++ community. Get your ticket now during early bird registration until July 1st.
Episode 55 of CppCast with guest Elena Sagalaeva, recorded April 28th, 2016. In this episode, we talk about the core guidelines and Python bindings.
Then we talk to Elena Sagalaeva from Microsoft's Bing Ads team.
Elena tells us about distributed computing with C++. Welcome to episode 55 of CppCast, the only podcast for C++ developers by C++ developers.
I'm your host, Rob Irving, joined by my co-host, Jason Turner.
Jason, how are you doing today?
All right, Rob. How about you?
Doing pretty good. I invested in a new mouse.
Whenever I edit the show,
I know it picks up a lot of my mouse clicks
and it was really bothering me.
I was trying to filter them out,
but it just didn't work.
I guess my mouse was extremely loud
and it just wouldn't filter out at all.
So I got this new mouse
and it's supposed to be silent
and I can't hear anything.
I'm clicking away right now, furiously.
I wish that I was a fly on the wall
when you were at Best Buy,
sitting there, click, click, click, click, click, click, click,
trying to pick up the mouse.
There's a website that I looked at.
It's called endpcnoise.com,
and they recommended this mouse as being silent.
Fascinating.
I had no idea.
I think the brand was Nexus,
and they have four or five different
mice, and this is a nice wireless one.
I really can't hear it. I can still feel it. I get the tactile response of clicking,
but you really can't hear it at all, which is great.
That's impressive.
One less thing to filter out at the end of the show.
It won't annoy the listeners anymore. I'm sure it was just as annoying
as your old pen clicks.
I have no idea what you're talking
about.
Okay. Well, at the top of our episode,
I'd like to read a piece of feedback.
Last week, we talked
briefly about Runtime
Compiled C++, that article
that was just released for free.
And the author of that library, Doug Binks, must be a listener of the show because he reached out to us over Twitter and said he'd be willing to come on the podcast.
So we're going to have him on in a few weeks.
That'd be awesome.
Yeah.
So we'd love to hear your thoughts about the show as well.
You can always reach out to us on Facebook, Twitter, or email us at feedback at cppcast.com.
And don't forget to leave us reviews on iTunes as well.
Joining us today is Elena Sagalaeva.
Elena is a Russian-born professional C++ developer since 2000.
She was primarily a game developer working both for various studios and as an indie developer.
She graduated from the industry while being a tech lead at the head of a small dev team.
Elena currently lives in the U.S. with her family and works at Microsoft in Bing Ads.
Her current interests focus on large-scale distributed systems and the development of the C++ language.
She has a popular blog on C++ in Russian, and she is the author of the famed C++ Lands map.
Elena, welcome to the show.
Hello.
Let's spend a second talking about the C++ lands map.
How did that come about?
Well, that was my idea, but I cannot draw.
But fortunately, I have a friend who's an artist, quite a good one.
And he agreed to help me with that.
But he doesn't know C++ at all.
So when we were making that map, he kept laughing:
"I have no idea what I'm doing."
But he put all my thoughts on paper very well.
And the map became popular, not only among the Russian community, but all over the world.
And people are still reaching out to me, still asking me to add more features: C++14, 17, etc. So it's quite popular.
It's a very beautiful map. I mean, it looks like something that you would see in some fantasy novel; something like The Lord of the Rings would have it at the front of the book.
Yeah, it's that quality.
Yeah, I think it does actually look like the fold-outs that you get in the extended releases of The Lord of the Rings. It's very nice.
So are you planning an update for C++17 and beyond?
Uh, no, I don't think so, because my artist friend is very busy. He owns a business now, so he probably won't have time for it anymore.
That's cool. Too bad.
But we will put a link to this in the show notes.
It is very cool to check out.
Nice work. Thanks.
Okay, so we had a couple news items to go
over. Elena, feel free to jump in
and comment on any of these.
The first one is
an article from Kate Gregory,
a past guest on the show, and
she's talking about the C++ core guidelines,
and this is on visualstudiomagazine.com.
And obviously it's something we talked a little bit about,
but we still really need to go more in depth on the C++ core guidelines.
This is kind of just her introduction of that
to the Visual Studio Magazine readership.
Yeah. It's a good article.
It is a good article.
I'm not sure who would be the best person
to get on the show to go into this content
in more depth.
Herb Sutter?
Yeah, he would be great.
Or Bjarne.
I've reached out to Herb before. He's always busy.
I was there at CppCon
when Stroustrup announced
the C++ Core Guidelines,
and he had a very long presentation about it.
And people around me were whispering
that they're trying to make Rust out of C++.
Because with these guidelines, C++ starts resembling Rust.
Yeah.
Okay, so the next article comes from the Red Hat developer blog.
And this is from Torvald Riegel, one of the Red Hat C++ developers.
And he was talking about the proposals that he was involved in at the latest meeting in Jacksonville.
And I had not heard too much about this std::synchronic proposal before.
Jason, did you read into this one a bit?
I read, well, his description of it,
and I had not heard of it either,
but it does look interesting.
Yeah.
It's basically a way to block on an atomic until it has changed, right?
Yeah, and he's comparing it to Linux futexes,
which is a feature I'm not familiar with.
Maybe you are?
I may be familiar with them
because I'm a little bit familiar with spin locks
and what can happen in the Linux kernel,
but no, I mean, not by that name.
Okay.
Elena, have you been following the C++17 development at all?
A little bit.
I like to see more and more concurrency features
being voted in.
That's nice.
That will make cross-platform development easier.
Yeah, definitely.
So I did have one question about this,
and that's the fact that they need a special proposal
for floating-point atomics.
And I guess I don't know enough about CPU architecture
to know why that would be so different from integer atomics.
Anyone have any input?
I have no idea.
Yeah, I don't know.
We'll have to find out a little bit more information about that one.
All right.
Yeah.
This last item is pybind11,
which is a new lightweight header-only library for binding your C++ to
your Python. And the author is describing it kind of as Boost Python without Boost.
Yeah.
So yeah, if you aren't already using Boost and you want to bind to some Python and you don't
want to bring in all of the associated Boost libraries, then this might be a good alternative to Boost Python.
I think it basically has the same features,
but it's built for C++11
and doesn't require the rest of Boost, which is nice.
It might be useful for game developers
because it's a common situation
when you have a game engine written in C++,
and you script it in Python or Lua or UnrealScript,
and you don't want to bring all of Boost with you,
so you can use some small library like this one to script your game engine.
But I briefly looked into it,
and it looks like you need to define your interfaces twice. First
in C++ and then to expose
it for Python.
Right? It's not auto-generated.
It's not auto-generated.
You do have to tell it what functions you want exposed.
Yeah. I think
it should be auto-generated
just to eliminate this
boring work. But other than
that, it could be useful.
Yeah, it looks very clean.
I need to check it out myself.
Yeah, and one of the nice things about this one
is it does run on all the major compilers,
Clang, GCC, Visual Studio 2015,
and the Intel C++ compiler, so that's very nice.
Wow.
Yeah, you don't usually see the Intel one listed there.
Awesome.
Yeah.
Okay, so Elena, let's start talking about distributed computing. Can you tell us a little bit about distributed computing and the types of problems that are involved?
So I'm working for Bing Ads, and for most of my years there I've used mostly C++, sometimes C Sharp,
and I work with code which runs on thousands of servers,
serves tens of thousands of requests per second,
and because of it, we want our code to be fast,
and C++ is very good for it,
and we want our 99th percentile latencies to be low, and again C++ is very good
for it because it doesn't have garbage collection, for example, and garbage collection can affect
your 99th percentile latency. We also want low-level control over our code. For example, we want our data
to be laid out well in memory.
That way we can use
cache-aware algorithms.
And again, it's very good
for our latencies.
So you've mentioned this concept
of 99th percentile latency.
What exactly does that mean?
I'm not familiar with that.
That's the 1% of slowest
requests.
You have your distributed system,
a cluster, right?
And you probably want to measure your latencies
there and split it by
percentile.
You'll get 100
of those. And
the 1%, the slowest 1%,
your tail latency, it's a very important metric.
You might think about it like, why 1%? Who cares?
But to process a user's query, you really need more than one request.
You might need 100.
And that means that most of your queries will be affected by the 99th percentile latency.
Interesting.
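To make the tail-latency argument concrete, here is a small sketch with hypothetical helper names. It computes a nearest-rank percentile from latency samples, and the chance that a user query fanning out to N backend requests hits at least one request in the slowest 1%, assuming the requests are independent (real systems are rarely that tidy). Under that assumption the probability is 1 - 0.99^N.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Nearest-rank percentile: the smallest sample such that at least `percent`%
// of all samples are <= it. `percent` is an integer in [1, 100].
double percentile_nearest_rank(std::vector<double> samples, int percent) {
    if (samples.empty() || percent < 1 || percent > 100) {
        throw std::invalid_argument("need samples and 1 <= percent <= 100");
    }
    std::sort(samples.begin(), samples.end());
    // rank = ceil(percent * n / 100), computed in integers to avoid rounding issues.
    const std::size_t n = samples.size();
    const std::size_t rank = (static_cast<std::size_t>(percent) * n + 99) / 100;
    return samples[rank - 1];
}

// Probability that a query touching `fanout` independent backend requests sees
// at least one request slower than the backend's `percentile` latency.
double prob_query_hits_tail(int fanout, double percentile = 0.99) {
    return 1.0 - std::pow(percentile, fanout);
}
```

With a fan-out of 100, `prob_query_hits_tail(100)` is about 0.63: under the independence assumption, nearly two thirds of user queries would include at least one p99-slow request, which is why the tail matters so much here.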
So what are some of the specific benefits of using C++ with distributed computing?
So because you don't have garbage collection, you don't have these nasty latency spikes.
And I see a lot of folks using other languages
like Java or C Sharp,
and they always fight with the language,
they always fight with the framework
to tweak the garbage collection,
to make the effect of garbage collection
on latencies lower.
It doesn't apply to all the distributed systems.
Sometimes you're kind of fine
with your tail latencies
being affected by garbage collection.
But most of the time,
you'd prefer to use something else,
to use, for example, C++ or Rust,
which don't have garbage collection at all.
So does this not wanting garbage collection,
I guess, if you will,
how does that affect the way you program in C++
for distributed computing?
Are you sensitive to creating things on the heap?
Does that matter to you?
Do you aim to the stack or do you just not really care?
Just the fact that you don't have garbage collection,
is that what is important?
Well, you should be very careful with allocating memory. You don't want to allocate all the time.
You might want to use arena allocation, you know, when you allocate a huge chunk of memory,
and then you allocate smaller parts out of it, and then you deallocate all of it at once.
That way you save on allocations and
avoid memory fragmentation.
That's actually a very popular approach
used in game development
as well.
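A minimal sketch of the arena (bump-pointer) approach described above, assuming a single thread and trivially destructible data: one big allocation up front, cheap aligned sub-allocations, and `reset()` to release everything at once. The `Arena` class is hypothetical, not a specific library's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical bump-pointer arena: one big upfront allocation, cheap
// sub-allocations, and everything released at once with reset().
// Not thread-safe, and it never runs destructors of objects placed in it.
class Arena {
public:
    explicit Arena(std::size_t capacity)
        : buffer_(new unsigned char[capacity]), capacity_(capacity), offset_(0) {}

    // Returns `size` bytes aligned to `alignment` (a power of two),
    // or nullptr if the arena does not have enough room left.
    void* allocate(std::size_t size, std::size_t alignment = alignof(std::max_align_t)) {
        assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
        const std::uintptr_t base = reinterpret_cast<std::uintptr_t>(buffer_.get());
        const std::uintptr_t mask = static_cast<std::uintptr_t>(alignment) - 1;
        const std::uintptr_t aligned = (base + offset_ + mask) & ~mask;  // round up
        const std::size_t start = static_cast<std::size_t>(aligned - base);
        if (start > capacity_ || size > capacity_ - start) {
            return nullptr;  // out of space
        }
        offset_ = start + size;
        return reinterpret_cast<void*>(aligned);
    }

    // "Deallocate everything at once": just rewind the bump pointer.
    void reset() { offset_ = 0; }

    std::size_t used() const { return offset_; }

private:
    std::unique_ptr<unsigned char[]> buffer_;
    std::size_t capacity_;
    std::size_t offset_;
};
```

Objects would typically be constructed in place with placement new, e.g. `new (arena.allocate(sizeof(T), alignof(T))) T(...)`. Since `reset()` runs no destructors, this only suits types whose destructors do nothing important.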
Or you might
want to use something like Boost.Pool,
where you pre-allocate
your objects and then
just use them out of the pool.
Okay.
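For comparison, a pool pre-allocates slots for objects of a single type and recycles them. The sketch below is a hypothetical, minimal free-list pool meant only to show the idea; it is not how Boost.Pool is implemented and does not mirror Boost's `object_pool` interface.

```cpp
#include <cstddef>
#include <new>
#include <utility>
#include <vector>

// Hypothetical fixed-capacity object pool: storage for `capacity` objects is
// reserved up front, and freed slots are recycled through a free list instead
// of going back to the heap.
template <typename T>
class ObjectPool {
public:
    explicit ObjectPool(std::size_t capacity) : slots_(capacity) {
        free_.reserve(capacity);
        for (std::size_t i = capacity; i > 0; --i) free_.push_back(&slots_[i - 1]);
    }
    ObjectPool(const ObjectPool&) = delete;             // free_ points into slots_
    ObjectPool& operator=(const ObjectPool&) = delete;

    // Constructs a T in a free slot; returns nullptr if the pool is exhausted.
    template <typename... Args>
    T* create(Args&&... args) {
        if (free_.empty()) return nullptr;
        Slot* slot = free_.back();
        free_.pop_back();
        return new (slot->storage) T(std::forward<Args>(args)...);  // placement new
    }

    // Destroys the object and puts its slot back on the free list.
    void destroy(T* obj) {
        obj->~T();
        free_.push_back(reinterpret_cast<Slot*>(obj));
    }

    std::size_t available() const { return free_.size(); }

private:
    struct Slot { alignas(T) unsigned char storage[sizeof(T)]; };
    std::vector<Slot> slots_;   // never resized, so slot addresses stay stable
    std::vector<Slot*> free_;
};
```

Because freed slots go back on the free list, a `create` after a `destroy` reuses the same memory without touching the heap. The pool does not destroy objects that are still alive when it goes out of scope, so the caller has to `destroy` them first.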
I'd like to interrupt the discussion for just a moment to bring you a word from our sponsors.
ReSharper C++ makes Visual Studio a much better IDE for C++ developers.
It provides on-the-fly code analysis, quick fixes, powerful search and navigation,
smart code completion, automated refactorings,
a wide variety of code generation options, and a host of other features to help increase your
everyday productivity. Code refactorings for C++ help change your code safely, while context
actions let you switch between alternative syntax constructs and serve as shortcuts to code
generation actions. With ReSharper C++, you can instantly jump to any file, type, or type
member in solution. You can search for usages of any code and get a clear view of all found usages
with grouping and preview options. Visit jb.gg slash cppcast-rcpp to learn more and download your free 30-day evaluation.
Are there any newer features in C++11 or 14 that have been particularly important to distributed computing?
First of all, the language became much better, in my opinion, overall.
So we're using a lot of features of C++11.
But the most important ones for me
are move semantics, of course,
because you don't need to copy,
you can move.
And memory fences.
Because we
work with lock-free stuff
a lot. Lock-free
algorithms, lock-free data
structures, and
thanks to memory fences, developers
can develop their
lock-free stuff more easily.
I do not write
lock-free,
low-level
algorithms or data structures.
I just use them as they are written
by somebody else.
But I see more and more
projects being open-sourced,
using C++11 memory fences,
and they look good.
And I have more choice now.
There are a couple of videos from last year's C++Now
that were about lock-free data structure development.
And it's a mind warp.
It messes with your head.
True.
That's why I prefer to use somebody else's libraries
and not write it myself.
Right.
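As an illustration of the C++11 memory fences mentioned above, here is the classic message-passing pattern with `std::atomic_thread_fence`: a release fence before a relaxed store of the flag, paired with an acquire fence after a relaxed load that observes it, makes the earlier plain write to `payload` visible to the reader. The `Mailbox` type and function names are hypothetical; real lock-free data structures are far subtler, which is exactly why Elena prefers well-tested libraries.

```cpp
#include <atomic>
#include <thread>

// Hypothetical single-use mailbox: plain data plus an atomic "ready" flag.
struct Mailbox {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};
};

// Writer: the release fence orders the payload write before the flag store.
void publish(Mailbox& box, int value) {
    box.payload = value;
    std::atomic_thread_fence(std::memory_order_release);
    box.ready.store(true, std::memory_order_relaxed);
}

// Reader: once the flag is observed, the acquire fence pairs with the writer's
// release fence, so reading `payload` afterwards is race-free.
int consume(Mailbox& box) {
    while (!box.ready.load(std::memory_order_relaxed)) {
        std::this_thread::yield();   // spin politely until the flag is set
    }
    std::atomic_thread_fence(std::memory_order_acquire);
    return box.payload;
}
```

The same guarantee could be had with a release store and an acquire load on `ready` itself; standalone fences become useful when one fence needs to cover several relaxed atomic operations.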
Is template metaprogramming
a big part of distributed computing?
I know, didn't Facebook do several CppCon videos
about how much they use template metaprogramming?
I'm assuming they have a lot of distributed computing scenarios.
And it's important to kind of reduce the power requirements
of all those servers that are running your code.
I can't say it's a very important part.
Some teams use it, some don't.
I use it from time to time.
We use variadic templates, but not too much,
because, first of all, your code becomes less readable because of it,
and it's harder to debug.
But sometimes it's a good solution.
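A small illustration of the kind of variadic template use that stays fairly readable: a hypothetical `make_log_line` helper that streams any number of arguments into one string. It uses C++11-style recursion, peeling off one argument per instantiation; C++17 fold expressions can shorten it, at the cost of requiring a newer compiler.

```cpp
#include <sstream>
#include <string>

// Base case: no arguments left to append.
inline void append_all(std::ostringstream&) {}

// Recursive case: stream the first argument, then recurse on the rest.
template <typename First, typename... Rest>
void append_all(std::ostringstream& out, const First& first, const Rest&... rest) {
    out << first;
    append_all(out, rest...);
}

// Hypothetical helper: concatenates any number of streamable arguments.
template <typename... Args>
std::string make_log_line(const Args&... args) {
    std::ostringstream out;
    append_all(out, args...);
    return out.str();
}
```

For example, `make_log_line("latency_ms=", 12, " shard=", 3)` yields `"latency_ms=12 shard=3"`. Even this small case shows the readability cost Elena mentions: the recursion and overload resolution are not obvious at a glance.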
What kind of inter-process communication do you guys use
for talking
between your server processes?
Okay, today I
cannot talk with you on behalf of
Microsoft, so I can
talk overall.
But
I prefer
RPC, remote procedure
call, mostly
because I was specializing
in distributed computing way back at the university,
and that's what I started with, RPC.
But there are plenty of other stuff like MPI, OpenMP.
Okay.
We talked a little bit about in your bio how you've done game development in the past.
How has that experience kind of shaped you
as you move from game development
into distributed computing with Microsoft?
Performance is important both in games
and in distributed systems.
I've already talked to you about
arena allocation, which is used in games.
For example, when you start a new level,
you can allocate a huge chunk of memory
for your objects, static objects, enemies, etc.
When they've been removed from the scene,
like killed or something like this,
you just do not draw them, but they're still in memory.
And when you deallocate your level,
you deallocate everything,
and that way you are not dealing
with memory leaks,
you don't have memory fragmentation,
so it's a very useful approach.
And that's used in distributed
systems as well.
Performance
measurement, performance
measuring tools,
I first met them when I was working on games,
and it's very important to measure before you optimize,
because you never know what actually slows you down.
And being cache-friendly is also important,
both in game development and distributed systems.
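In the spirit of "measure before you optimize", here is a deliberately crude timing helper built on `std::chrono::steady_clock`. A real profiler gives far more insight, and microbenchmarks like this are easily distorted by the optimizer and by caches; `average_ns` is a hypothetical name.

```cpp
#include <chrono>
#include <ratio>

// Runs `fn` `iterations` times (iterations > 0) and returns the average
// wall-clock time per call in nanoseconds, measured with a monotonic clock.
template <typename Fn>
double average_ns(Fn&& fn, int iterations) {
    using clock = std::chrono::steady_clock;
    const auto start = clock::now();
    for (int i = 0; i < iterations; ++i) fn();
    const auto elapsed = clock::now() - start;
    return std::chrono::duration<double, std::nano>(elapsed).count() / iterations;
}
```

`steady_clock` is used rather than `system_clock` because it never jumps backwards when the wall clock is adjusted.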
Do you have any specific advice for how you can keep cache friendliness in mind
while you're programming in C++? I don't think you need to keep it in mind all the time.
It's only needed when you need to optimize really aggressively. But let me give you an example,
again, from a game dev world.
There is one approach which is widely used.
When you have your game objects, it's usually a vector, a vector of game objects,
and you probably have a flag there saying if the object is visible or not,
should you render it or not.
And when you loop through your objects,
you check this flag,
and if it's inside the object,
you probably also load the data from the object around it into the cache,
which is not very useful for you.
So it makes sense to put all the flags in a separate vector and loop through that, and it makes you faster.
So do you loop through both vectors simultaneously,
or you just loop through the first vector that's the flags?
No, no, no.
You loop through the first vector,
but you know the index.
And if your object is visible,
you can do something with your object.
Ah, so it's a one-to-one index.
You take the index from the first vector
and then use that to actually do something.
Yes, exactly.
Okay. Hmm.
I've not done that myself.
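A sketch of the layout Elena describes, with hypothetical `GameObject` and `Scene` types: the visibility flags live in their own vector, parallel to the objects, so the loop scans a compact array of bytes and only touches a large object when its flag is set. Whether this actually helps depends on object size and on how many objects are visible, so it is worth measuring.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical game object: its payload is large compared to one flag.
struct GameObject {
    float transform[16];
    int hp = 0;
};

// visible[i] describes objects[i]. unsigned char is used instead of
// std::vector<bool> so each flag is a plain, directly addressable byte.
struct Scene {
    std::vector<GameObject> objects;
    std::vector<unsigned char> visible;
};

// Calls `fn` on each visible object and returns how many were visited.
template <typename Fn>
std::size_t for_each_visible(Scene& scene, Fn&& fn) {
    std::size_t visited = 0;
    for (std::size_t i = 0; i < scene.visible.size(); ++i) {
        if (scene.visible[i]) {      // cheap: reads the compact flag array
            fn(scene.objects[i]);    // only now touch the large object
            ++visited;
        }
    }
    return visited;
}
```

This is the one-to-one index idea from the conversation: the loop walks the flag vector and uses the same index to reach the object.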
So what are some of the other languages that are popular in distributed
computing? I think you mentioned Rust being
one. It's not
actually popular.
I know that Dropbox
started using it, and
it was a very risky move
because not many people are using it, but
it looks like it worked well for them overall. So, I was complaining about
garbage collection, but in distributed systems plenty of people use Java, Scala,
C Sharp, Go. Go is a very good one
because it's very easy to learn,
and it's made by Google
and it's used by Google,
so it has a very good reputation.
Erlang got a lot of attention
after WhatsApp.
Is Go garbage collected?
Go? Yes, it's garbage collected.
I think all the ones you just listed are garbage collected, aside from Rust, right?
Yes.
So are there any techniques in distributed computing?
You mentioned that you do use C Sharp sometimes.
Are there techniques to mitigate the effects of the garbage collector?
There are some techniques.
I read articles about it.
But when I really care about my latencies,
I use C++.
There are plenty of tasks
when you don't need to be that fast,
and C Sharp is fine for them.
Okay.
So I don't apply those techniques myself.
Right.
So do you have any particular aspects of C++
that are your favorite features of C++?
My favorite features of C++11
are smart pointers,
shared pointer, unique pointer.
I think they make C++ a much safer language.
Is there anything else that you would like to talk about for distributed computing or
the work that you've done?
Let's talk about complexity, managing complexity, which is important both in game development
and in distributed systems.
And it's very easy to make your program very complex
in C++.
You can use template metaprogramming
for it or something else.
So it's very important to manage
your complexity to keep
things as simple as possible
because if you do not manage your
complexity, complexity starts managing
you.
Your program starts falling apart and you just cannot ship.
And a lot of people forget about it.
They start a project, they get very excited.
They get very excited with new C++ features.
And then it's already too late to do anything to stop the project from failing.
So what do you do to manage complexity?
Do you have any process in place?
Not a process, but just think about
how can I make my code simpler than that?
Do I really need a factory of visitors here,
or I can just write a simple function
which does pretty much the same?
That makes sense.
Yeah, I think it's important to think about keeping your code as simple as possible.
I agree with you.
All right, so in your bio, you said you graduated from the gaming industry.
I thought that was a funny way of putting that. So that was one phase of your life, and you've moved on from there?
Yeah, I moved on.
Well, honestly, I just got a better offer from Microsoft.
I had two offers at the time, one from a game dev company and one from Microsoft,
and the Microsoft one was better.
And I knew both, because I was specializing in distributed computing at the university.
So a lot of people like to say that the game industry
is this cutthroat, 90-hour-a-week kind of world.
Was that your experience?
No, it's not always true,
because I was very picky about which companies to work for,
and I couldn't afford working crazy hours
because I have a family.
So you can find good companies
which treat
their employees well.
Okay.
You said you worked at an indie developer.
Is it mostly the AAA game series
that have that reputation as far as you know?
No, no, no.
Indie developers, small developers.
Ah, yeah, yeah.
That's true. I know that's mostly AAA.
I never worked for such a company.
I've heard stories, but I never worked there.
So you did C++ development in your game development.
Is that correct?
Yes, mostly C++.
And when did you leave the game industry?
It was 2010.
Since you left in 2010,
were the developers of the game industry
starting to adopt C++11 features yet?
I know most compilers had auto, for instance,
and I believe lambdas were in most compilers.
No, I don't remember them adopting it yet.
And I was working on the Nintendo Wii,
which had this funny compiler called CodeWarrior.
And actually, it wasn't very good with existing stuff like templates.
So I really doubt that.
Wow, okay.
I never considered how much the compiler might limit what features of C++ you could use
for those kinds of tools.
I know. Well, 2010, it was just too early. C++11 was largely adopted
later, but, well, it's still not adopted by a lot of companies because people are very cautious about new stuff.
And, for example, I know a company
where the keyword auto is prohibited
because they think that it makes code unreadable
and hard to debug.
That's interesting.
That's actually why I'm quite skeptical
about the C++ Core Guidelines,
because I don't know how many people are going to use it.
Right?
That's new syntax, that's something you need to use all the time.
And what we've got is this:
in C++, you need to make an additional effort
to make your code safer, right?
And in Rust, you need to make an additional effort to make your code unsafe.
And the Rust approach sounds better to me.
Right.
Yeah, unfortunately, to get C++ there, we would just have to eliminate features that the core guidelines are just trying to prevent you from using, I guess.
Yeah.
Okay.
Well, Elena, it's been great having you.
Where can people find you online if they want to look at your blog or follow you on Twitter?
Twitter is the best.
Okay.
And what's your handle?
I'll put it in the show notes as well, but for people listening?
It's alenacpp.
It's A-L-E-N-A CPP.
Okay. Well, it's been great having you on the show.
Thank you.
Thanks for joining us.
Thanks. Bye.
Thanks so much for listening as we chat about C++. I'd love to hear what you think of the podcast.
Please let me know if we're discussing the stuff you're interested in, or if you have a suggestion for a topic. I'd love to hear that also. You can email all your thoughts to feedback at cppcast.com. Thank you.