Programming Throwdown - Concurrency

Episode Date: July 31, 2018

What is a thread/process? How can you speed up a program that requires a lot of compute resources? How can you have a single machine serve web pages to 100s of people, some of whom have slow connections? Patrick and I answer these questions on today's show: Concurrency! We have also set up a Discord channel! We will be posting news stories as we find them and also record the show live! Check out our channel here: https://discord.gg/r4V2zpC Show Notes: https://www.programmingthrowdown.com/2018/07/episode-80-concurrency.html

Transcript
Starting point is 00:00:00 Programming Throwdown, Episode 80: Concurrency. Take it away, Jason. Hey everyone, we're trying something new. We have a couple of people already, which is really great. It's pretty cool that we got to sort of work out some of the kinks and things like that. And we were trying out Discord. So we have a Discord channel. You know, this is something that I've mentioned off and on for years and years. Basically, we have Facebook. We have a Google Plus page,
Starting point is 00:00:47 we have Twitter, we have all these different media, we have email, which is where we probably get most of our traffic. You know, we have all these different things, and people can't really talk to each other. Because if someone posts on Facebook, you know, they're only posting to one over n of the people, right? If someone emails us, you know, other people can't respond to that, right? And so, um, with Discord, you know, we're hoping that we can get something where people can ask... the one thing is, you'll be able to listen to the show live, which is pretty cool. So we're actually broadcasting live on Discord. Um, but then also, like, more importantly, people can ask questions.
Starting point is 00:01:28 We set up a little questions channel, um, and, uh, you know, we can answer them live or sometimes, you know, a lot of probably the majority of the questions we actually just email back that person and we don't really talk about it on the show. Um, we also get a lot of the same question. Um, you know, I think the most common question is, most common question is sort of like, should I go to college or should I do a nano degree and things like that? And we answer that directly to that person. And so this way, it's like if someone asks questions by default that go there, it can be seen by everyone. You can really help other people out with your question. Obviously, you know, our email still still works so email is not a problem but i figure we'll try this out we'll see how it goes new things are always yeah and actually patrick's on this which is that's
Starting point is 00:02:17 one thing that separates this from from every other platform is is patrick himself is on this one well i am right now anyways you're pretty bearish on the discord i am bearish on social yeah anything yeah i uh i can understand that do you use uh slack at work probably not right no yeah makes sense cool well let's i used walk over to people's desks and then reminded why i went to the same office that's a good point i am somebody the other day who is like literally just a foot away and i kind of realized wait a minute i should just talk to this person um so yeah we'll just jump right into it so uh my first news article is 15 years of spark fun so spark fun's an awesome website they do a bunch of really cool projects um my guess is i know they have a store too right so if you want to buy like a i don't know a linear servo or something like
Starting point is 00:03:20 that or arduino board they sell them there um they sell some really cool kind of battery um you know different form factors of batteries and things like that um but the thing they're most known for is just having really cool articles and and having really cool how-tos and things like that and they're celebrating 15 years which is which is which is really neat so not only is this article just a yay type article but also they they really go through and document the entire 15 years. And they're very candid about, you know, this is year one. I don't remember exactly who started it, but the person who was writing this, like, this is year one. I'm by myself, like in my basement, starting SparkFun. And they go through kind of the whole evolution of of spark fun so it's pretty cool to read
Starting point is 00:04:07 yeah so hobbyist electronics components that's how i would describe it but yeah specializing in having a really good documentation instead of obscure data sheets yep yeah it totally makes sense that's crazy about 15 years i don't even think i i don't even know when i was the first customer it was early but it wasn't 15 years ago yeah it's wild i had no idea that they had been around that long i didn't get into robotics till very very late so uh um yeah for me it's the first thing i bought was maybe a year ago or something but uh yeah it was really cool. Now I'm curious.
Starting point is 00:04:48 I'm going to go search my email and figure out when the first time I bought something from them was. It's going to take a while. I'll do my new story instead. So my new story is building a CPU in the web browser. This is something I ran across. I think it was on Hacker News where someone was doing a show, Hacker News,
Starting point is 00:05:11 and they were, I guess, practicing their web skills. But I did play this for a while and I found it pretty interesting. And so this is, we've talked about, I believe, NAND to Tetris before, which is a similar idea, starting with the lowest level gates and building up more complex digital logic gates and all the way up until you have a processor and NAND to Tetris even sort of programming the processor and going onward
Starting point is 00:05:39 and upward. But this one, I think, stopped short of that. I didn't do all of the from scratch. I did get, I think, to all the way through the adder. So starting with, I believe you start with a NOR gate. So using a NOR gate, building up all the other gates, building up various logic components, and then an adder. Nice.
Starting point is 00:05:59 And they have a little playground canvas that you can drag components on to build them and then once you sort of unlock a new component then you your sort of next lesson uses that or is allowed to use that component um and so if you're interested in that kind of thing and want a super low barrier to entry uh just you can head on over to nandgame.com or we'll have a link in the show notes and you can try that out yeah very cool i played um i think it was human resource manager something like that yeah we made it yeah i made it a tool to show one show and that was very very satisfying and so yeah i'll give this a shot i think those games are just are, super fun. Because it's like, it's the idea of, you know, I built something,
Starting point is 00:06:47 and then I am now able to sort of replicate that thing. And then you end up with, like, so many layers of indirection. So you have this really complex web. It's just really cool to watch. Yeah. So this one's a little less polished from a UI standpoint than something like the Human Resource game, or is there something like TIS
Starting point is 00:07:06 8000 or Shenzhen IO there's a couple of others that are in similar veins this one's a little less it's somebody mostly just sort of practicing well I don't, not to be offensive but from the way I understood the way it was posted it was someone sort of learning to do a sort of web app
Starting point is 00:07:22 but they made something that was pretty cool and doing it. Nice. My next one is, again, about Discord, why Discord is sticking with React Native. And, yeah, this was super interesting. So basically, for people who don't know, React Native is a way where you write JavaScript in HTML.
Starting point is 00:07:45 Actually, they call it, I think, RSX or something like that, but it looks a lot like HTML. And basically, it will sort of transpile that into an app. Like it will basically distill that down into a native app. And so in this React Native system, you say I have a button and I have a label, et cetera, et cetera. And then you say, okay, I want an iOS app.
Starting point is 00:08:12 And it goes and does all sorts of magic and you end up with an iOS app that is native. So it's, as opposed to something like Unity, where you're manipulating basically individual pixels on the screen. And so if you make a button, you know, it's going to look exactly the same on everyone's phone, like pixel for pixel. But that's not really what people want. What people want is, you know, if they're on iOS, they want it to feel like an iOS app
Starting point is 00:08:41 and the same for Android, right? Also, you know, an incredible amount of work has gone into making those buttons and those text boxes and making sure the keyboard doesn't pop over the text box and all of that stuff, right? And you don't want to reinvent all of that. That would be very tough. So, and even if you did,
Starting point is 00:09:01 it wouldn't be like the one people are used to, right? So React Native kind of gets around a lot of that. I mean, it's not a silver bullet, right? So, for example, if the keyboard just interacts differently with text boxes on one platform and the other, then it could be on Android. You're covering up a text box with a keyboard, and it's really ugly but on ios you're not and so you have to sort of finagle all of that so that you get something that looks nice on everything um and so this uh this article is just all about a team doing that and and uh um and that whole experience so they kind of document their whole uh journey with react native so i thought that was pretty
Starting point is 00:09:43 cool i feel like there's been a sequence of these kinds of things i don't remember all the names their whole journey with React Native. So I thought that was pretty cool. I feel like there's been a sequence of these kinds of things. I don't remember all the names because I don't do really mobile programming, but the React one seems to have stuck around for a while. I don't know if it's stuck around quote-unquote long, but it seems like it's been a thing for longer than average.
Starting point is 00:10:00 There was like Xamarin was a big thing, and there was PhoneGap, where it was literally a webpage. that one um yeah and a lot of these kind of died out um it seems like react native for whatever reason has just stuck so i found my first spark fun order oh nice what time what uh year do you want to guess i'll take a shot how many years ago i'm gonna guess that you needed it for some college project and so i'm told oh we can't disclose how long ago rat you out but okay let me just loosely guess i'm gonna guess around um like 12 years ago. Oh, that's a good guess. Oh, nice. 11 years ago. Nice.
Starting point is 00:10:46 Yeah. It was not for a school project. Oh, really? Oh, okay. I actually can see what I ordered, and I know what I did with them. Oh, okay. All right. So it was actually, well, anyways, I'm not going to age myself.
Starting point is 00:10:59 All right. I ended up using Arduino to build a little thing that moved around on stepper motors. Oh, nice. Actually, speaking of that, my chess robot arm, I'm making good progress there. I'm posting on Facebook if anyone is following me there, but not as the podcast. You have to follow me because I want to keep the podcast pretty high signal to noise. But yeah, the robot arm is uh is pretty good i have a little stepper motor i'm actually looking at it right now and it moves i don't know this
Starting point is 00:11:31 probably isn't a good design but you know it works like basically i have a platform that goes up and down or i guess left and right and then that figures out which column to go the arm should pick from. And then the arm only has to think about one axis then. Oh, that's nice. Yeah, so it seems to work pretty well. I mean, it's not as fast, I think, as having an arm that could swing around. I mean, it just needs to work. Yeah. I mean, for me, this is a huge accomplishment
Starting point is 00:11:56 because I actually burnt a motherboard or melted – not a motherboard. I melted a breadboard because no one told me that you're not supposed to put seven amps melted a breadboard because uh no one told me that you're not supposed to put seven amps through a breadboard and uh wait yeah how did you manage to even have something that could do seven amps uh well i have the the stepper maybe it's not seven but i have the stepper motor and i have a digital servo and the digital servo is for the shoulder of the robot so so i got one that's pretty powerful.
Starting point is 00:12:28 And basically, yeah, if I have the platform moving and I try to move the arm at the same time, it draws a lot. Like, I don't know if it's exactly seven amps, but it's a lot of amps. It's not seven amps. It's definitely... You can't run seven amps through the PCB wiring. Oh, really? Oh, okay. That's not seven.
Starting point is 00:12:43 It's okay. I won't be nerdy. the pcb wiring oh really oh okay that's not seven it's okay it's it's enough well the breadboard can handle half an amp and i melted the breadboard so it's it's more than that so you are somewhere over half an amp that's my lower and upper bound uh okay well cool yeah don't don't don't burn your house down though man i'll try my best you know it's it's proving to be pretty difficult not to burn my house down also yeah you probably I'll try my best. You know, it's proving to be pretty difficult not to burn my house down. Also, yeah, you probably want to look at like motor drivers or adding a transistor so that it drives directly
Starting point is 00:13:12 from the power supply instead of running through the Arduino, if that's what it's doing. Yeah, I got... So basically... Okay, you already do that. Yeah, the breadboard is only there because I don't have a way to splice wires. So it's literally... You don't have a sold to splice wires so it's literally you don't
Starting point is 00:13:25 have a soldering iron i oh when's your birthday i'll send you one i actually have a soldering iron i never even uh um oh no actually it's uh i don't have sorry i have the split what i don't have is the um i would need to take a regular wire and basically put the right adapter on it so i could plug it in or you could just cut off the adapter i mean i yes i run into this all the time this is a horrible habit i just end up cutting off whatever adapter and just soldering on whatever oh man that is so smart so like oh i have a power plug that has this barrel jack and i need to feed it into my arduino like asker it i just snip off the barrel jack and then ta-da wires yeah that's really smart i should have thought about that no it's not smart no i mean it's it's resourceful or whatever i never
Starting point is 00:14:18 would have thought of that yeah okay um yeah so just remember i mean you're hacking so you you can just cut off whatever you can't connect to yeah one of the things as long as if you don't need it back again yeah exactly one of the things i'm having a hard time being destructive but then it's like i bought this stuff to make this one thing so i need to just get over that yeah okay soldering yeah the next fun here i want to know when you're making your own pcbs yeah i'll call you when i get my fingers stuck together wait what super glue solder i don't know well solder is not that sticky but solder is super hot and i don't know all right next topic we're talking
Starting point is 00:15:00 about fortnite fortnite's popular we talk about whatever's popular like cryptocurrency was popular so we talked about now i hear fortnight's popular yeah it's just seo seo baby do you play have you seen a baby i have played it's i ain't gonna lie it seems too complicated you know what i i feel the same way and the only thing i like i know granted i've only played a little bit but i like the 50 verse 50 have you tried that uh no i like i've played it literally three times i think yeah me it's not that much different for me but but uh uh they have this mode where it's it's just like pub g and that it's it's it's 100 players but it's literally two teams of 50 um you know it's pretty fun i'm sure everyone's laughing because they've probably all played it before oh sorry no no i mean basically the thing about 50 on 50 is you have
Starting point is 00:15:49 very little control of the outcome of the game but uh i thought it was it was pretty fun yeah they made a billion i'll try that wow so i played it on my ipad and i found the uh running and shooting was fine um and i've tried pubg before i played that a little more um but the building i found very difficult oh okay got it so i noticed that i don't know how people um and this is just because i haven't played enough some people have a way to build like entire towers i don't know if they're like you know using some kind of macro like you know at an input level or if there's just a setting in fortnite that i don't know about or just well yeah well apparently you and i are the only people who don't know because fortnite has made a billion dollars since launching in october so they are getting that money from somewhere people are
Starting point is 00:16:43 playing it it's a thing uh even you know sort of non-traditional gamers have been playing that i was listening to some podcasts i think it might have been twit which i listen to very rarely i don't listen to a lot of podcasts but i was listening to one for some reason and um they were speaking about that this is an interesting phenomenon where people are increasingly specializing into like specific games so something like i'm not a gamer i play fortnite i play league of legends i play dota like like i don't play it used to be people would just play whatever games whatever game was out whatever game was new but now there's increasingly people who just play minecraft just play fortnite you
Starting point is 00:17:21 know what like they they were sort of observing that a lot of their friends and acquaintances only play a single game wow it's really interesting so i don't know if that's i don't know if that's true i i do experience that somewhat like i know some people at work who are sort of always playing the same game uh overwatch is that is that oh yeah overwatch that's one of them yeah so a lot of these uh online multiplayer games uh they play very exclusively that makes sense i mean at some point it's like chess or something or playing you know poker with your friends or something where it's like you don't go to poker night and decide to play i don't know like uh rummy or something they just always play poker maybe you don't let's roll the die to see what card game we're gonna play today sellers of Catan wait I thought this was Pokernet
Starting point is 00:18:05 cool so now it's time for book of the show of the show my book of the show is a podcast but I've been as opposed to listening to audio books which is what I usually do
Starting point is 00:18:22 this month I just binged on this podcast and I thought it was great. It's called Dear Harvard Business Review or Dear HBR. So I don't actually know too much about Harvard Business Review. Like I don't know exactly how it's affiliated with the university. But what it is, is just people who study, um, the, the, the social part of businesses. So you remember in the last episode, we talked about technical arguments and, um, these people just focus on, um, just the, the in and outs of, of, of working with, with in a company or even for yourself or what have you.
Starting point is 00:19:01 And, um, uh, you know, each episode focuses on a particular topic um so there's one episode there's a bad bosses there's another episode that was bad uh reports and and uh and they take calls um from actually they take i guess emails from the audience and uh answer their questions they also bring on experts um you know um authors uh people who social scientists you know people have studied um you know uh like industry you know and that sort of you know corporate culture and all of that and uh it was really cool i actually kind of i totally binged on it i watched um probably 12 episodes and it was just super fascinating so so check it out my book of the show is a science fiction recommendation um and we've had this author before but that's john scalzi's old man's war uh and i decided to do something different because
Starting point is 00:19:57 i always struggle to not give spoilers so i'm just gonna read well i would say the back of the book but i'm just gonna read the summary from amazon uh john perry did two things on his 75th birthday first he visited his wife's grave then he joined the army oh nice that's a good lead-in i know yeah so if you're one of those people who enjoys the first one two three sentences is that the author who did Red Shirts? Red Shirts. Also, we've talked about Locked In, I think, on this show that I recommended. I also read from him Fuzzy Nation, which is actually him rewriting a much older book,
Starting point is 00:20:37 but I don't know if I've mentioned it as a book of the show on this episode. Anyway, so Old Man's War, I'm about two-thirds of the way through and uh so far i very much enjoyed it uh and i believe it's turned into a series that has like a bunch of books like seven or eight books like a lot of books in the series but this is the first one and i'm i've just started it and uh so far i'm enjoying it it as the three sentences said there's some interesting premises about what would it be like if old people and why would old people go to war
Starting point is 00:21:12 so but yeah you read I did I thought it was really interesting it was really fun the definitely not trademark infringing Star Trek story yeah that's right I mean i in general i just love parodies right because i love um somebody basically you because you really have to distill
Starting point is 00:21:34 something down to to parody it and so uh in to those on audible because i have a commute and so i spend a lot of time listening to audiobooks um get through a fair number of them and i always listen to them on audible and if you would like to support the show and uh you would be interested in trying to listen to a free audiobook, you can go to audibletrial.com slash programmingthrowdown and you can get a free one-month membership, which gives you a free book, one credit for a book. And for me, I'm always trying to optimize picking a book, which Old Man's War is a great book so far.
Starting point is 00:22:22 I think it would be good to pick up if you're not used to listening to a lot of books, but I normally try to go for sort of bang for the buck and look for books that are really long. And I feel like I get a better value for my credit because basically all the books are kind of priced this. Yeah. But we have a whole litany of recommendations through the show notes of
Starting point is 00:22:42 various books that both Jason and I have recommended that are available on audible. So check it out. Yeah. other thing i started uh listening to the podcast um on google play and at least on android um the google play app has all sorts of issues like i don't know if it was specific to that podcast but basically um what would happen is if you would pause it and then resume it would would start over. I don't know why. Actually, now I think about it, someone emailed us saying that that happens to our podcast. I think it's just an issue with Google Play, but it turns out there's also podcasts on Audible. And so you can actually get Dear HBR and other podcasts on Audible.
Starting point is 00:23:27 They don't cost any money as long as you have a subscription. And if you already have an Audible subscription, it's a much better user interface than the Google Play. Oh, I didn't know that. Yeah. Cool. If you don't want to support us on Audible, you either already have an Audible account or you want to support us in another way, you can check out our Patreon.
Starting point is 00:23:47 So that's patreon.com slash programmingthrowdown. And I post the episodes there. So they give you, you know, any patron has a high quality RSS feed. So if you are subscribed to us on Patreon, you know, don't use the rss feed on programthrowdown.com you can but the the one from patreon is is uh is super super high bandwidth it's meant for patrons meant to get you the episode like as fast as possible and so uh so check that out all right time for our tools of the show. Tool of the show.
Starting point is 00:24:26 My tool of the show is PyTorch. So if you've ever heard of Torch, Torch was a library written in Lua, and it basically gave you a lot of tensor-type operations. So, for example, if you want to multiply matrices, add matrices to each other, create like a neural network for doing some kind of AI project,
Starting point is 00:24:51 you could use Torch. The thing about Torch was it was written in Lua, so it was hard to really make it play nice with other libraries. So for example, if you wanted to read some data from H5 and put that into your Torch model, if you wanted to read some data from H5 and put that into your Torch model, you'd have to count on Lua having an H5 or HD5 reader. And Lua just doesn't have as much support
Starting point is 00:25:16 as other languages. So they went ahead and basically ported all of Torch to Python. And so PyTorch is exactly what it sounds like. It's really, really sharp. I've been using it quite a lot lately. And in the past, I've used other similar libraries. So I've used Keras, I've used TensorFlow,
Starting point is 00:25:40 I've used Theano. I've used pretty much all of these. And I think PyTorch is by far the best. And it's actually sort of taking over. You're just seeing more and more projects being done in PyTorch. And it's really a phenomenal library. The design also. Basically, all of these libraries or programming languages, really anything, they have the benefit of hindsight.
Starting point is 00:26:09 Right. So, you know, in the case of PyTorch, they really just looked at all these other libraries and said, you know, what can we do differently? And they've made some really, really good design decisions. So if you want to do some cool AI project or if you even need to do anything any type of sort of math heavy project check out PyTorch. It's really nice. I mean NumPy is good. The big difference between PyTorch and NumPy is that PyTorch can run on the GPU. There's a lot of a lot of differences but that's sort of the big one. I somebody said, hey, I want to do some math-like project that has a lot of heavy math in it, and I need it to run really fast,
Starting point is 00:26:54 then you could do it on PyTorch on the GPU. It'll just scream. Mine is also very educational and useful for work mine is a game nice uh called that's not probably funny because i think i do every episode uh suzy cube uh this is i i did my homework this time it is available on both ios and android nice um and this is a platformer in the style of, for me, it's most like Super Mario Land 3D, which is a Wii U game that I play with my kids that we like. It's very much in that style. And you're a cube character moving around in a 3D world.
Starting point is 00:27:43 But most platformers or games of that type i struggle with on on tablets or phones um but the controls are done really well and the game is a sort of good balance of feeling like you're making progress without just being like cruising through this is super easy no problem whatsoever um and i believe it's three or four dollars i think probably i think it was four dollars um but so it's not super expensive and it was a good time I've not beaten it yet I've only I'm only about four or five levels in but I'm enjoying it it has the you know common trope I guess of you can get through the level which is a sort of feat of its own but then there's also hidden stars some of which are more hidden than others.
Starting point is 00:28:25 And there's three stars on each level. And I'm trying to find those as an additional challenge. Nice. That's super fun. Yeah, I love games that do that because that way, it's like you have really two goals when you're playing all these puzzle games. The first goal, I mean, assuming it has a good story, is to really like experience that whole story. And the second goal is to solve, obviously solve the puzzles.
Starting point is 00:28:52 And so by having these sort of optional challenges, it lets you sort of pace yourself based on your skill, luck, and stuff like that. Yeah, so check it out. Cool. All right, on to the topic concurrency um or as we kept saying before the show cryptocurrency con cryptocurrency yeah cryptocurrency i think there's a crypto coin in there yeah we should we should start this i mean there's no way it could make less money than Dogecoin. I mean, Dogecoin is not the worst, dude.
Starting point is 00:29:29 We can name a lot worse than that. Oh, yeah, that's true. That's true. Yeah, that's... Sorry, I guess what I meant to say is it's got to be better. One Doge is one Doge. One Doge is one Doge. All right, so concurrency. So, yeah, I mean...
Starting point is 00:29:43 That died. Why would you do concurrency? Yeah, Patrick, why would we bother with concurrency? Why not? Concurrency is a way to, most people know that, well, I guess there are a couple ways, but the main one is people know that there's more than one core in modern day computers and even phones. Almost everything seems to have more than one core. I think even the newest Arduinos have multiple cores.
Starting point is 00:30:12 Maybe not Arduino. Yeah, probably even Arduino. Definitely the Raspberry Pi does. Oh, yeah. Okay. And so if you want to leverage more than one of these, you need to do more than one thing at a time. You need to take concurrent work and the normal programming we do uh when you think about sort of writing c++ or java um javascript i guess it will have to think about exact application but
Starting point is 00:30:37 if you're writing an application level programming language go um you know what, when you're doing that, you're writing a set of instructions you expect to execute sequentially. And the processor may do what's called reordering, but you expect it to roughly execute in order. And there's a lot of things that have these dependencies. First do this, then do that. Yeah, basically, it's a guarantee, right? So if you say like x plus 2 and x times 10 and if those are swapped you're kind of in big trouble that's true but there are certain things you can run in parallel or swap and not get into trouble like x equals two and a equals five right right right yeah but it's done behind the. Yeah. But we're 10 that doesn't exist for now. And so concurrency is a way to be able to do, you know, like that example,
Starting point is 00:31:30 X equals two and A equals five. If you want to run those at the same time, you would get done faster, however long it takes to do that assignment. Instead of having to take time to do it twice, you would get it in wall clock time, time to execute it just once because they would happen on two different processors that are independent and can both take care of it yeah um just to explain like so so there's cpu time which is basically like how many clocks does it take to do something and then there's wall time which is how much total time so if you have 100 things you need to do the number of the amount of cpu time is probably fixed let's say right but but if you could do 50 of them on one core and 50 on the other you could half your wall time
Starting point is 00:32:11 even though your CPU time hasn't changed. Yeah. And this is, so when you talk about concurrency, you're talking about trying to get some amount of this, I guess you'd say, parallelism. And there's a lot of things that talk about it, and formalisms. Even when I was studying up for this a little bit, remembering from college, there's Petri nets and ways of describing parallel accesses. And then, I guess, the analogy I always hear is that there's different kinds of tasks which are more or less suited to concurrency. And the example is that, with nine ladies, you can't have a baby in one month, but you could have nine babies in nine months.
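The halving-your-wall-time point above can be sketched in Python. This is a toy, not a benchmark: each of the 100 tasks is simulated by a short sleep, which is enough to show wall clock time shrinking while the total work stays the same.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def do_task(_):
    # Stand-in for one unit of work; a sleep keeps the toy portable.
    time.sleep(0.01)

tasks = range(100)

start = time.time()
for t in tasks:                     # one worker: ~100 units of wall time
    do_task(t)
sequential = time.time() - start

start = time.time()
with ThreadPoolExecutor(max_workers=2) as pool:
    list(pool.map(do_task, tasks))  # two workers: same total work, about half the wall time
concurrent = time.time() - start

print(sequential > concurrent)      # wall time shrinks; total time spent working is unchanged
```
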
Starting point is 00:32:54 Right. So if you have one... well, if they can only have one at a time. I guess this is an oversimplification and a slightly awkward example, but it's one that was always taught to me. Octomom proves you wrong. Yeah, okay.
Starting point is 00:33:05 This is a bad example. Well, it's the same, but take twins out of it or multiples out of it. Yeah, and what you're trying to say there is that some tasks you really do, as Jason was saying, first I need to take X and add two, and then I need to multiply by 10.
Starting point is 00:33:19 And if you do those in a different order, then you're gonna have an issue. And so splitting that task, as simple as it sounds, onto two processors isn't going to go faster, because one of them is just going to be stuck with nothing to do. So some tasks have these large sequential dependencies, and because of that, it just takes a while to do them. And adding more processors... you could duplicate that work, you could do more of it. So if you had to do some image processing, one way to get
Starting point is 00:33:46 parallelism is to take a whole image and do it on processor one, and a whole image and do it on processor two. But depending on the task you're doing, it may be possible to do, like, the top half on processor one and the bottom half on processor two. And the two different scenarios are: if you're going to do both images all the way to the end, the total time taken, wall clock and CPU time, is the same. But if you're only going to do one image, then in splitting the image in half, the first image comes out sooner, in wall clock time, than if you have one processor doing image A and one processor doing image B. And that's an advantage, because often we worry about how long until we get the first answers out, or the soonest answers out. Think about things like video games.
Starting point is 00:34:32 In video games, you know, you need to get the whole frame rendered in a certain time. So doing a lot of the different tasks in parallel helps you get that frame out sooner. Versus a Pixar movie, where you could spend a lot of time just working on a single frame, and a whole different set of computers could be working on a different frame than the one you were on. Yeah, yeah, exactly. Another reason why you want concurrency is if you have a lot of asynchronous calls that you have to make. So for example, let's say you have a web server, and somebody goes and hits your web server. You want to hit a database, get some information from that database, and then send that information back to that person, and wait for that person to get it. Right. So you're doing a lot of waiting. You're waiting for the database, you're waiting for the person
Starting point is 00:35:30 to get your reply, right? Sometimes you have to resend the reply if they didn't get it, things like that. And I mean, imagine if the web server just ran in one thread. So like, while it's doing all this waiting, nobody else could get on your website. That'd be terrible. I mean, most requests, I think, I mean, it depends on a lot of factors, but let's say they take a second. So that would mean you can only have one person per second on your website. So another thing to do there is use concurrency to say, okay, it doesn't even necessarily have to use multiple cores, it could just use one core, but it could just release, you know, some type of lock or something. So it could say, okay, I'm going to go to the database and ask it for some information. And until I get that information back, you know, I'm just going to chill, and someone else could do something on the core. And so
Starting point is 00:36:21 that's how most web servers work: they just are this whole waterfall of asynchronous calls. And the entire time one of those calls is in progress, that, I guess we'll call it a thread, yields so other work can happen at the same time, on one core even. And you're just taking advantage of the fact that you're spending a lot of time waiting. So yeah, that's the other sort of big use case. Between the two of those, you've covered, you know, 90% of the uses for concurrency. Yeah. And I think it is useful, because people talked a lot about having multiple processes, or, like, if you ever did UI work or still do, not doing something on the main thread. And people sort of make this comment. But the idea is sort of what Jason is pointing out, which is even if you only have one processor, you do this time slicing, you could have prioritizations.
Starting point is 00:37:21 But the idea is you want certain things to be very responsive. And if you need to do a long running task, even if it's not going off to another computer, you want to make sure that that task that's long running has lower priority and gets interrupted to service things which you want a quick response to. So if you imagine like this, not exactly how web servers work, but you know, hey, Jason, I have a question for you. And he says, okay, what's your question? I give him a complex differential equation he needs to go solve. You may want him to say,
Starting point is 00:37:52 okay, I'm working on it, and then go work on solving it. But getting a, I'm working on it, so I know he's not dead or not responding to me is a useful thing, to have that initial very quick response. And concurrency, even if you only have a single processor, allows you to sort of keep things working well. Yeah, exactly.
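This single-core, overlap-the-waiting style is exactly what Python's asyncio gives you. A hedged sketch, with the database round trip faked by a sleep: ten "requests" overlap their waits on one core, so the whole batch takes about as long as one.

```python
import asyncio
import time

async def fake_db_query(user):
    # Stand-in for a database round trip; the await hands the core back to the loop.
    await asyncio.sleep(0.1)
    return f"row for user {user}"

async def handle_request(user):
    # While this handler "chills" waiting, other handlers run on the same core.
    return await fake_db_query(user)

async def main():
    start = time.time()
    results = await asyncio.gather(*(handle_request(u) for u in range(10)))
    return results, time.time() - start

results, elapsed = asyncio.run(main())
# Ten requests overlap their waiting: total is ~0.1s, not ~1s.
print(len(results), round(elapsed, 1))
```
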
Starting point is 00:38:13 And so basically the core like mechanisms by which you get concurrency, one is through message passing. So imagine you have sort of two threads and they're both running, or let's say even two processes. So the difference between a thread and a process is multiple threads share the same memory, but multiple processes, at least by default,
Starting point is 00:38:39 they don't share the same memory. So you couldn't create an object in one process and just pass the pointer to another process and use the same memory. So you couldn't create an object in one process and just pass the pointer to another process and use the same object. It won't work. So for example, if you have two processes and you want them to, as I said, pass some object around, one process has to serialize it. So turn it into just a series of ones and zeros, like a string or something like that, and then send it to the other process, right? And you can do the same with threads as well, right? Just you don't have to, but you can. And we call that message passing, right? And it's just the same as if you were to send a message to another machine on the internet or something like that. You basically are just going to package up that message, send it across, and the other person's going to unpackage it.
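A minimal Python sketch of that serialize-send-unpackage cycle, using multiprocessing's Pipe, whose send() pickles whatever you give it. Each side works on its own copy; nothing is shared. (On platforms that spawn rather than fork, this would need an `if __name__ == "__main__"` guard.)

```python
from multiprocessing import Process, Pipe

def worker(conn):
    # recv() unpickles a copy of whatever the parent sent; no memory is shared.
    msg = conn.recv()
    msg["count"] += 1   # only the worker's copy changes
    conn.send(msg)      # send() pickles it again for the trip back
    conn.close()

parent_end, child_end = Pipe()
child = Process(target=worker, args=(child_end,))
child.start()

original = {"count": 0}
parent_end.send(original)   # the dict is serialized, not passed by pointer
reply = parent_end.recv()
child.join()

print(original["count"], reply["count"])  # the parent's object is untouched
```
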
Starting point is 00:39:27 So if you hear things like IPC, inter-process communication, that's going to be a way to pass messages. And in fact, at least on Unix, IPC and TCP work almost the same way. You can actually open a socket to another process on your machine, or you can open a socket to, you know, yahoo.com or something like that. And it's almost the same code, right?
Starting point is 00:39:55 So it's really kind of, you know, it's meant to work that way, right? Another way to do it is through shared memory. And so, you know, this gets a little bit complicated because it's very OS specific. Like on some, especially older OSs, you actually have to use files and file locking and things like that. But at a high level, just imagine there's a chunk of memory that both processes can access, right? So actually, I mean, if you're talking about threads, then everything is shared memory, right?
Starting point is 00:40:30 So then it's just a matter of saying, okay, this object, I'm going to access this object from multiple threads, and just sort of effectively writing some documentation and things like that, so that people know that, hey, while they're editing this object, or reading this object, or something like that, it's possible another thread is also doing that at the same time. Yeah. So, I mean, I think for this, threads, especially in the last few years, have gotten much easier in, I guess, application programs. So in Python, there's threading.
Starting point is 00:41:06 In Java, there's now... there's always been threading, but there's easy ways to even just sort of multiprocess streams, where this shared memory kind of happens without even thinking about it. And even in C++, there's standard thread now, and standard async, which... standard async is still a little broken, but...
Starting point is 00:41:25 Oh, wait, really? Tell me more. Well, no, I forget the exact nuances. And I think, like, I do a lot of my work in C++11 and 14, and it was broken, and I don't know if they fixed it in 17. Async has a weird dispatch policy, is the long and short of it. You have to take special care to make sure that, when you run a standard async command, it happens right then, like, the function will run in parallel when you think it does, as opposed to sort of getting deferred, if I've remembered the argument correctly. But yeah, so, I mean, as Jason pointed out, for sort of threads, it's just implicit. Like I was mentioning, you might just sort of write an anonymous function,
Starting point is 00:42:06 a Lambda function, or have a function call just in line with something. You say, hey, I want to go do this 10 times, or I want to go do this across these slices of the data. And that's just an object that you're sharing out to those threads. It doesn't have to be copied. It just gets accessed in place.
Starting point is 00:42:31 Yep, that totally makes sense. One last thing about this is, there are thread pools and process pools. And so the idea there is, creating a thread, and especially creating a process, is actually pretty expensive. I mean, you know, computers nowadays are so fast that you really don't notice, right? I mean, unless you're doing a lot of it. But under the hood, there's actually pretty significant overhead around creating those things. And so there's this concept of a pool, where basically, you know, at the start of your application, you say, okay, I know I have a ton of work to do, and I know I have, let's say, eight cores. So I'm going to create eight threads right at the beginning, and these eight threads are just going to start taking work.
Starting point is 00:43:10 So as soon as I have some work, I'm going to say, hey, eight threads, go and take this work. And I don't really... you know, there's a lot of complexity around, okay, which thread is free, who can accept the work, et cetera, et cetera. So the thread pool kind of abstracts all of that away. So basically, and almost every language has this, you say, you know, I want a thread pool with eight threads. And then you can just say pool dot, you know, it depends on the language, pool dot enqueue or
Starting point is 00:43:40 pool dot start or something like that. And you could give it a function, and it will handle... if you call it nine times in a row really quickly, it'll take that ninth one and it'll wait for one of the first eight to finish. It'll kind of handle all of that for you. So the thing to point out is, if you use message passing, whether you're talking to something on the same computer or two different computers, as Jason was pointing out with this sort of socket example, it's actually not immediately obvious to you, or really doesn't make much of a difference, as far as you writing the program is concerned. But if you're doing shared memory, there's extra steps that need to be taken. Or even if you're using processes,
Starting point is 00:44:31 sometimes you need to use a resource. Like I need to talk to, oh, that's a bad example, like the hard drive, let's say, and I'm going to be writing stuff out in the hard, oh, the speaker. I want to go talk to the speaker. And if two programs are talking to the speaker,
Starting point is 00:44:44 it would just be nonsense. So in our system, we don't want, you know, the speaker to be accessed by multiple programs. So there's this concept of, you know, locking that resource, so that I'm the only one who writes to it, so that garbage audio doesn't come out. And so in order to do that, these locking concepts need to be provided, typically by the operating system, and exposed in the programming language in some way. But they allow you to sort of do coordination. And the things you would hear are semaphores, mutexes, locks. And they come in a variety of forms, but roughly they're all around this: making sure that you don't collide in how you're trying to change or use something. And what that amounts to is, mutexes are mutually exclusive, which says there's some bit of
Starting point is 00:45:34 code and i want to guarantee that when i run this bit of code i'm the only person with regards to some you know some signal and is, if there's two threads running the exact same function, it's probably the same block of code, you only want to be executed one at a time. But it could be two different things you want mutexes in order to protect maybe, like I said, the speaker. And so you want to say, hey, I have a mutex. Anytime someone's going to use the speaker, then they're going to acquire the mutex. And you know that when you acquire the mutex, that you're the only one who will be talking to the speaker at that time, because everybody else is mutually excluded. And then you get into a conversation about, instead of just doing this with a variable, where you could end up with a race condition but
Starting point is 00:46:26 even if you avoid it the nice thing about mutex is they often have the built-in concept that instead of pulling where the processor is busy in a very tight loop just going are you free are you free are you free are you free are you free there's normally more sophisticated mechanisms where you'll be able to put the other threads into a state where the processor is free to do other work, put them effectively to sleep and have them get woken up when the mutex is free for them to acquire. And that has the advantage of allowing, again, the computer to process more stuff. So as Jason's example, when something goes off to use the database, you don't want to just sit there saying, is the database transaction done?
Starting point is 00:47:06 Is it done? Is it done? You want to sort of be notified: okay, hey, I'm done now, and receive that signal, so you're not locking up all of the computer's processor just asking, are you done? Yep. Yeah. And so the mutex implementation, in just about every language, comes with a lock object. And so what the lock object does is, you say, hey, you know, lock this mutex. And what that will do is it will lock it. And, you know, at any point from that lock call onwards, you know, you have total ownership. You know no one else is in that code. Or if someone else has the lock, it'll just wait. So, you know, a common problem with multi-threading is that you could end up saying, okay, I want to lock this object
Starting point is 00:47:59 because, you know, I want to increment it. And you have another object you want to lock, and you want to, let's say, print that you incremented it. And if you have different threads sort of locking different objects, you can end up causing the system to be really, really slow and not realize it. Because you're tracing through your code, and you're saying, oh, this function's pretty quick, this function's quick. It's very hard to know how long a lock will take, because you kind
Starting point is 00:48:30 of have to think about what else is going on in the system. Yeah, that's right. And these, sort of, shared memory versus message passing, and the variety of other things, are, I guess, a formal way of describing it. But what I often find in practice is that you see somewhat of a kind of hybrid solution. So what I've seen kind of the most, for when someone actually has to go handle raw concurrency themselves, is in a language like C++ or Java, you end up spinning up some threads to do something, and you have a queue that describes, you know, sort of each of the things you want done. And you put a lock on that queue, and so you effectively get a kind of low-cost, lightweight message passing interface, where I have a job queue
Starting point is 00:49:19 of things I want to go execute. It might be, like, parameters, or little messages that have come in, or pieces of an image. And I sort of add a description of the work to be done to a queue. And then I have however many threads that spin up. And when a thread wakes up, it tries to acquire the lock on the queue. It reads the data out of the queue
Starting point is 00:49:41 and then releases the lock, does its work. And then there's typically an output queue, where when it finishes, it acquires the lock on the output queue, puts the data in, unlocks it, and goes back and tries to get something from the input again. And if you just sort of do that, you know, over and over again, you end up with effectively what looks a lot like message passing, in that the sort of inputs and outputs are going into a shared structure, but the individual things themselves aren't shared.
Starting point is 00:50:11 The individual sort of parcels are, but you might have a read-only something that's shared. Like, you might only pass the pixel coordinates of the image to process, but the image itself is kept read-only. So even though it is shared, you're not really editing it, and you don't run into maybe the same complexity you might if everybody was simultaneously editing the image. Yep. Yeah, exactly. So yeah, the two big pitfalls
Starting point is 00:50:38 are, one, race conditions. Do you want to talk about race conditions? I'll take that. Okay, sure. So race condition is a generic term, but it just means, when two things are running in the system, you can get unintended consequences. Where you think, you know, I want to set variable A to two and then B to three, and then something happens where you find out during debug that A was three when B was three, instead of A being two. And it might be because some other piece of code
Starting point is 00:51:13 was setting A to three and then B to something else. And you have this race condition, where two people are modifying something, and you think it should happen in a certain order, but because they're sort of not synchronized in some way, via one of the locking mechanisms or semaphores we discussed before, you're sort of up to chance, where sometimes it might happen in a way that looks sort of a little bit out of order. So an example would be, like, printing to the screen. If you have two threads printing to the screen, the data can get intermixed, in that it's sort of a race condition: you're trying to write out your whole message, but while you're trying to write it out, someone else comes in and writes out too. And that might not be a harmful
Starting point is 00:51:52 race condition, but sometimes race conditions can be harmful, where you sort of think almost transactionally, like, this set of operations is going to happen all at once, but in reality somebody else could come in in the middle and do something. And that would result in a race condition. Yeah, exactly. I mean, a simple way to observe this, right, is if you create two threads and those threads just immediately try to print hello and then print world, you'll often get hello, hello, world, world. Right. And so, you know, as soon as you start introducing logic beyond, in that case, as Patrick says, just printing, that becomes really... yeah.
Starting point is 00:52:31 So you think about hello world coming out as sort of one unit of thing, but in reality it's actually many, many different instructions that have to run. Yep. Yeah, exactly. So what are deadlocks? All right, deadlocks. I'm gonna see if I can get this right. Hang on, I hear, I hear Wikipedia... What is... No, no, no, no. You tell me. Check me out. Okay. But I think... so, for example, let's say... I'm trying to come up with a really solid example. Let's say, for example, you have a thread that is... Let's say you have some type of database, and this database is holding information to email. So it's some kind of reporting database, right?
Starting point is 00:53:25 And then you have a reporting, so you have a thread for just kind of managing that database. Then you have a reporting service. And this service, you know, takes things from the database and actually sends them out over email, right? And so you have these two threads, and so you have a sort of mutex.
Starting point is 00:53:43 So when the database is updating, the sending service has to wait. And when the service is, you know, pulling something from the database to send it off, the database has to wait to update, right? And so now someone goes in and introduces, let's say, multi-part messages, or some kind of instant message or something like that. So the database is updating. And while the database is updating, the database gets this request that says, okay, send an instant message. I need this to go out immediately. So it waits on the service. It says, hey, service that's sending out these emails, I need this to go out right away.
Starting point is 00:54:29 But the service that's sending the email, they are in this loop where they are waiting for the database to finish updating. So because remember how they work, right? They wait for an update and then they check it. So they're waiting for the database to update. The database update has this new like emergency mode and it wants to send something and they're both waiting on each other. And that's a deadlock, right? And it's actually extremely easy to do that and it's very painful because what happens is your system just grinds to a halt. And you can usually see this happening
Starting point is 00:55:08 because all of your logs will just lock up. Everything will just lock up. But you won't be using any of the processor. It's just kind of sitting there as if it's idle. And so that's a pretty painful situation. We actually had one of those a couple of days ago, and it's not pretty to debug. It's really difficult, actually.
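Here's a sketch of the shape of that bug in Python, with the standard fix already applied: both threads take the two locks in the same global order, so the waiting-on-each-other cycle can't form. The "database" and "email" jobs are stand-ins. (If you swap the acquisition order in one function, this program can hang forever, which is exactly the deadlock.)

```python
import threading

db_lock = threading.Lock()     # guards the "database"
mail_lock = threading.Lock()   # guards the "email service"
done = []

def update_and_notify():
    with db_lock:          # everyone takes db_lock first...
        with mail_lock:    # ...and mail_lock second
            done.append("update")   # update rows, queue the urgent message

def send_pending():
    # Grabbing mail_lock first here is what would create the cycle;
    # keeping one global lock order means no thread ever waits in a circle.
    with db_lock:
        with mail_lock:
            done.append("send")     # read rows, send the emails

t1 = threading.Thread(target=update_and_notify)
t2 = threading.Thread(target=send_pending)
t1.start(); t2.start()
t1.join(); t2.join()
print(sorted(done))  # both finished; no deadlock
```
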
Starting point is 00:55:27 Yeah, I think it's difficult to explain. But I mean, extracting it out, the most obvious dangerous condition is when there's more than a single resource that you need to get your job done. So if a single process needs to use things that are locked by more than a single protection mechanism. So when would you have that? Oh, that's... you have two systems, and I need to lock system A and do something, but then I also need to go lock system B. And maybe at the start you don't say, I just want to lock A and B together, because maybe what I'm doing with A takes a really long time, and B I only need for a very short amount of time.
Starting point is 00:56:09 and then A for a short time at the end before finishing off a write to B, then what you can end up is, as Jason was trying to describe, is you end up with this deadlock where A gets locked, B gets locked by two different processes, and then they go try to
Starting point is 00:56:25 lock the other resource, but it's held by someone else, and you're sort of stuck. And this happens... It sounds really like, oh, I'll just never do that. But as Jason was saying, it's actually pretty subtle that you're doing this, because in a really big system, there's often many locks flying around, being locked and unlocked, and people are trying to write code efficiently. And for me, the thing that, when we deal with them at work, and we'll get to this in a second with advice, is: try to just use a single lock, until you know that that's not working for you and you actually prove that it's inefficient. So if you have multiple things that
Starting point is 00:57:07 So if you have multiple things that need protection against race conditions or need mutexing, try to just lock them all together, even if that's less efficient. And so what that means is, say I need to have resource A, and then there's also a resource B, but I'm not using it. But someone else might want to use B and share it
Starting point is 00:57:26 and so you say, I'll create a lock for A and a lock for B. But then now you have this opportunity where, if someone else comes along and needs both A and B, they can end up in this deadlock state. So instead, just say: lock A and B at the same time. Like, all of the resources which are prone to having this condition all sit under a single lock. Because if there's only one lock in the system, you actually never can have this problem. But that's an oversimplification.
Starting point is 00:57:53 It doesn't always work, but it works more often than you think. And a lot of times, if you can do the work needed in the lock very, very quickly, then the amount of contention, the amount of time spent, as Jason was saying, waiting around for the lock to be freed, if it's low enough, then it kind of doesn't matter that there aren't these fine-grained locks. Yeah, exactly. I mean, very rarely do you need a lock for a long period of time, especially if you follow that sort of design pattern that Patrick mentioned earlier, where you're only acquiring the lock to just send, basically, signals around, and all of the work is happening outside of the lock. And if you can follow that, then you never need so many locks.
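A Python sketch of that pattern, which also echoes the job-queue hybrid Patrick described earlier: one coarse lock, held only long enough to grab a job or publish a result, with the real work running outside the critical section. The jobs and the "work" here are placeholders.

```python
import threading

jobs = ["a", "b", "c", "d"]    # shared input queue (a plain list here)
results = []                   # shared output queue
lock = threading.Lock()        # one coarse lock guards both

def worker():
    while True:
        with lock:             # held just long enough to grab a job
            if not jobs:
                return
            job = jobs.pop()
        processed = job.upper()    # the real work happens outside the lock
        with lock:                 # retaken just long enough to publish
            results.append(processed)

threads = [threading.Thread(target=worker) for _ in range(2)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(sorted(results))  # ['A', 'B', 'C', 'D']
```
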
Starting point is 00:58:40 You never need that much control. Yeah. Basically, you know, both Patrick and I have done a lot of threading, and it's a total, it's a total nightmare. It's very hard to debug. So I'll transition into my first bit of advice. Well, I just gave one, but I'll give another, which is, a lot of times people think, oh, I'll multi-thread this. I guess this is the premature optimization problem again. But, like, I'm going to multi-thread this, or I'm going to add concurrency here. And what they're really just getting is complexity. There's a lot of overhead in not just these locks,
Starting point is 00:59:15 but also, as Jason mentioned, spinning up threads, allocating, deallocating things. You tend to end up with more copying of the data as you're trying to organize it and stuff. And so, non-obviously, sometimes, for instance, if you can only split it into sort of two or three threads, the startup and organization cost of having three threads do the work may not be enough to merit the multi-threading. It might be faster
Starting point is 00:59:45 just to have the single processor work on it and be done with it, rather than going through that overhead of trying to split it up into parts and parcel it out to everyone, because there's inherently more synchronizing and organization that has to happen as part of that. So one of my big pieces of cautionary advice is: both have measurements in advance that show you need an improvement, and then, when you go add or start adding it, add it in as simple a way as you can, and show that you're actually getting a speedup. Because it might be that the overhead dominates the cost of the actual computation. Like we mentioned, you know, setting a variable or doing a simple addition. If you're trying to multi-thread just doing an addition,
Starting point is 01:00:24 that's not going to be ideal, because addition runs really fast. And so unless you have a ton of additions to do, you know, say I only have five, but I'll put them in, you know, five different threads and have it do it, the overhead cost is going to dominate there, because the amount of work to be done is really small. Yep. Yeah, exactly. Yeah, I mean, the biggest piece of advice that we could give is, yeah, don't do concurrency unless you absolutely have to. And even then, really try not to do it yourself. Like, for example, let's say you need something that reads a thousand files and, you know, let's say, does some transformation of those files and creates 1,000 new files that are slightly different, right?
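If you did roll the thousand-file job yourself, it might look like this Python sketch. The file names and the per-file transformation (uppercasing) are hypothetical; a thread pool is a reasonable fit here since the work is mostly file I/O.

```python
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path

def transform(path):
    # Hypothetical per-file job: read one file, write one modified copy.
    text = Path(path).read_text()
    out = Path(path).with_suffix(".out")
    out.write_text(text.upper())
    return str(out)

def transform_all(paths, workers=8):
    # The pool reads and writes many files at once; file I/O releases the GIL,
    # so plain threads are enough for this kind of work.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(transform, paths))
```
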
Starting point is 01:01:17 You know, you could, you know, it makes sense, like, you could use a thread pool, and you could have all your cores reading these files and all of that. But something way, way easier is: just write a program that reads one file and creates one output file, and then use GNU parallel. So if you're on Unix, you could type the word parallel. That's actually a program. And so it takes a little bit of time to learn how it works, especially if you need to read a bunch of files and you need to create output files that are the input file name dot data or something like that.
Starting point is 01:01:55 It's a little bit of an exercise in learning how that works. But it lets you write programs that are just very small and simple, and they'll run... GNU Parallel will run as fast as your computer can run, and they have all sorts of different tricks to speed that all up. And it's just in the shell; GNU Parallel is doing all the parallelism for you. On OS X, you have to use Homebrew and brew install parallel. I don't think it's there by default. But you could just do
Starting point is 01:02:25 that and now you have it there too. So yeah, Gadoo parallel and writing something that just does one file will eliminate most of the parallelism. Also, there's a lot of async libraries. For example, we've mentioned Node.js on the podcast. If you're using, let's say, Express, which is a web server library in Node.js, they do all the parallelism for you. So basically, you just create handlers. So you say, look, when someone comes to my website and they enter this URL, like slash me, then I want to go and fetch their account and print it. And, you know, that handler can get called, you know, 10 times from 10 different threads. And you don't even have to manage those threads or anything. It's all done for you.
Starting point is 01:03:15 So almost every web server library does that part for you, because they know how difficult it is. So, yeah, the last thing I would mention is, basically, your linear algebra systems, so BLAS systems, like PyTorch and NumPy and SciPy and these other systems: under the hood, they're doing multi-threading. So basically, you know, if you say numpy.add of A and B, where A and B are matrices, under the hood it's going to use all the cores of your computer. It's going to have a thread pool. It's going to spin up all these threads.
Starting point is 01:03:53 It's going to even do, like, SIMD and all of this stuff. But you don't have to deal with any of that yourself. You just say add. So even PyTorch has the multiprocessing module. Or, we talked about Julia in a past episode; you know, a lot of these libraries will even do multi-processing, or even multi-machine. PyTorch has torch.distributed for multi-machine. And you don't have to really mess with any of that threaded code. It's all abstracted away, and that will work for 99% of use cases. All right.
Starting point is 01:04:47 that when you log you log the thread so you can actually usually get the pointer to the thread object or something like that that way you can kind of tell your threads apart and yeah definitely start small write lots of tests unit tests
Starting point is 01:05:03 and things like that if you have threaded code because you're amplifying the amount of errors and also the the effect the penalty of having an error is amplifying man we had all this interesting conversation and then we just basically said and try not to do any of this that's great yeah exactly i, people have to do it for you. I mean, like the Eternal Terminal thing that I wrote is full of threads. So in your career, it'll be very hard to get out of it. So our advice is don't do it if you can help it. But both Patrick and I do multi-threaded code regularly.
Starting point is 01:05:46 So you won't be able to get away with it all the time. Well, I look forward to probably not seeing any of you on the Discord. Well, we just announced it, like, an hour ago. Oh, because you're Patrick, I thought... okay. Yeah, Patrick is going to go into recluse mode now. You had your opportunity. Two hours. Four, five... there's, like, four... there's three, three people who dropped by. So there we go. Or four people. That's right. So you three people got a rare, exclusive chance to see Patrick actually engage in some type of multi-user forum. But mostly you could actually just be streaming this to them
Starting point is 01:06:28 and pretending like I was live. That's true. All right, catch you later. The intro music is Axo by Binärpilot. Programming Throwdown is distributed under a Creative Commons Attribution-ShareAlike 2.0 license. You're free to share, copy, distribute, transmit the work, to remix, adapt the work, but you must provide attribution to Patrick and I, and share alike in kind.
