Grey Beards on Systems - 099: GreyBeards talk Folding@Home with Mike Harsch, a longtime enthusiast

Episode Date: April 1, 2020

Mike Harsch (@harschness) is a personal friend and a computer enthusiast with a particular and enduring interest in distributed systems and GPU computing. Mike’s been a longtime user and proponent of Folding@Home, a distributed system focused on protein dynamics that anyone can download and run on their personal computer(s) or gaming devices. We started the discussion …

Transcript
Starting point is 00:00:00 Hey everybody, Ray Lucchesi here with Keith Townsend. Welcome to the next episode of the GreyBeards on Storage podcast, the show where we get GreyBeards storage bloggers to talk with system vendors and other tech experts to discuss upcoming products, technologies, and trends affecting the data center or our world today. This GreyBeards on Storage episode was recorded on March 20, 2020. We have with us here today Mike Harsch, a longtime Folding at Home user, computer hardware enthusiast, and personal friend. So, Mike, why don't you tell us a little bit about yourself and how you first got involved with Folding at Home? Okay, thanks for having me, Ray. So I should say from the beginning, I don't have any affiliation with the Folding at Home project other than being a user for a long time.
Starting point is 00:00:57 But it's been interesting. I think going all the way back to the late 90s is when distributed computing projects kind of came on the scene. And that's when I was just getting into college. The first thing I ever saw was like SETI at home kinds of things, where they were using desktop systems or something like that to analyze radio spectrum and stuff. So even before that. So yeah, that's kind of chapter two of the distributed computing story, but even before that was distributed.net. Have you heard of that? No, I haven't. which is going to, it's sort of this hacker project where they're going to compete in this encryption contest where there's a prize for cracking, you know, the encryption standard of the day. Was this being put on by, you know, gamblers or, you know, darknet kinds of guys or what? From what I can tell, it was kind of a publicity stunt by RSA. At first it was to discredit the legacy system and show how weak it was. And then it was, you know, supposedly to show how strong their replacement was. So anyway, I started contributing my compute time to this project, and they would hand out parts of the brute force key space to crack this code. And yeah, it would just run in the background, and the distributed.net system was set up so there was a prize for whoever cracked the code. And the individual who got the correct guess
Starting point is 00:03:09 was going to get a share of it. So it was kind of fun. I think it kind of foreshadowed the whole cryptocurrency mining thing later on, where you're doing brute force stuff for a potential payoff. But so that was '97. And then a couple of years later, SETI at home came out and used the same kind of idea, where they're gonna
Starting point is 00:03:37 hand out little chunks of work to people on the internet that were interested in helping out. But this time it was essentially doing analysis of radio astronomy data to try to find alien, you know, signals. It was like a desktop saver for me for like, you know, a couple of years, actually. Yeah, it was. Okay. It was a little fun thing. It was nice. I love the gamification of it. That was like one of the things that, you know, the idle cycles of my PC going to trying to find aliens was like a really cool... Right, right. I never did find an alien signal. Not that we know of. Yeah. Well, yeah, you're right, Keith, the gamification was a key aspect, and that got picked up by folding at home, and they really built themselves on the SETI at
Starting point is 00:04:31 home template. But, yeah, SETI at home actually just announced on the 7th that they were closing down at the end of this month, which dovetails nicely, though, into, you know, the explosion that's happening right now with folding at home. So SETI at home just this month announced they're shutting down. So, you know, no aliens yet, but in the meantime we can point our computers at other interesting problems. So what is this folding at home thing? where volunteers were willing to essentially contribute their compute power to a problem that couldn't really be tackled otherwise economically.
Starting point is 00:05:33 So the problem that they're interested in solving is molecular dynamics of proteins in the medical sphere, so they can understand the processes of the folding dynamics of these molecules with an aim towards finding... Therapeutic things or something like that. Yeah, basically sites that drugs can attach to. So not being a cellular expert or anything like that, but, you know, proteins
Starting point is 00:06:13 have this weird folding topology, let's say, that creates a 3D space that other proteins or other cellular mechanisms can latch into. And it's this unique 3D space that makes each protein sort of unique in this world of cellular dynamics. Yeah, so I'm way out of my depth on the science of this stuff, but the way I understand it is that, you know, the protein molecules take on different shapes and there are certain configurations or states of the folded shape where they may be vulnerable to a binding molecule that could be introduced as a drug, for instance, a small molecule. And so what that could do is disrupt the function of the protein. So in the case of the coronavirus work that's happening right now, what they're targeting is the spike part of the protein of the virus itself and trying to disrupt the behavior of that protein so that it can't infect the cells of the lungs. So if they can disrupt the spike protein
Starting point is 00:07:37 on the virus molecule, it won't be able to bind to the ACE2 receptor of the host cells, and hopefully it can disrupt the life cycle. So I've got a picture of a COVID, and then I can put it in the blog post, a molecule from the WHO or CDC or something like that. And it's got this big round ball with these spikes out of this ball, but there's this red sort of topographical thing at the end of these spikes. Is that what you're trying to determine or match or understand? So what they're looking for are states in the sequence of folding operations and the sequence of motions of the molecule where the, where there are potentially pockets where a drug could bind to. I gotcha. I gotcha. So it's not enough to look at a static picture of the molecule when it's in its common state or in its folded state.
Starting point is 00:08:38 Cause it's more dynamic. Right. And, and they, you know, have actually demonstrated the ability to do this kind of identification of these potentially useful pockets in the Ebola virus. And they have papers published on that. So it's definitely a technique that has been demonstrated to work. And yeah, it's pretty cool. So let's say they have some chemical description of a protein, and I'm not sure what it would look like, but you know, it's probably some long number of amino acids tied together
Starting point is 00:09:17 or something like that. And so this sort of set of amino acids, sequence of amino acids, I guess, could fold in a number of different ways? So, again, my expertise is way more on the computing side. as the virus replicates, it has to take the, you know, the protein starts out in a unfolded state and then it, it through some period of time, it makes its way into its folded state. And there are different approaches to figuring out how that happens. And like I said, identifying parts of the molecule that can be potentially interrupted with drugs. But that's a pretty weak layman's version of it.
Starting point is 00:10:16 Yeah. So Mike, help us out with kind of the approach, the folding at home approach, because this sounds like the perfect challenge for quantum computing when it gets there. Like, I mean, as you're describing it, I'm like, oh, this is one of the first real-world quantum computing examples that I've seen, because there's really no yes/no answer. It's, yeah, maybe probability. It's a perfect problem for quantum computing. an answer that was given by the team during their Ask Me Anything marathon they did yesterday on Reddit, where some of the team members weighed in on the quantum thing. I think the TL;DR on that was there could potentially be real interesting applications of quantum computers when we get there. But who knows when, you know. The raw performance and capability maturity just isn't there yet.
Starting point is 00:11:28 I could talk at length about quantum computing, but that's not the subject of this discussion. But yeah, you're right. Exactly. Quantum computing could be used to do this if it ever gets to a point where it could be effectively used. And there are certain algorithms today that are usable, but not all of them. So how is folding taking, like, so I can think of, I'm starting to get a scale of the problem. And I think, Ray, that's what you're trying to get at too, is how big is this problem? And can you talk a little bit about the economics of what the project has been able to do, not just for COVID-19, but other problems it's tackled? You know, what problems have we solved today that we couldn't have economically in the past?
Starting point is 00:12:12 So that's a great question. The scale of the compute that's being harnessed by the project is on the order of petaflops. So prior to three weeks ago, when things kind of exploded, the running power of the system was around 100 petaflops, which is like a big supercomputer. And, you know, the organizers of the project obviously don't have to pay for the hardware or the electricity that it takes to run, which is significant. So I always thought this was some sort of thing where you were actually doing some folding of it and trying different... It's actually computationally intensive and it's doing the folding internally.
Starting point is 00:13:08 What does this thing actually do? So it's running a molecular dynamics simulation over a period of time on the order of nanoseconds or maybe getting into the range of milliseconds. So it's simulating the motions of the atoms of the molecules of the protein in a fixed region of space over a very short period of time, and it's attempting to describe the folding behavior. Okay, so it is running, not unlike the SETI at home radio spectrum analysis, some sort of a simulation program that's divvying out, you know, portions of this work to all these folding at home users in the world. And they're running, you know, this portion of the simulation on their system.
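To make the idea of simulating motion tick by tick more concrete, here is a toy sketch in Python. It is not the Folding at Home simulation code; it only illustrates the general shape of a molecular-dynamics time-stepping loop, using the common velocity Verlet integrator on a single particle attached to a spring. The function names, the spring force, and the 2-femtosecond time step are all assumptions chosen for illustration.

```python
# Toy molecular-dynamics loop (illustrative only, NOT the project's code).
# One particle in 1-D, pulled toward the origin by a spring.

def spring_force(x, k=1.0):
    """Hypothetical force field: a harmonic spring with stiffness k."""
    return -k * x


def velocity_verlet(x, v, force, mass, dt, steps):
    """Advance position x and velocity v by `steps` ticks of size `dt`."""
    a = force(x) / mass
    for _ in range(steps):
        x = x + v * dt + 0.5 * a * dt * dt   # move using current acceleration
        a_new = force(x) / mass              # recompute force at new position
        v = v + 0.5 * (a + a_new) * dt       # average old and new acceleration
        a = a_new
    return x, v


def steps_needed(simulated_seconds, dt_seconds):
    """How many ticks it takes to cover a span of simulated time."""
    return round(simulated_seconds / dt_seconds)


if __name__ == "__main__":
    # Start the particle at x=1 at rest and run 1000 small ticks.
    x, v = velocity_verlet(1.0, 0.0, spring_force, mass=1.0, dt=0.01, steps=1000)
    print("position:", x, "velocity:", v)

    # Assumption: a 2-femtosecond tick, a typical order of magnitude for MD.
    print(steps_needed(1e-9, 2e-15))  # ticks per simulated nanosecond -> 500000
```

Under the assumed 2-femtosecond tick, a single nanosecond of simulated time is already half a million passes through the loop, and a real protein has many thousands of interacting atoms per pass, which gives a feel for why the simulated time spans stay so short.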
Starting point is 00:14:03 And it's providing the response back to folding at home that says, okay, for this amount of simulation time for this protein sequence, this is what the result is from that sequence. Is that what it's doing? That's right. So you've got these researchers, you know, postdocs and grad students at different labs around the country and maybe around the world that are contributing. So they're basically framing the problem and they're saying, okay, they describe the molecule that they're looking at. And then they set a bunch of initial conditions, which are force vectors that are initially applied to that molecule. And then the simulation starts at T equals zero. And it runs,
Starting point is 00:15:01 like I said, for some number of ticks, which is, like I said, on the scale of nanoseconds. It's simulating nanoseconds of real time, but in reality, it may take seconds or minutes of compute time, right? Oh, it generally takes hours. Hours, okay, per nanosecond. Yeah, for a work unit. So the researchers create what's called a project, and then that project generates all these work units, of which there may be, I don't know, thousands. Thousands, probably. I have so many questions that
Starting point is 00:15:38 we probably would need to uh point to the folding people like one of the biggest ones is queue management and resource management. Obviously, the VMware folks are super proud. They put together a team of folding at home folks. Maybe you can help us with this part of it. The gamification
Starting point is 00:16:00 side of it, there's a VMware team and that team is growing and they're dedicating compute resources to the COVID-19 problem. But from a researcher in folding folks challenge, there's like this classic problem that we talk about in data center all the time, which is how do I distribute the load based on the application requirements? So that is the aspect of this story that's been particularly interesting to me. And I've been following it as best I could from both their Twitter feed and also from
Starting point is 00:16:39 watching the support forum site. And so what happened was the system was chugging along with an average of 30,000 users per day. And it had been for as long as anyone could remember, basically steady state. And the architecture has, I think, two front-end servers which accept requests for these work units. And then they redirect the request to a work server, which is the machine that transfers the work unit to the client and then accepts the results back from the client when it's done. So it's like a two-tier architecture. And this system, I don't know how long it had been in place for, but it had been just fine, never had any scaling issues. And then on February 27th, they announced on their blog that the Folding at Home project is going to start working on the COVID-19
Starting point is 00:17:46 virus in particular, trying to analyze the protein with an aim towards identifying receptor sites that could lead to a drug. And as you could imagine, this was a popular idea with people right now. And anyone who had heard of folding at home in the past and knew what it was about probably went and fired it up right away. And then over the next week, it made its way, slowly at first, through various amplifiers on Twitter and other places to where, as of the 14th, they started really experiencing scaling problems that they couldn't really cope with. So the infrastructure began to throttle significantly, and of all these new users, a lot of them were starved for work units. Do you know what the number of users was around the 14th at all? Did they give you any numbers? It's on the order of 400,000 new users. So 10x. So they've been scrambling and I'm sure there's been some heroic efforts over the last couple of weeks on the part of the actual ops people that are running this thing.
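The two-tier flow described here, and the way a surge of new users can end up starved for work units, can be sketched in a few lines of Python. This is a simplified model, not the project's actual server software: the class names `AssignmentServer` and `WorkServer`, the first-available selection rule, and the numbers in the example are all assumptions.

```python
from collections import deque


class WorkServer:
    """Second tier: hands out work units and accepts results back."""

    def __init__(self, name, work_units):
        self.name = name
        self.queue = deque(work_units)   # work units waiting to be handed out
        self.results = {}                # work unit id -> returned result

    def has_work(self):
        return bool(self.queue)

    def fetch(self):
        return self.queue.popleft()

    def submit(self, work_unit, result):
        self.results[work_unit] = result


class AssignmentServer:
    """First tier: redirects each client request to a work server."""

    def __init__(self, work_servers):
        self.work_servers = work_servers

    def assign(self):
        # Assumed rule: pick the first work server that still has work.
        for server in self.work_servers:
            if server.has_work():
                return server
        return None  # nothing available: the client is starved


def simulate_requests(front_end, num_clients):
    """Return (served, starved) counts for a burst of client requests."""
    served = starved = 0
    for client_id in range(num_clients):
        server = front_end.assign()
        if server is None:
            starved += 1
            continue
        work_unit = server.fetch()
        server.submit(work_unit, f"result from client {client_id}")
        served += 1
    return served, starved


if __name__ == "__main__":
    servers = [WorkServer("ws1", ["a1", "a2", "a3"]),
               WorkServer("ws2", ["b1", "b2", "b3"])]
    front_end = AssignmentServer(servers)
    print(simulate_requests(front_end, 10))  # prints (6, 4)
```

In this toy model the only ways to reduce the starved count are to queue up more work units or to add more work servers, which is roughly the "more of everything" response mentioned in the conversation.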
Starting point is 00:19:19 And I'm happy to report as of last night, the 19th of March, it seems like their throughput has gone up significantly. And they're not at 100%, but they're able to cope, it looks like, with most of the demand right now. So they're really cranking. So it went to like a three-tiered structure or just added more front ends or more work servers? No, what they did was they've added more of everything. Now, I don't think they're out of the woods yet because watching, so they actually have some stats pages online you can watch.
Starting point is 00:20:02 I've definitely seen some new servers pop up and that's good to see. But another comment from one of their support people said, the next thing that's going to hit us is running out of storage. And you can see on their stats page, I've been watching one server in particular just tick down. It's down to, I think, three terabytes left. I've talked to a number of vendors over the last couple of days and they're all offering free storage for COVID-19 users or, you know, researchers and stuff. Yeah, I'm sure there's no shortage
Starting point is 00:20:34 of offers for help. I think it's a matter of, you know, getting the right people on it, getting the hardware to the people. And, you know, it's a fairly old architecture. Yeah, there's this overhead of overhead. We always run into this thing. My advanced degree is in IT project management.
Starting point is 00:20:58 And it always boils down to you can't make a baby in one month by having nine women. And, you know, a lot of these problems are pretty serial from a, not from a technology perspective, because folding at home and distributed computing kind of mastered the distributed computing problem was not distributed always this scale of business relationships. You know, you can get everybody from WD to Seagate to Micron can call and say, hey, we have storage for you. The logistics of getting that, you know, getting the units to go and physically upgrade the capacities, there's probably solo space, all these challenges, you know, just are very, very weird physical challenges that we forget about in the era of cloud. First time I heard that phrase about nine women and one month, mythical man month,
Starting point is 00:21:54 software development at IBM by Brooks, right? Or something like that. It was an interesting book. Anyways, that's an aside. Yeah, yeah. It's, you know, there, there, there are plenty of organizations willing to help, but, you know, getting it there, getting it actually installed and up and running, it's going to take effort, time and money and not money. Well, yeah. Even anything that requires time or money of some type. Right. Yeah. Yep. Even the, the, as free as in puppies. Yeah. That's, and that's another thing that's been interesting to watch is, you know, as you can imagine, there's been an outpouring of people wanting to help on the, on the forums. And it's, you know,
Starting point is 00:22:42 everyone showed up kind of at the same time and was like, okay, we're here to help. And I think everyone who was there before is kind of looking at each other. Like, uh, yeah, it's twofold. It's like when everyone shows up to the gym on the first of the year. And then the other example is when you watch these extreme home makeover shows, there's like hundreds of volunteers that have no idea how to do construction. Which kind of brings me to the community aspect of it. Tell us what the folding at home community was like prior to the influx of all these new users. I mean, for me, I remember the SETI days, and I did some of
Starting point is 00:23:26 the early folding stuff and it's, you know, I install an agent on my computer and I kind of just walk away. What, what, what's the conversation like in the forums and et cetera, even pre COVID? Yeah. So I, I, I would describe it as very niche, uh, before they're having this moment where it's getting to be very much mainstream. But, you know, talking about SETI at home, I think these things sort of follow a life cycle where someone finds out about it. And in the case of folding at home, you know, their pitch, their front page, you know, call to action is really like, hey, you want to help cure cancer, you want to help work on the cure for Alzheimer's, all these, all these things, you know, fundamentally, we need to understand how these proteins work better, so we can make progress on these problems. And so that's a very, you know,
Starting point is 00:24:25 compelling thing for a lot of people, obviously. So that's where people previously had, I think, come to the project. A lot of them wanted to do something about, you know, medical research. And this was one kind of interesting niche for computer people especially. But now, so yeah, I think, you know, with SETI it was this initial thing of like, oh, we're gonna go find aliens. And then I think the life cycle of a typical user goes from initial interest and excitement to, after a while, it's like, what am I really doing here? Exactly. Well, you know, SETI at home, we never did find aliens. Like, I'm sure that, you know, folding at home is actually making progress on some proteins, but it needs to make
Starting point is 00:25:16 more progress on this, obviously. And that's where they've struggled, I think, a little in the past with PR and marketing is the output of the projects is primarily scientific papers that are not really accessible to most people. So, you know, that's the biggest question they get. And that was, you know, the number one upvoted question on their Reddit Ask Me Anything session yesterday was, you know, how can I see this, how can I see the tangible results of my work that I'm contributing here? And I think they're starting this month, they're doing a much better job of closing the loop on that. So it's encouraging to see. So, you know, I did the same sort of thing. I've started working on this Kaggle project on,
Starting point is 00:26:01 there's like, they've got 13,000 COVID-19 research papers. They want to be able to effectively create some sort of a knowledge base where they can ask questions. Okay. You know, what does this thing look like? What does this protein do? And, you know, try to understand, trying to understand the research so they can ask questions to it. But it's, it's a, it's a bear. It's, it's, it's, it's a very difficult task. So back to this folding at home. So you just load something that runs on a Mac and a PC and Linux and all that stuff? Yeah. So you download a client for your platform, and it installs kind of two components. One is a background process that does all the contacting the server and downloading the work units and crunching on them and then sending them back.
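The background process described here boils down to a loop: download a work unit, crunch on it, send the result back, repeat. A minimal Python sketch of that loop follows. It is only an illustration, not the real client: `FakeServer`, `crunch`, and `run_client` are hypothetical names, the "work" is just a sum of squares, and the `paused` callback is an assumed stand-in for the pause control a user might have.

```python
import queue


class FakeServer:
    """In-memory stand-in for the project's servers (hypothetical)."""

    def __init__(self, work_units):
        self._pending = queue.Queue()
        for work_unit in work_units:
            self._pending.put(work_unit)
        self.returned = []  # (work unit id, result) pairs sent back by clients

    def download(self):
        """Hand out the next work unit, or None when there is none left."""
        try:
            return self._pending.get_nowait()
        except queue.Empty:
            return None

    def upload(self, wu_id, result):
        self.returned.append((wu_id, result))


def crunch(work_unit):
    """Placeholder computation: a real client would run a simulation here."""
    wu_id, numbers = work_unit
    return wu_id, sum(n * n for n in numbers)


def run_client(server, paused=lambda: False):
    """Fetch, compute, and return work units until paused or out of work."""
    completed = 0
    while not paused():
        work_unit = server.download()
        if work_unit is None:
            break  # nothing to do right now
        wu_id, result = crunch(work_unit)
        server.upload(wu_id, result)
        completed += 1
    return completed


if __name__ == "__main__":
    server = FakeServer([("wu-1", [1, 2, 3]), ("wu-2", [4])])
    print(run_client(server))   # number of work units completed
    print(server.returned)      # results the server received
```

Keeping the loop separate from any display code mirrors the split described in the conversation, where one process does the work and a separate front end only reports on it.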
Starting point is 00:26:50 And the other part is a sort of a front end client that talks to the back end process and shows you progress on what it's doing and allows you to configure that. And the way it's configured is in terms of what are called folding slots. And so you can have either a CPU slot or a GPU slot, depending on what hardware you have. And there's a little bit of configuration you can do in terms of how many cores you want it to use on CPUs or which set of GPUs you want to allow it to use. And yeah, you set it to go and it can run in the background. It's supposed to run as a low priority thing. So if you're doing other stuff, you know, ideally you won't notice it running, but you can also pause it while you're working and then kick it off again when you walk away from the machine, that kind of thing. And, you know, SETI at home had some sort of a screen display where it would show you what the analysis of the radio spectrum was looking at and stuff like that.
Starting point is 00:28:07 Does it have anything like that? Yeah. I believe it has a 3d visualization of the molecule as it's working on it. You don't have to, to, you know, look at that thing. And, you know, the, those of us who take this real seriously don't want to sacrifice any extra GPU cycles to do pretty pictures while we're… So that's pretty interesting. So, I mean, are there multiple proteins that they're working on?
Starting point is 00:28:37 Are there multiple… You know, I imagine, you know, even one protein would have, oh gosh, maybe 10,000 work units to try to understand its lifetime over a second or so, right? Or 10 seconds in this case. No, that would be milliseconds. No, no. We're talking like milliseconds would be a really big job is the way I understand it. Yeah. So that would be 10,000 work units for 10 milliseconds or something like that.
Starting point is 00:29:03 No, it's microseconds. The numbers are beyond my ken here, I guess. Yeah, I'm having trouble wrapping my mind around the numbers. But to the client side, one of the things that I find interesting is that I haven't looked at folding since before GPUs were being used for this type of work, and before consumer-side GPUs got to the point that they were reasonably useful for this. So you get to the whole Bitcoin and data mining expansion. Do you have a point of reference or idea of how much more work is done now relative to GPU and CPU performance from when, you know, we weren't given the option to give GPU cycles? Yeah. So you're absolutely right, Keith. The history of folding at home is kind of the history of the last 20 years of innovation in computing.
Starting point is 00:30:14 So initially, you know, they launched in 2000. We had single-core CPUs and that's what it ran on. And then later on in, I think, 2006, I want to say, they introduced SMP cores, which were compute cores that could run on a multiprocessor system. But they were kind of primitive. They used the Message Passing Interface to do the parallelization even just on the CPU. And then, you know, later they came out with a threaded version of the SMP core, which was better in a lot of ways. It ran on Windows. And then, you know, as you say, the advent of GPUs on the scene as compute devices really changed the game because, particularly for this problem that we're describing here, it's a good fit for throughput-oriented computing, where you're not sensitive to the latency of these individual little operations that you're running, but you want to run a lot of them, just like a GPU is good at. So in order to quantify
Starting point is 00:31:34 the difference, what they tried to do was set up a point system, and this gets into the gamification aspect of it. They set up a benchmark system that would run a given workload. So a researcher would propose a work unit or a project, and he would feed a work unit through the benchmark system. And then the CPU on that system would do the work unit in a certain amount of time, and they would assign a point value to that as a baseline. And then if your CPU is faster, you'd get, you know, some multiple of that point score for completing one of those work units. And then they tried to scale that to GPUs, but the point difference is amazing. It's at least an order of magnitude, if not two. So the amount of useful work that GPUs can do is incredible. Now, I think they're going to rethink the
Starting point is 00:32:39 the point system a little bit because, so I would say for, for the same work, they can port the work units to, from, uh, they can, they can essentially make the same simulation run on CPUs or GPUs. And they do do that. Um, like for the, for the recent COVID one, they did that just so people who only had CPUs could do something. But I think there's other cases where the underlying code that, you know, is the simulation software that they're running, hasn't been ported to GPU, or maybe it's not a good fit for GPU. So there's still useful work to be done by CPUs. And based on kind of the discussion they had yesterday in the Ask Me Anything session, I think they're going to kind of re-tinker the point system to favor GPUs a little less than they currently are. But in numbers terms,
Starting point is 00:33:46 a modern GPU can generate between half a million and 3 million points per day. Whereas, I don't know, a CPU would be lucky to get like 300,000 maybe. You know, it brings up an interesting thing. So I've been doing some work with, you know, supercomputing groups and stuff like that. So last year, Summit was announced.
Starting point is 00:34:09 It had like 27,000 GPUs and like 4,600 server, you know, multi-core servers and stuff like that. But that was, you know, that was top 100. That was the top supercomputer in the world last year. El Capitan is being installed this year, and it's got, they don't even give us a number for the number of GPUs or CPUs, but it's now called, they're saying it's an exaflop, but they're using the GPU computational characteristics to make that statement. And what we're seeing is that a lot of these scientific workloads used to be all double precision floating point.
Starting point is 00:34:47 But nowadays, they're coming back down to single precision or even less. And for single precision or less, GPUs work wonderfully. So, really interesting question from a gamification standpoint, going back to community, because I'm a community guy. Are some of your peers on there selecting their PC purchases based on their folding capability, even if they don't need a GPU? So, you know, we live in a world where, if you do it for a living, three or four hundred dollars more for your laptop or computer isn't a tremendous amount of money. And that's usually the difference in getting something with an Intel-based GPU versus an NVIDIA or AMD GPU.
Starting point is 00:35:36 Do you see some of your community members specifically going out for specific configurations? I couldn't really say, but I know that the video game cohort online has got to be the biggest segment of people that have been mobilized lately. They're the ones that already have the machines built that are already optimal for this kind of thing. And some of the, you know, online personalities or companies that these people rally around have, you know, gone to the trouble of creating a folding at home team and sort of, you know, like you say, making it a game and having this sort of competition aspect. So, for example, you know, name any popular YouTube tech person, they're going to, you
Starting point is 00:36:38 know, there's a number of them that have recruited. So, Keith, you mentioned VMware is doing something with this? Yeah, so VMware, unofficially, Amanda Blevins, who works in the OCTO office, and a couple of
Starting point is 00:36:58 just saying, hey, you guys have, especially in the VMware community, we have entire home labs with four servers with, you know, we might have, you know, I know, again, I know these are CPUs, but again, CPUs, we, a lot of us have like 80 cores of, physical cores of CPU just sitting in our closets. So she's been put the, you know,
Starting point is 00:37:25 she's been a champion to say, put these cores to use for folding. Yeah, you're definitely on the right track. There's a whole niche of hardware enthusiasts that get into that kind of thing. And then the other cohort you have is the crypto miners who are showing up. So it's not so profitable these days to go mining cryptocurrency, but a lot of them can switch over. So one of the big mining firms announced, shortly after
Starting point is 00:38:06 folding at home came out with their announcement, that they'd be pointing 6,000 of their GPUs at the problem. And then, actually before that, Nat Friedman, GitHub, announced that they'd be spinning up up to 60,000 core hours per day from GitHub to work on this. So it's kind of fun to see, you know, tech companies join the fray. Well, this has been great. Keith, any last questions for Mike before we close? No, I don't want to take the podcast off into a whole other tangent. This whole orchestration of workload management is fascinating. The scale
Starting point is 00:38:51 going from 30,000 to 400,000 endpoints, I'm not smart enough to solve that problem, but it's a fascinating problem. Mike, anything you'd like to say to our listening audience before we close? No, just if this kind of thing interests you, check out Folding at Home. Download the client and give it a try. If you have any issues, go to foldingforum.org and ask for help. There's a lot of great people doing triage and tech support there. Okay. Well, this has been great.
Starting point is 00:39:20 Thank you very much, Mike, for being on our show today. Yeah, my pleasure. And if you enjoy our podcast, tell your friends about it. And please review us on iTunes and Google Play and Spotify as this will help us get the word out. That's it for now. Bye, Keith. Bye, Ray. Bye, Mike.
Starting point is 00:39:35 So long. Until next time. Thanks.
