a16z Podcast - The Cool Stuff Only Happens at Scale

Episode Date: June 5, 2015

Distributed computing frameworks like Hadoop and Spark have enabled processing of "big data" sets -- but that's not enough for modeling surprise/rare "black swan" or complex events.... Just think of scenarios in disaster planning (earthquakes, terrorist attacks, financial system collapse); biology (including disease); urban planning (cities, transportation, energy power grids); military defense ... and other complex systems where unknown behaviors and properties can emerge. They can't be modeled based on (by definition impossible) limited data. And parallelization for this is hard. But what if companies and governments could answer these seemingly impossible questions -- through simulations? Especially ones where we can directly merge in knowledge and cues from the real world (sensors, sensors everywhere)? CEO of Improbable Herman Narula and Stanford University professor-in-residence at a16z Vijay Pande discuss this and more with Chris Dixon in this episode of the a16z Podcast. And as Herman says, "the cool stuff only happens at scale". The views expressed here are those of the individual AH Capital Management, L.L.C. (“a16z”) personnel quoted and are not the views of a16z or its affiliates. Certain information contained in here has been obtained from third-party sources, including from portfolio companies of funds managed by a16z. While taken from sources believed to be reliable, a16z has not independently verified such information and makes no representations about the enduring accuracy of the information or its appropriateness for a given situation. This content is provided for informational purposes only, and should not be relied upon as legal, business, investment, or tax advice. You should consult your own advisers as to those matters. References to any securities or digital assets are for illustrative purposes only, and do not constitute an investment recommendation or offer to provide investment advisory services. Furthermore, this content is not directed at nor intended for use by any investors or prospective investors, and may not under any circumstances be relied upon when making a decision to invest in any fund managed by a16z. (An offering to invest in an a16z fund will be made only by the private placement memorandum, subscription agreement, and other relevant documentation of any such fund and should be read in their entirety.) Any investments or portfolio companies mentioned, referred to, or described are not representative of all investments in vehicles managed by a16z, and there can be no assurance that the investments will be profitable or that other investments made in the future will have similar characteristics or results. A list of investments made by funds managed by Andreessen Horowitz (excluding investments and certain publicly traded cryptocurrencies/ digital assets for which the issuer has not provided permission for a16z to disclose publicly) is available at https://a16z.com/investments/. Charts and graphs provided within are for informational purposes solely and should not be relied upon when making any investment decision. Past performance is not indicative of future results. The content speaks only as of the date indicated. Any projections, estimates, forecasts, targets, prospects, and/or opinions expressed in these materials are subject to change without notice and may differ or be contrary to opinions expressed by others. Please see https://a16z.com/disclosures for additional important information.

Transcript
[00:00:00] The content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures. Hi, this is Chris Dixon with the a16z Podcast. I'm here with Herman Narula from Improbable and Vijay Pande from Stanford, who's also a professor in residence here at a16z. Hey guys, let's talk about distributed computing. My own view, I'll just start off, is that over the next few decades distributed computing will be a particularly important topic, because with things like AWS we're now awash in computing resources. You know, the cost of compute is approaching zero, and so is storage and networking. But most of this is spread across multiple physical machines, and it's very, very hard for software developers to build software that distributes well.
[00:01:02] And so we have things like Hadoop, and what we think of as its successor, Spark, as frameworks for doing distributed computing for a specific application, which is data processing. And I think we'll probably see more of that kind of pattern in other verticals, and also, at the same time, more infrastructure that helps us: programming languages, frameworks, et cetera, that help us do this. So maybe, Vijay, if we could start off: how do you view this?
[00:01:31] Yeah, I think this is the key issue, right? Because everyone can program the standard C paradigm on one processor core. Going to multi-core is actually not so hard. But start thinking about multiple boxes, 1,000 boxes, 10,000 boxes. This will drive you insane, right? We can't be spending our time programming thinking about each of these things. We need to have some type of abstraction, and that seems to be the key.
[00:01:51] That's how we've been successful in general with coding. And Hadoop and its successors are great abstractions for certain things. But you can't do MapReduce with everything, and I think that's going to be the challenge. I think what we're going to see is verticals moving in certain directions that can do different things. It's hard to imagine the magic language that solves all these problems, so picking things that can attack certain key domains actually could have a huge impact.
[00:02:22] Yeah, I'd agree. I think, Vijay, when you say you can't use MapReduce for everything, that suggests an even deeper issue, which is that a lot of the approaches people have today for scale and for parallelizability really boil down to problems that are quite easy to scale, because they naturally have a simple abstraction, a way of splitting them up. I think the next few years are going to be about attacking problems which are naturally harder to split, harder to scale.
[00:02:37] And however we go about that, a whole other layer of reliability is going to be required as well for computations that are more challenging to split up. And that suggests to me that, even in terms of the cloud infrastructure that's available, there may be completely different characteristics that are necessary.
[00:02:48] So I think there'll be lots of winners and losers in this space that no one can yet predict. And the thing to emphasize here is that the possible win is huge: the difference between what you could do on one box versus 100,000 boxes.
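To make the contrast concrete, here is a minimal sketch of the kind of problem MapReduce handles well: a word count, where the input splits cleanly into independent chunks. This is plain illustrative Python, not Hadoop's or Spark's actual API.

```python
from collections import defaultdict
from itertools import chain

# A word count expressed in the MapReduce style: the input splits cleanly
# into independent chunks, which is exactly what makes it easy to scale.

def map_chunk(chunk):
    # Emit (word, 1) pairs for one chunk of text.
    return [(word, 1) for word in chunk.split()]

def shuffle(pairs):
    # Group values by key, as a framework would between map and reduce.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_group(key, values):
    return key, sum(values)

chunks = ["the cool stuff only happens at scale",
          "the cool stuff needs scale"]
mapped = chain.from_iterable(map_chunk(c) for c in chunks)   # each chunk is independent
counts = dict(reduce_group(k, v) for k, v in shuffle(mapped).items())
print(counts)  # {'the': 2, 'cool': 2, 'stuff': 2, ...}
```

The simulations discussed later in this episode are the opposite case: the pieces interact constantly, so there is no equally clean way to split the work.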
[00:03:00] I mean, that's not just me being impatient, that's transformative. There are a couple of things happening, right? One is, they always say Moore's Law is ending. People are saying that now, but then again, people have said that every year for the last 50 years. Still, to the extent that maybe it's slowing down, the way you're going to get additional compute, the continued effect of Moore's Law, is by going across machines, not with more transistors on a single core. That's number one.
[00:03:24] Number two, as you said, it's not just that you're doing three times as much. If you're doing 100,000 times as much, it unlocks a whole new class of potential applications. Yeah, exactly. So, from your own area, in biology, for example? Yeah, in biology, we launched Folding@home in October of 2000, so we're coming up on 15 years. Can you tell people what that is? Yeah, so Folding@home is a large-scale distributed computing project where people go to our website, folding.stanford.edu, and they download the software. And right now, we have about 40 petaflops worth of performance out of maybe about 400,000 processors.
[00:03:56] It's actually interesting. This was inspired by SETI@home? Yeah, inspired by SETI@home, which also was inspired by GIMPS, and there are a few other ones. I think we were the first to do something like this in science; you know, you can debate whether finding aliens is science or not. But, you know, something in biology...
[00:04:13] Nonfiction science. Yeah. In biology, you know, the challenge was that we wanted to tackle problems that would take, let's say, a million CPU days to do. But now, would you still need to do that kind of approach today, or could you do it on cloud computing?
[00:04:32] Yeah, I think you could do it in cloud computing, but if you think about it, I think Amazon has roughly maybe 300,000 boxes. So if we wanted to buy all of Amazon, that would be a little pricey. Okay. But, you know, if you think about what we do with Folding@home, it's kind of like a time machine, because what we did 10 years ago, people can now do on GPUs. What we are doing now with 10,000 GPUs, people will probably do in the future with maybe a small GPU cluster. But the paradigm, the programming paradigm, the way you think about it, is all the same.
[00:04:56] And therefore, we are taking advantage of Moore's Law as time goes on. And so in the future, let's say you have access to a million boxes or something like that, what does it mean for applications in biology and health care? Yeah, I think there are a couple of different ways to think about it. There's processing lots of data, and then there's doing calculations; for us, we do a lot of simulation. People usually think about the data side, but the simulation side is actually pretty intriguing too. It's interesting. I think all that power unlocks so many more problems. I mean, how do I test my distributed application, and how do I even run sensible diagnostics on it?
[00:05:26] Plus, not to mention the skills shortage in people who can really build distributed applications well. Thinking about that alone, I think, will start to create a whole movement of people where this skill set becomes incredibly valuable. You know, even in terms of languages and tools, we're not well served today with good abstractions to think about distributed systems.
[00:05:46] Most of the ideas that are being used now, like actor paradigms, for example, which some people may be familiar with, are from the 80s, right? Nothing's really changed in how we attack distributed systems. So it should be fun to see that revolution. Is there anything promising you see in terms of languages, frameworks, infrastructure, software?
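For context on the actor paradigm Herman mentions: each actor owns private state and is reachable only through asynchronous messages, which is why the model maps naturally onto many machines. Below is a minimal single-process sketch in Python; production systems would use something like Erlang/OTP or Akka rather than this toy.

```python
import queue
import threading

class Counter:
    """A tiny actor: private state, a mailbox, and a loop that handles messages."""

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # state that only this actor touches
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, message):
        # The only way to interact with the actor is to send it a message.
        self._mailbox.put(message)

    def _run(self):
        while True:
            command, reply_to = self._mailbox.get()
            if command == "increment":
                self._count += 1
            elif command == "read":
                reply_to.put(self._count)

counter = Counter()
for _ in range(3):
    counter.send(("increment", None))

reply = queue.Queue()
counter.send(("read", reply))
print(reply.get())  # 3
```

Because actors never share memory, the same mental model applies whether the mailbox is an in-process queue or a socket to a box on the other side of a data center.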
[00:06:05] I think right now everyone is sort of rolling their own for their domain. I mean, that's true for us. And there are these problems that we all have that are hard to handle in a generic way. Like, you have to deal with fault tolerance: you've got a million boxes, or even just 10,000 boxes, and most likely one of them will die or have some problem along the way. And MPI, which has been the standard in supercomputing, in HPC, is very fault intolerant; the whole job will crash, and things like that. So those paradigms really have to change.
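A toy illustration of the alternative to the MPI-style failure mode Vijay describes: cut the job into independent, re-runnable units and put any unit whose worker dies back on the queue. The failure probability and the squaring "work" are stand-ins, not any real framework's behavior.

```python
import random
from collections import deque

def run_with_retries(work_units, failure_rate=0.2):
    """Process independent work units, re-queueing any unit whose worker 'dies'."""
    pending = deque(work_units)
    results = {}
    while pending:
        unit = pending.popleft()
        if random.random() < failure_rate:
            # Simulated machine failure: the unit goes back on the queue
            # instead of taking the whole job down with it.
            pending.append(unit)
            continue
        results[unit] = unit * unit  # stand-in for the real computation
    return results

print(run_with_retries(range(10)))
```

Volunteer projects like Folding@home and frameworks like MapReduce both lean on this idea of idempotent work units that can simply be reassigned.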
[00:06:34] And I think the prominence of companies like Google, which have all this on the back end, has gotten people thinking about this, with MapReduce and things like that. Those abstractions have played a huge role. I really am looking forward to seeing where people will go. I think Hadoop and Spark are a good example, but I think we need much more.
[00:06:49] Yeah, I mean, these problems are profoundly different from scaling web services. And I think another interesting point is that we traditionally assume large tech companies are going to have a hegemony over large compute problems. But with this new space, I wonder whether existing infrastructure really will be that relevant. One pattern we've noticed is that whereas industry, meaning Google, Facebook, and Amazon, led data center innovation over the last 15 years, we're seeing more and more academic-led stuff.
[00:07:15] So Spark, as an example, came out of Berkeley. There's a lot of interesting stuff at Stanford, Berkeley, MIT, kind of the usual suspects. And the theory we have is that that's because industry is very good at a depth-first search, continuing to iterate on something. But when you need to go back and fundamentally rethink how you do something, that's probably better done in academia. Completely. But I guess industry always needs motivational problems, right? And while the issues within biology are of profound academic and potentially commercial importance, I think if you want to get the mass of hackers and developers out there behind something, we need to start seeing some interesting problems that are kind of solvable, but that through interesting innovation are going to result in new companies.
[00:07:51] And I think there's tons of stuff out there, be it in gaming or wherever. I think there's also an interesting intersection between academia and industry. At Stanford, there's the Pervasive Parallelism Lab, which brings together companies in the Valley, mostly big companies, but I think startups could certainly play a role there too, because academics are interested in these questions, but it's useful to have some grounding for where the big problems are that really need to have the biggest impact right now. Completely. And we see a lot of academics that we speak to in Cambridge and Oxford using supercomputing methods right now, unaware that a distributed systems approach might actually be more cost-effective or even easier to think about from that perspective.
[00:08:21] So, Herman, your company Improbable builds simulations. Can you talk about why simulations are important? Sure, absolutely. Well, I guess there are many paths to knowledge, right? One that people are very familiar with now is, I guess, big data: collecting huge amounts of information about the world and running pattern analysis on top of it.
[00:08:54] Another approach, which we're passionate about, is completely recreating a phenomenon from the real world. I guess this is something Vijay would be familiar with from a biological perspective, but imagine being able to model cities, model power grids, model telecoms networks. Actually achieving any of that, though, involves solving some of the distributed systems problems we just talked about, and also being able to think about simulation in a totally new way. And why would someone want to model this? I mean, sure, so you can answer questions, right? Answer those what-if questions: what happens if a disease is released in this crowd? What happens if we shut down this tube station? Questions which governments and companies want to answer, but which you can't answer just by looking at data, particularly when you're considering situations that have never happened, or that you're trying to project or understand.
[00:09:24] So it lets you kind of A/B test the real world in a way that you otherwise can't. Yeah, but I think the problem's even deeper than that, right? The big problem is complexity. Stepping away even from technology, the problem is how we make choices when our world is full of so many interrelated complex systems that no one person can actually hold them in their head. And that's where simulation comes into its own, right?
[00:09:44] It's interesting. I'll tell you my pet theory. It's sort of like this: if you go back to the 80s, machine learning was kind of this rebel enclave of AI, right? The mainstream of AI thought you could use rule-based systems, and this rebel enclave was saying, no, no, no, you need to use statistics and have machines that learn. And now it turns out, of course, that the enclave became the dominant movement; in fact, AI and machine learning are basically synonymous today. Today, simulations and agent-based kinds of reasoning are sort of the Santa Fe Institute and all these, quote, eccentric thinkers, while the mainstream, if you look at the social sciences, all the mainstream thought leaders, use analytic approximations.
[00:10:21] If you look at macroeconomics, for example, they have their set of equations, and they have a very, very poor track record in predicting the future. There are these rebel enclaves of agent-based thinking who haven't actually been able to really run their models. So my own pet theory is that this is a little bit like machine learning in the 80s, which is that machine learning couldn't happen for real until you had the infrastructure, right? You needed to have massive amounts of data. You couldn't do those kinds of things; just take the Google Translate example.
[00:10:51] Rule-based systems were better than statistical systems until you had the ability to scan corpora of millions of books or something, right? I mean, it's particularly important when you consider that the times you're going to want to run a simulation, you're often dealing with phenomena that have emergent complexity. The cool stuff only happens at scale, right? So, for example, we're dealing with a group called the Institute for New Economic Thinking at Oxford. These are some amazing scientists, and they want to model the UK housing economy. Now, with 10,000 or 20,000 houses and actors, there's only so much you're going to be able to deduce, right?
[00:11:23] I wonder, Vijay, actually, are there some things in your domain where this emergent complexity property becomes very important? I think those are the things that we're most excited about, because, analogous to what Chris was talking about, there's tons of pencil-and-paper analytic work that's done in physics and chemistry, but it's reaching its limits. The approximations you have to make really take out a lot of the things that are the hope for what would be interesting and complicated. So we turn to simulations to give us new insights that we couldn't get from other things. If anyone listening to this wants to see an example of emergent complexity, I recommend just Googling Conway's Game of Life.
[00:11:50] It's a little cellular automaton that you can play with, but you see so much beauty emerging from such simple rules. Very simple. Yeah, very simple rules. Yeah. So I think in physics and chemistry, simulation is actually a dominant paradigm. It is actually really intriguing to imagine taking this to social areas, to social science.
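For anyone who wants to see that emergence without leaving the page, here is a minimal sketch of Conway's Game of Life on a small wrapping grid. The two standard rules are all there is: a live cell survives with two or three live neighbors, and a dead cell comes alive with exactly three.

```python
def step(live_cells, width, height):
    """Advance Conway's Game of Life one generation on a wrapping grid."""
    neighbor_counts = {}
    for x, y in live_cells:
        for dx in (-1, 0, 1):
            for dy in (-1, 0, 1):
                if dx or dy:
                    cell = ((x + dx) % width, (y + dy) % height)
                    neighbor_counts[cell] = neighbor_counts.get(cell, 0) + 1
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }

# A "glider": five live cells whose pattern travels across the grid forever.
cells = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
for _ in range(4):
    cells = step(cells, 20, 20)
print(sorted(cells))  # the same glider shape, shifted one cell diagonally
```

Two rules, and yet gliders, oscillators, and far stranger structures emerge, which is exactly the kind of behavior you only discover by actually running the system.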
[00:12:06] Yeah. And so if today you had your fantasy simulation scenario, say, for example, to model a cell or something, can you explain how that would work and what kind of questions you might be able to answer? Well, you know, the cell is interesting because a cell is more like New York City than like a dirt path or something like that. There's a lot going on, and it's that complexity that leads to all these emergent properties. And so the hope with simulating a cell is that we'd be able to gain some understanding that you couldn't get from just doing the experiment alone.
[00:12:34] Because if you could just do the experiment, then that would be fine. But there's just so much going on, it's hard to really capture everything. And so I think there's been a lot of excitement in cellular biophysics and cellular biology on the disease side, because we think diseases are sort of systemic problems, not having to do with any one single point.
[00:12:57] I think you can imagine the systemic things are what's interesting to go after, and it's not just simulating a cell and looking at the systemic properties. It would be simulating a city, where maybe shutting down one bridge has effects all over, and making even small changes could have major effects. And it's those counterintuitive things that I get excited about, because the things that are obvious, the simulation could verify, and that's not very interesting.
[00:13:09] It's the discoveries of things that you would never think would be connected that are, I think, where the real excitement is. And that's where we've had the biggest win. I think it scares people a little bit, particularly in some domains like economics.
[00:13:23] I mean, analytical methods come with a certain degree of certainty and trust, right? I can exactly explain to you how and why this works. But emergent complexity is unpredictable. It's scary. The results may not be what you expect, and it has the potential to upset a lot of preconceived notions about what's possible. The other aspect that I think is interesting is this concept of A/B testing, because you can't do this in real life as easily. You know, you can't shut down this bridge during this time and see what would happen. And so the ability to do this type of A/B testing and optimizing things before you bring them into the real world is actually also exciting.
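A toy sketch of simulation-as-A/B-test: run the same stochastic model many times with and without an intervention and compare the outcomes. The traffic model and the bridge counts here are invented purely for illustration.

```python
import random
import statistics

def commute_delay(bridges_open, trials=2000, cars=500):
    """Toy model: average delay when a fixed stream of cars shares the open bridges."""
    delays = []
    for _ in range(trials):
        backlog = 0.0
        total_delay = 0.0
        for _ in range(cars):
            backlog += random.expovariate(1.0)           # arriving traffic
            backlog = max(0.0, backlog - bridges_open)   # each bridge clears one unit per tick
            total_delay += backlog
        delays.append(total_delay / cars)
    return statistics.mean(delays)

baseline = commute_delay(bridges_open=2)
intervention = commute_delay(bridges_open=1)   # "shut down this bridge"
print(f"average delay with 2 bridges: {baseline:.2f}")
print(f"average delay with 1 bridge:  {intervention:.2f}")
```

The point is not the specific numbers; it is that the counterfactual (shut down this bridge during this time) can be replayed thousands of times in software, which is exactly what you cannot do to the real city.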
[00:13:55] Finally, I think one of the things that we've seen in these types of areas is that it always starts with heuristics. People built bridges in Roman times; they didn't have simulations of bridges or F = ma or anything like that. But now, with the Bay Bridge, that multi-billion-dollar thing, you wouldn't just do it empirically. And so I think as the simulations and the analytics become better and better, we don't have to use our best guesses, which is what heuristics are. We can actually just see what's going to happen. So there's a common preconception that I wonder if Vijay, or maybe you, Chris, can attack, and it's an interesting idea that you often hear when thinking about simulation, which is: how do you know the model is right?
[00:14:29] And could your simulation be useful unless your model is 100% accurate? I mean, what do you think of that? Yeah, I think there are a couple of things. One is that there are ways of testing on back data to see if you're right. But you're correct that actually the simulation doesn't have to be perfect to be provocative. The ideal is something that's perfect and gives quantitative predictions, whether of the stock market or of traffic or something like that. But a lot of times things are useful even if they just give you an idea or a hypothesis or a new insight that you wouldn't have gotten just by thinking about it or by doing pencil-and-paper math. And then that hypothesis can be tested in other ways.
[00:15:02] So, Herman, your customers are using Improbable to simulate cities. Can you talk about how that might work? Sure. Awesome. Well, to be quite concrete and to give them a little mention, Matthew Ives at Oxford and the ITLC group are very interested in modeling large city infrastructure. From their perspective, they see cities as interconnected layers of infrastructure, where each layer actually isn't as significant as the sum total of all the layers working together in interesting ways. Now, again, the limiting factors in being able to see that emergent complexity, and to be able to poke at it, are enough scale and enough detail.
[00:15:34] So what we're hoping to build is a platform, almost an OS, where they can build those sorts of simulations. But actually, I think the cool things will happen when we can go a little further than that, and not simply create standalone models, but actually instrument real cities. Imagine a simulation fed by sensor data from millions of sensors placed around a city that actually lets you... So a car in the simulation actually corresponds to a car with an Internet of Things device sitting on it. And so maybe half the entities in the simulation are actually real-world entities, and half could be modeled, or something like that.
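A sketch of that hybrid idea, with some entities driven by live sensor readings and the rest stepped forward by a model. Every name here (the Car class, the fake GPS feed, the update rule) is invented for illustration and is not Improbable's API.

```python
import random

class Car:
    """One entity in the city model; it may or may not have a real-world twin."""

    def __init__(self, position, sensor_feed=None):
        self.position = position
        self.sensor_feed = sensor_feed  # None means purely simulated

    def step(self, dt):
        if self.sensor_feed is not None:
            # Real-world entity: trust the latest IoT reading.
            self.position = self.sensor_feed()
        else:
            # Simulated entity: advance with a simple model of movement.
            self.position += random.uniform(0.0, 1.0) * dt

def fake_gps_reading():
    # Stand-in for a real sensor feed (e.g., a message from a device on a car).
    return random.uniform(0.0, 100.0)

fleet = [Car(position=0.0, sensor_feed=fake_gps_reading) if i % 2 == 0
         else Car(position=0.0) for i in range(10)]

for _ in range(5):          # advance the mixed world five ticks
    for car in fleet:
        car.step(dt=1.0)

print([round(car.position, 1) for car in fleet])
```

The interesting engineering questions then become reconciliation ones: what to do when the model and the sensors disagree, and how to fill in the areas the sensors cannot see.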
[00:16:09] And for every event, there are knock-on consequences. If there's an accident or a traffic jam, it's possible to then extrapolate from that to potential other outcomes and scenarios that would be of interest to a wide variety of people. I mean, I guess at Improbable, we don't really believe that a simulation should be this standalone box that knowledge comes out of. We see it as an operating platform, somewhere that you can actually make decisions and build applications that consume that simulation. That's also something that's been missing.
[00:16:32] I mean, the community, I think (Vijay, correct me if I'm wrong here; I might be jumping out of my purview), is very much inspired by supercomputing research, which was always about putting in data and getting an answer out. Thinking of simulations in this more Web 2.0, app-like way, as quite flexible constructs, is a little alien to people in the space at the moment.
[00:16:52] And I think we're going to see more of this, because the desire to have one place where you can integrate simulation predictions and experimental data, whether that's IoT experiments or whatever, becomes very powerful. Because imagine having a million IoT devices: how do you even understand what's going on, how do you put that together, and how do you get a picture of what it means?
[00:17:07] Yeah, exactly. I mean, it's quite weird. A lot of the customers we've spoken to just want to visualize the current state of a large complex system, let alone simulate it, and that turns out to be quite a hard challenge. And then, you know, there are dark aspects you can't see, and if simulation can fill in the gaps, then suddenly you have a full picture. Completely.
[00:17:25] I mean, there are undoubtedly amazing companies out there that have built massive distributed computing infrastructures that live on their own proprietary hardware. The question, though, as we start to think about the kinds of applications that Chris and Vijay are talking about, where problems are not so easy to parallelize, is how useful those software infrastructures are going to be in the future. I think ultimately the companies and solutions that are going to start dominating the space are going to have to redo a lot of stuff at quite low layers in order to make it effective. So there may be a need to throw out some of the work that's gone before. So, you know, just on the business side of simulations, one area that people have been interested in is that increasingly there's pressure on companies, particularly around core infrastructure like financial services, public safety, et cetera, to do serious disaster recovery planning, and that includes both, let's say, cyber attacks and also physical attacks, like terrorist attacks.
[00:18:12] So if there is a terrorist attack, God forbid, against some financial institutions or something, we want to make sure that the system as a whole is robust and survives that. And so there's lots and lots of thought being put into this concept of disaster planning, and it seems like an area where simulations can be quite useful. Yeah, I mean, I think even conceptually, simulations tend to allow you to consider the vulnerabilities in your infrastructure a little bit more objectively than you would if you were being totally analytical, because it doesn't require a human being to focus on whatever vulnerabilities seem most obvious.
[00:18:49] For example, when we look at cascading failures in power grids, which is an area that we've explored with Improbable, how those failures arise can often be the accumulation of many, many disparate, seemingly irrelevant events and slight vulnerabilities which add up together to cause a big catastrophe. Again, I think this might even relate, perhaps, to some of this stuff. These events are the sort of black swan events. Exactly.
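A minimal sketch of that cascading-failure dynamic: lines in a toy grid carry load, a failure dumps its load onto the survivors, and with tight headroom that can push others over their limits in turn. The redistribution rule and the numbers are invented; real grid models are far richer.

```python
import random

def cascade(capacities, loads, initially_failed):
    """Shift load from failed lines onto survivors until the grid settles."""
    failed = set(initially_failed)
    while True:
        survivors = [i for i in range(len(loads)) if i not in failed]
        if not survivors:
            return failed
        shed = sum(loads[i] for i in failed)            # load with nowhere to go
        extra = shed / len(survivors)                   # spread evenly (a toy rule)
        newly_failed = {i for i in survivors if loads[i] + extra > capacities[i]}
        if not newly_failed:
            return failed
        failed |= newly_failed                          # and the cascade continues

random.seed(7)
n = 20
loads = [random.uniform(0.5, 1.0) for _ in range(n)]
capacities = [load * random.uniform(1.05, 1.3) for load in loads]  # tight headroom

down = cascade(capacities, loads, initially_failed={0, 1})
print(f"{len(down)} of {n} lines down after losing 2")
```

Every line had headroom for the first reassignment; it is the accumulation of small shifts that can produce the blackout, which is the kind of outcome that is hard to see analytically but falls out of simply running the model.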
[00:19:16] If you look at the airline industry as an example, air travel has gotten much safer over time. Unfortunately, they had a number of crashes to learn from, which of course is tragic, but from a disaster recovery planning point of view is also a positive, because they had many data points, right? When you talk about things like massive terrorist attacks, you have one or two data points, so you have no historical pattern to train from. Even with, let's say, a nine-or-greater earthquake in the Bay Area, what do you tell people to do?
[00:19:47] First, it would be really powerful to be able to simulate different possibilities. And the second thing is, in the moment, having IoT-like information about what people are doing to feed into making predictions for where to go from here. Instant decision making. Yeah, with that combination, you can imagine the team has seen this 100 times, because the simulations have been running over the last two years. And then in the moment, they're ready for the game plan. They're getting information from IoT, and they're in real time deciding what to do based on what they've already run, or what the simulation would even predict would be the best thing.
[00:20:16] Indeed. Or even considering how a situation like a riot, civil unrest, or another problem might evolve over time, given its current situation and given the mechanics of that group of people, which the simulation is able to model and explore. I mean, these are all things that people don't even dream of doing today, and to make them possible and usable over so many different domains, I think, is an immense challenge. And it's not something where you can just have a bunch of guys say, oh, I think this is what we should do. To be data-driven and to really use the data on the ground in a way that no human could wrap their head around could really be something fantastic, and could be the difference between life and death for many people.
[00:20:43] And it's another area where you can't just simulate one thing, right? It's not just about the physical effects of the earthquake on building integrity. It's about everything together.
[00:20:59] It's about social conditions. It's about weather, and the power being out here but not there, and so on. Exactly. And it's often the little details that slowly accumulate and add up and make a simulation meaningful. I mean, I don't know how particularly relevant this is, but I'm always reminded of the prisoner's dilemma simulations that were run a long time ago, very simple simulations.
[00:21:16] But as you add more detail, the results become completely different. When you just run prisoner's dilemma-style games between participants, okay, you get one outcome. But when you start to introduce geographic components to those simulations, suddenly it all changes again.
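A hedged sketch of the kind of experiment being recalled here: agents on a grid play the prisoner's dilemma with their neighbors and then copy the most successful nearby strategy, in the spirit of Nowak and May's spatial games rather than any one specific study.

```python
import random

SIZE = 20
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def neighbors(x, y):
    return [((x + dx) % SIZE, (y + dy) % SIZE)
            for dx in (-1, 0, 1) for dy in (-1, 0, 1) if dx or dy]

def step(grid):
    # Everyone plays the one-shot dilemma with each neighbor...
    scores = {
        cell: sum(PAYOFF[(grid[cell], grid[n])] for n in neighbors(*cell))
        for cell in grid
    }
    # ...then copies the strategy of the highest-scoring cell nearby (including itself).
    return {
        cell: grid[max(neighbors(*cell) + [cell], key=scores.get)]
        for cell in grid
    }

random.seed(1)
grid = {(x, y): random.choice("CD") for x in range(SIZE) for y in range(SIZE)}
for _ in range(30):
    grid = step(grid)
print("cooperators left:", sum(v == "C" for v in grid.values()), "of", SIZE * SIZE)
```

In a well-mixed population with this copy-the-best rule, defection tends to take over; on a grid, clusters of cooperators can survive. Adding one geographic detail changes the outcome, which is Herman's point.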
[00:21:32] I mean, that's why we need a system, something that will let people introduce detail at pretty much arbitrary scales in order to get more and more accurate models. Okay, thanks, guys.
