The a16z Show - The Cool Stuff Only Happens at Scale

Starting point is 00:00:00 The content here is for informational purposes only, should not be taken as legal, business, tax, or investment advice, or be used to evaluate any investment or security, and is not directed at any investors or potential investors in any a16z fund. For more details, please see a16z.com/disclosures. Hi, this is Chris Dixon, and this is the a16z Podcast. I'm here with Herman Narula from Improbable and Vijay Pande from Stanford, who's also a professor in residence here at a16z. Hey guys, let's talk about distributed computing. So my own view, I guess, I'll just start it off, is that over the next few decades, distributed computing will be a particularly important topic because we're now, with things like AWS, awash in computing resources. You know, the cost of compute is approaching zero, as are storage and networking. But most of this is on multiple physical machines. And it's very, very hard for software developers to build software that distributes well. And so we have things like, I think of Hadoop and its successor,

Starting point is 00:01:05 we think its successor, Spark, as frameworks for doing distributed computing for a specific application, which is data processing. And I think we'll probably see more of that kind of pattern among other verticals. And also, at the same time, more infrastructure, you know, programming languages, frameworks, et cetera, that helps us do this. So I don't know, maybe Vijay, if we could start off. How do you view this? Yeah, I mean, I think this is the key issue, right? Because everyone can program the standard C paradigm on one processor core.

Starting point is 00:01:37 Now actually, you know, going to multi-core is not so hard, but start thinking about multiple boxes, a thousand boxes, 10,000 boxes. This will drive you insane, right? We can't be spending our time programming thinking about each of these things. We need to have some type of abstraction. And that seems to be the key. That's how we've been successful in general with coding. And Hadoop and its successors are great abstractions for certain things, but you can't do MapReduce with everything.
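To make the MapReduce abstraction mentioned here concrete, the following is a minimal single-process sketch of its three phases applied to word counting. It is not Hadoop's or Spark's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical and chosen only for illustration. The point is that each map call and each per-key reduce is independent, which is what makes this class of problem easy to split across boxes.

```python
from collections import defaultdict


def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one document."""
    return [(word.lower(), 1) for word in document.split()]


def shuffle_phase(mapped_pairs):
    """Shuffle: group every emitted value under its key."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups):
    """Reduce: combine the values for each key independently of other keys."""
    return {key: sum(values) for key, values in groups.items()}


def word_count(documents):
    mapped = []
    for doc in documents:
        # Each map call touches only its own document, so in a real
        # framework these calls could run on different machines.
        mapped.extend(map_phase(doc))
    return reduce_phase(shuffle_phase(mapped))


counts = word_count(["the cool stuff", "the stuff at scale"])
print(counts)
```

A computation whose steps depend on each other's intermediate state does not decompose this cleanly, which is the limitation the speaker is pointing at.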

Starting point is 00:02:00 And I think that's going to be the challenge. And I think what we're going to see is verticals moving in certain directions that can do different things. It's hard to imagine the magic language that solves all these problems. And so picking things that can sort of attack certain key domains actually could have a huge impact. Yeah, I mean, I'd agree. I think, Vijay, when you say you can't use MapReduce for everything, that suggests an even deeper issue, which is that a lot of the approaches people have today for scale and for parallelizability really boil down to problems that are quite easy to scale because they naturally

Starting point is 00:02:28 have a simple abstraction, a way of splitting them up. I think the next few years are going to be about attacking problems which are naturally harder to split, harder to scale. And how we go about that, I mean, a whole other layer of reliability is going to be required as well for computations that are more challenging to split up. And that suggests to me that even in terms of the cloud infrastructure that's available, there may be completely different characteristics

Starting point is 00:02:49 that are necessary. So I think there'll be lots of winners and losers in this space that no one can yet predict. And the thing to emphasize here is that the possible win is huge: the difference between what you could do on one box versus 100,000 boxes. I mean, that's not just like I'm being impatient. That's being transformative. There's a couple things happening, right?

Starting point is 00:03:05 One is, I mean, they always say Moore's Law is ending. People are saying that now. I mean, then again, people have said that every year for the last 50 years. But to the extent that maybe it's slowing down or whatever, the way you're going to get the additional effects of Moore's Law, additional compute, is going to be going across machines rather than more transistors on a single core, number one, right? So you need to.

Starting point is 00:03:23 Number two, as you said, it's not just like you're doing three times as much. If you're doing 100,000 times as much, it unlocks a whole new class of potential applications. Yeah, exactly. So, like, from your own area, in biology, for example. Yeah, in biology, you know, we launched Folding@home in October of 2000, so we're coming up on. Can you tell people what that is? Yeah, so Folding@home is a large-scale distributed computing project where people go to our website,

Starting point is 00:03:48 folding.stanford.edu. They download the software. And right now, we have about 40 petaflops worth of performance out of maybe about 400,000 processors. It's actually interesting. This was inspired by SETI@home? Yeah, inspired by SETI@home, which also was inspired by GIMPS, and there's a few other ones.

Starting point is 00:04:04 I think we were the first to do sort of something in science. I think, you know, you can debate whether finding aliens is science or not. But, you know, something where in biology... Nonfiction science. Yeah. In biology, you know, the challenge was we wanted to tackle problems that would take, let's say, a million CPU days to do. You know.

Starting point is 00:04:22 But now, would you still need to do that kind of approach today, or could you do it on cloud computing? Yeah, I think you could do it on cloud computing, but if you think about it, I think Amazon has roughly maybe 300,000 boxes. So if we wanted to buy all of Amazon, that would be a little pricey. Okay. But you know, if you think about what we do with Folding@home, it's kind of like a time machine, because what we did 10 years ago, people can now do on GPUs.

Starting point is 00:04:47 What we are doing now with 10,000 GPUs, people will probably do in the future with maybe a small GPU cluster. But the paradigm, the programming paradigm, the way you think about it, is all the same. And therefore, we are taking advantage of Moore's Law as time goes on. And so how does the future, let's say in the future you can have access to a million boxes or something like that, what does it mean for the applications in biology and health care? Yeah, I think there's a couple different ways to think about it. I think there's sort of processing lots of data and then doing calculations. For us, we do a lot of simulation. I think people usually think about the data side, but the simulation side is actually

Starting point is 00:05:20 pretty intriguing too. It's interesting. I think, you know, all that power, it unlocks so many more problems. I mean, how do I test my distributed application? How do I even run, you know, sensible diagnostics on it? Plus, not to mention the skills shortage in people that can really build distributed applications well. Thinking about that alone, I think, will start to create a whole movement of people where this skill set becomes incredibly valuable. You know, even in terms of languages and tools, we're not well served today with good abstractions to think about distributed systems. Most of the ideas that are being used now, like actor paradigms, for example, which some people may be

Starting point is 00:05:52 familiar with. I mean, these are from the 80s, right? Nothing's really changed in how we attack distributed systems. So it should be fun to see that revolution. Is there anything promising you see in terms of languages, frameworks, infrastructure, software? I think right now everyone is sort of rolling their own for the domain. I mean, that's true for us. And there's these problems that we all have that are hard to handle sort of in a generic way. Like you have to deal with fault tolerance.

Starting point is 00:06:16 You've got a million boxes or even just 10,000 boxes. Most likely one of them will die or have some problem along the way. And MPI, which has been the standard in supercomputing, in HPC, is very fault intolerant. You know, the whole job will crash and things like that. And so those paradigms really have to change. And I think the prominence of companies like Google, which have all this on the back end,

Starting point is 00:06:39 I think has gotten people thinking about this, and MapReduce and things like that, those abstractions have played a huge role. I really am looking forward to seeing where people will go. And I think Hadoop and Spark are a good example, but I think we need much more. Yeah, I mean, these problems are profoundly different from scaling web services.
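The fault-tolerance point from the preceding exchange (one dead box out of 10,000 should not crash the whole job) can be sketched as follows. This is an illustrative toy, not how MPI, Hadoop, or Spark schedule work: `run_with_retries` and `flaky_square` are hypothetical names, and the "machine failure" is simulated deterministically. It assumes tasks are independent, so only the failed task is re-run.

```python
def run_with_retries(tasks, worker, max_attempts=3):
    """Run each task independently and re-run only the tasks that fail."""
    results = {}
    attempts = {}
    for task_id, payload in tasks.items():
        for attempt in range(1, max_attempts + 1):
            attempts[task_id] = attempt
            try:
                results[task_id] = worker(task_id, payload, attempt)
                break  # this task is done; other tasks are unaffected
            except RuntimeError:
                # A real scheduler would reassign the task to another machine.
                continue
        else:
            # Only reached if every attempt for this task failed.
            raise RuntimeError(f"task {task_id} failed {max_attempts} times")
    return results, attempts


def flaky_square(task_id, payload, attempt):
    """Hypothetical worker: tasks whose id is divisible by 3 'crash' once."""
    if task_id % 3 == 0 and attempt == 1:
        raise RuntimeError("simulated machine failure")
    return payload * payload


tasks = {i: i for i in range(6)}
results, attempts = run_with_retries(tasks, flaky_square)
print(results)
print(attempts)
```

The contrast with the all-or-nothing behavior described for MPI jobs is that a failure here costs one extra attempt for one task rather than a restart of everything.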

Starting point is 00:06:52 And I think another interesting point is that we traditionally assume large tech companies are going to have a hegemony over large compute problems. But with this new space, I wonder whether existing infrastructure really will be that. One pattern we've noticed is that whereas industry, meaning probably Google and maybe Facebook, and Amazon, sorry, led data center innovation over the last 15 years, we're seeing more and more academic-led stuff. So Spark, as an example, came out of Berkeley. A lot of interesting stuff at Stanford, Berkeley, MIT, kind of the usual suspects.

Starting point is 00:07:21 And I think that the theory we have is that that's because, you know, industry is very good at kind of a depth-first search, right? Like, you know, continuing to iterate on something. But when you need to go back and fundamentally rethink how you do something, that's probably better done in academia. Completely. But I guess industry always needs motivational problems, right? And while the issues within biology are of profound academic and potentially commercial importance, I think if you want to get the mass of hackers and developers out there behind something, we need to start seeing some interesting problems that are kind of solvable, but through interesting innovation are going to result in new companies. And I think there's tons of stuff out there, be it in gaming or wherever. I think also there's an interesting intersection between academia and industry. At Stanford, there's this Pervasive Parallelism Lab, which brings together companies in the Valley, mostly big companies, but I think startups could certainly play a role there too, because I think academics are interested in these questions, but it's useful to have some grounding for where are the big

Starting point is 00:08:17 problems that really need to have the biggest impact right now. Completely. And we see a lot of academics that we speak to in, kind of, Cambridge, Oxford, using supercomputer methods right now, unaware that actually a distributed systems approach might be more cost-effective or even easier to think about from that perspective. So, Herman, your company, Improbable, builds simulations. Can you talk about, you know, why are simulations important? Sure, absolutely. Well, I mean, I guess there are many paths to knowledge, right? And one of the ones that people are very familiar with now, I guess, is big data, collecting huge amounts of information about the world and running pattern analysis on top of it.

Starting point is 00:08:47 Another approach, which we're passionate about, is completely recreating a phenomenon from the real world. I mean, I guess this is something Vijay would be familiar with from a biological perspective, but imagine being able to model cities, model power grids, model telecoms networks. Actually achieving any of that, though, you know, involves solving some of the distributed systems problems we just talked about, and also being able to think about simulation in a totally new way. And why would someone want to model those? I mean, sure, so you can answer questions, right? Answer those what-if questions. What happens if, you know, a disease is released in this crowd? What happens if we shut down this tube station? Questions which governments and

Starting point is 00:09:19 companies want to answer, but which, you know, you can't answer just by looking at data, particularly when you're considering situations that have never happened or trying to project or understand. So it lets you kind of A/B test the real world in the way that you can do. Yeah, but I think the problem's even deeper than that, right? Like, the big problem is, you know, stepping away even from technology, how do we make choices when our world is full of so many interrelated complex systems that no one person can actually hold in their head?

Starting point is 00:09:42 And that's where simulation comes into its own, right? It's interesting, like today, so I'll tell you my pet theory, which is sort of in the same way that, so if you go back to like the 80s, machine learning was kind of this rebel enclave of AI, right? Like the mainstream AI thought you could use rule-based systems. And this rebel enclave was like, no, no, you need to create statistics and have machines that learn.

Starting point is 00:10:03 And now it turns out, of course, that the enclave became the dominant movement. In fact, AI and machine learning are basically synonymous today. Today, simulations and agent-based kind of reasoning, it's like just kind of the Santa Fe Institute and all these kinds of, quote, eccentric thinkers. The mainstream, if you look at the social sciences, all the mainstream kind of thought leaders use analytic approximations.

Starting point is 00:10:26 If you look at macroeconomics, for example, they use whatever. They have their set of equations, and they have a very, very poor track record in predicting the future. There's these rebel enclaves of agent-based thinking who haven't actually been able to really run their... I mean, so my own pet theory is it's a little bit like machine learning in the 80s or something, which is that machine learning couldn't happen for real

Starting point is 00:10:45 until you had the infrastructure, right? You needed to have massive amounts of data. You couldn't do the kind of things like, to take Google Translate as an example, rule-based systems were better than statistical systems until you had the ability to scan, you know, corpora of millions of books or something, right? I mean, it's particularly important when you consider the times you're going to want to run a simulation. You're often dealing with phenomena that have emergent complexity. The cool stuff only happens at scale, right?

Starting point is 00:11:08 So, I mean, for example, we're dealing with a group called the Institute for New Economic Thinking at Oxford. These are some amazing scientists, and they want to model the UK housing economy. Now with 10,000, 20,000 houses, there's only so many actors, there's only so much you're going to be able to deduce, right? I mean, I wonder, Vijay, actually, are there some things in your domain where this emergent complexity property becomes... Yeah, I think those are the things that we're most excited about because I think, you know, analogous to what Chris was talking about, there's tons of pencil-and-paper, analytic work that's done in physics and chemistry, but it's reaching its limits. The approximations you have to make really sort of take out a lot of the things that are the hope for what would be interesting and complicated. So we turn to simulations to give us new insights that we couldn't get from other things.

Starting point is 00:11:44 If anyone listening to this wants to get an example of emergent complexity, I recommend just Googling Conway's Game of Life. It's this little cellular automaton that you can play with, but you see so much beauty emerging from such simple rules. Very simple. Yeah, very simple. Yeah. So I think actually in physics and chemistry, this is,

Starting point is 00:12:00 simulation is a dominant paradigm. It is actually really intriguing to imagine taking this to social areas, social science. Yeah. And so if today you had your fantasy simulation scenario, like, say, for example, you know, to model a cell or something, can you explain how that would work and what kind of questions you might be able to answer? Well, you know, the cell is interesting because a cell is more like

Starting point is 00:12:21 New York City, you know, than like a dirt path or something like that. There's a lot going on. And it's that complexity that leads to all these emergent properties. And so the hope about simulating a cell is that we'd be able to sort of gain some understanding that you couldn't get from just doing the experiment alone. Because if you could just do the experiment, then that would be fine. But there's just so much going on, it's hard to really sort of capture everything. And so I think there's been a lot of excitement in cellular biophysics and cellular biology on the disease side because we think diseases are sort of systemic problems, not having to do with any one single point. I think you can imagine these systemic things are sort of what's interesting to go after,

Starting point is 00:12:55 and it's not just simulating a cell and looking at the systemic properties. It would be simulating a city where maybe shutting down one bridge has effects sort of all over, and making small changes even could have major effects. And it's those counterintuitive things that are the things that I get excited about, because the things that are obvious, which the simulation could verify, that's not very interesting. It's the discoveries of things that you would never think would be connected that are, I think, where the real excitement is, and that's where we've had the biggest wins.
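Conway's Game of Life, recommended a little earlier in the conversation as an example of emergent complexity, fits in a few lines. The sketch below is one common way to write it (a set of live cell coordinates on an unbounded grid); the names `step` and `glider` are illustrative choices, not from the episode. The rules only look at a cell's eight neighbors, yet the "glider" pattern travels across the grid, a global behavior that appears nowhere in the rules themselves.

```python
from collections import Counter


def step(live_cells):
    """Advance one generation; live_cells is a set of (x, y) coordinates."""
    # Count, for every cell adjacent to a live cell, how many live neighbors it has.
    neighbor_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next generation if it has exactly 3 live neighbors,
    # or if it is currently alive and has exactly 2.
    return {
        cell
        for cell, count in neighbor_counts.items()
        if count == 3 or (count == 2 and cell in live_cells)
    }


# The classic five-cell glider.
glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}

state = glider
for _ in range(4):
    state = step(state)

# After four generations the same shape reappears, shifted one cell diagonally.
shifted = {(x + 1, y + 1) for (x, y) in glider}
print(state == shifted)
```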

Starting point is 00:13:20 I think it scares people a little bit, particularly in some domains like economics. I mean, analytical methods, they come with a certain degree of certainty and trust, right? I can exactly explain to you how and why this works, but emergent complexity is unpredictable. It's scary. The results may not be what you expect, and it has the potential to upset a lot of preconceived notions about what's possible. The other aspect that I think is interesting is this concept of A/B testing, that you can't do this in real life as easily.

Starting point is 00:13:46 You know, you can't shut down this bridge during this time and see what would happen. And so the ability to do this type of A/B testing and optimizing things before you bring them into the real world is actually also exciting. I think, finally, one of the things that we've seen in terms of these types of areas is that it always starts with heuristics. When people built bridges in Roman times, you know, they didn't have simulations of bridges or F equals ma or anything like that. But now with the Bay Bridge, you know, that multi-billion dollar thing, you

Starting point is 00:14:12 wouldn't just do that empirically. And so I think as the simulations and sort of analytics become better and better, we don't have to use our best guesses, which is what heuristics are. We can actually really just see what's going to happen. So there's a common preconception that I wonder if Vijay or maybe you, Chris, can attack, and it's an interesting idea that you often hear when thinking about simulation, which is how do you know the model is right? And could your simulation be useful unless your model is 100% accurate? I mean, what do you think of that, Vijay? Yeah, I think, you know, there's a couple of things. One is that there's ways of testing on back data to see if you're right. But you're correct that actually the simulation doesn't have to be perfect to be provocative.

Starting point is 00:14:45 A, there's, you know, the ideal is something where it's perfect and it gives quantitative predictions, whether of the stock market or of traffic or something like that. But a lot of times things are useful even if it just gives you an idea or a hypothesis or a new insight that you wouldn't have gotten just by thinking about it or by doing pencil-and-paper math. And then that hypothesis can be tested in other ways. So Herman, your customers are using Improbable to simulate cities. Can you talk about how that might work? Sure, awesome. Well, to be quite concrete and to give them a little mention,

Starting point is 00:15:14 Matthew Ives at Oxford and the ITRC group are very interested in modeling large city infrastructure. So from their perspective, they see cities as interconnected layers of infrastructure where each layer actually isn't as significant as the sum total of all the layers working together in interesting ways. Now, again, the limiting factors in being able to see that emergent complexity and to be able to poke them are enough scale and enough detail. So, you know, what we're hoping to do is provide a platform,

Starting point is 00:15:39 almost an OS where they can build those sorts of simulations. But actually, I think the cool things will happen when we can go a little further than that. Not simply creating standalone models, but actually instrumenting real cities. Imagine a simulation buoyed by sensor data from millions of sensors placed around a city that actually lets you. So a car in the simulation actually corresponds to a car with an Internet of Things device sitting on it. Absolutely. And so maybe half the entities in the simulation are actually real-world entities and half could be modeled or something. And for every event, there are knock-on consequences.

Starting point is 00:16:11 If there's an accident or a traffic jam, you know, it's possible to then extrapolate from that potential other outcomes and scenarios that would be of interest to a wide variety of people. I mean, I guess that at Improbable, we don't really believe that a simulation should be this, like, standalone box that knowledge comes out of. We see it as an operating platform, somewhere that you can actually make decisions and build applications that consume that simulation. That's also something that's been missing. I mean, the community, I think, is very much, Vijay, correct me if I'm wrong here, I might be jumping out of my purview. But the community is very much inspired by supercomputer research, which was always about putting in data and getting an answer out. Thinking of simulations in this more, like, almost Web 2.0 app way, as quite flexible constructs, is a little alien to people in the space at the moment. And I think we're going to see more of this because the desire to have one place where you can integrate simulation predictions and sort of experimental data, whether it's IoT experiments or whatever, that becomes very powerful.

Starting point is 00:17:01 Because imagine having a million IoT devices. How do you even understand what's going on and how do you put that together and how do you get a picture of what it means? Yeah, exactly. I mean, it's quite weird. A lot of customers we've spoken to, they just want to visualize the current state of a large complex system, let alone simulating it. Turns out to be quite a hard challenge. And then, you know, there are dark aspects you can't see and if simulation can fill in the gaps between that, then suddenly you have a full picture. Completely.

Starting point is 00:17:26 I mean, there are undoubtedly amazing companies out there that have built massive distributed computing infrastructures that live on their own proprietary hardware. The question, though, is, as we start to think about the kinds of applications that Chris and Vijay are talking about, where problems are not so easy to parallelize, how useful those software infrastructures are going to be in the future. I mean, I think ultimately the companies and solutions that are going to start dominating the space are going to have to redo a lot of stuff at quite low layers in order to make it effective. So there may be a need to throw out some of the work that's gone before. So just on the business side of simulations, one area that people have been interested in is that increasingly there's pressure on companies, particularly around sort of core infrastructure like financial services, public safety, et cetera, to do serious disaster recovery planning. And that includes both the case of, let's say, cyber attacks, and also physical ones, like terrorist attacks. So, you know, if there is a terrorist attack,

Starting point is 00:18:20 God forbid, you know, against some financial institutions or something, you know, we want to make sure that the system as a whole is robust and survives that. And so there's lots and lots of thought being put into this concept of sort of disaster planning. And it seems like an area where simulations can be quite useful. Yeah, I mean, I think even conceptually, simulations tend to allow you to consider the vulnerabilities in your infrastructure a little bit more objectively than you would if you were being totally analytical, because it doesn't require a human being to, you know, maybe focus on what vulnerabilities seem most obvious.

Starting point is 00:18:55 For example, when we look at cascading failures in power grids, which is an area that, you know, we've explored with Improbable, how those failures arise can often be the accumulation of many, many disparate, seemingly irrelevant events and slight vulnerabilities which add up together to cause a big catastrophe. Again, I think this might even relate perhaps to some of the stuff. These events are sort of black swan events. Exactly. Exactly. If you look at, as an example, the airplane industry, air travel has gotten much safer over time. Unfortunately, they had a number of crashes in order to learn from, you know, which of course is tragic, but is also, from a

Starting point is 00:19:31 disaster recovery planning point of view, a positive because they had many data points, right? When you talk about things like massive terrorist attacks, you have one or two data points. So you have no historical pattern from which to train. Even like a nine or greater earthquake, let's say, in the Bay Area, what do you tell people to do? First, it would be really powerful to be able to simulate different possibilities. And the second one is, in the moment, having IoT-like information for what people are doing to be able to feed into making predictions for where to go from here. Instant decision making. Yeah.

Starting point is 00:20:00 That combination, you can imagine the team has seen this 100 times because the simulations have been running over the last two years. And then in the moment, they're sort of ready for the game plan. They're getting information from IoT, and they're sort of in real time deciding what to do based on what they've already run or what the simulation would even predict would be the best thing. Indeed, or even considering how a situation like a riot or civil unrest or a problem might evolve over time, given its current situation

Starting point is 00:20:25 and given the mechanics of that group of people that the simulation is able to model and explore. I mean, these are all things that people don't even dream of doing today. I mean, and to make them possible and usable over so many different domains, I think is an immense challenge. And it's not something where you can just have a bunch of guys and say, oh, I think this is what we should do. I mean, to be data driven and to really use the data on the ground in a way that no human could wrap their head around could really be something fantastic. And could be the difference between life or death for many people. And it's another area where you can't just simulate one thing, right? It's not just about the physical effects of the earthquake on building integrity.

Starting point is 00:20:57 It's about everything together. It's about social conditions. It's about whether... And the cars are out here but not there and so on. Exactly. And it's often the little details that slowly accumulate and add up and make a simulation meaningful. I mean, I'm always reminded of the... I don't know how particularly relevant this would be. I'm always reminded of the prisoner's dilemma simulations that took place a long time ago, very simple simulations.

Starting point is 00:21:19 But as you add more detail, the results become completely different. You know, when you just run prisoner's dilemma-style games between participants, okay, you get one outcome. But when you start to introduce geographic components to those simulations, suddenly it all changes again. I mean, that's why we need a system, something that will let people introduce detail at pretty much arbitrary scales in order to really get more and more accurate models. Okay, thanks guys.
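The closing remark about prisoner's dilemma games changing once geography is introduced can be illustrated with a small grid model. The sketch below is an assumption-laden toy in the spirit of well-known spatial prisoner's dilemma models, not a reconstruction of the specific simulations the speaker has in mind: the payoff values, the temptation parameter `b`, the wrap-around grid, and the "copy your best-scoring neighbor" update rule are all illustrative choices, and every function name is hypothetical.

```python
def payoff(me, other, b):
    """Payoff to `me`: C vs C earns 1, D vs C earns b, anyone facing D earns 0."""
    if other == "D":
        return 0.0
    return 1.0 if me == "C" else b


def neighbors(x, y, size):
    """The eight surrounding cells on a grid that wraps around at the edges."""
    return [
        ((x + dx) % size, (y + dy) % size)
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    ]


def spatial_step(grid, b):
    """Each cell plays all its neighbors, then copies its best-scoring neighbor."""
    size = len(grid)
    scores = [
        [
            sum(payoff(grid[y][x], grid[ny][nx], b) for nx, ny in neighbors(x, y, size))
            for x in range(size)
        ]
        for y in range(size)
    ]
    new_grid = []
    for y in range(size):
        row = []
        for x in range(size):
            best_x, best_y = x, y  # keep own strategy unless a neighbor scored strictly more
            for nx, ny in neighbors(x, y, size):
                if scores[ny][nx] > scores[best_y][best_x]:
                    best_x, best_y = nx, ny
            row.append(grid[best_y][best_x])
        new_grid.append(row)
    return new_grid


# One defector in a 5x5 population of cooperators, with temptation b = 1.9.
grid = [["C"] * 5 for _ in range(5)]
grid[2][2] = "D"
after = spatial_step(grid, b=1.9)
defectors = sum(row.count("D") for row in after)
print(defectors)
```

In a single pairwise game with these payoffs, defecting never does worse than cooperating. On the grid, though, what happens to a strategy depends on who its neighbors are, so the outcome over many steps depends on the spatial layout and on `b` rather than on the payoff table alone.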

