a16z Podcast - The Cool Stuff Only Happens at Scale
Episode Date: June 5, 2015

Distributed computing frameworks like Hadoop and Spark have enabled processing of "big data" sets -- but that's not enough for modeling surprise/rare "black swan" or complex events. Just think of scenarios in disaster planning (earthquakes, terrorist attacks, financial system collapse); biology (including disease); urban planning (cities, transportation, energy power grids); military defense; and other complex systems where unknown behaviors and properties can emerge. They can't be modeled from limited data, which by definition barely exists for such events, and parallelizing this kind of computation is hard. But what if companies and governments could answer these seemingly impossible questions -- through simulations? Especially ones where we can directly merge in knowledge and cues from the real world (sensors, sensors everywhere)? Improbable CEO Herman Narula and Stanford professor Vijay Pande, also a professor-in-residence at a16z, discuss this and more with Chris Dixon in this episode of the a16z Podcast. And as Herman says, "the cool stuff only happens at scale."
Transcript
The content here is for informational purposes only, should not be taken as legal business, tax, or investment advice, or be used to evaluate any investment or security and is not directed at any investors or potential investors in any A16Z fund. For more details, please see A16Z.com slash disclosures.
Hi, this is Chris Dixon with the a16z Podcast. I'm here with Herman Narula from Improbable and Vijay Pande from Stanford, who's also a professor in residence here at a16z.
Hey guys, let's talk about distributed computing.
So my own view, I guess, I'll just start it off, is that over the next few decades, distributed computing will be a particularly important topic, because with things like AWS we're now awash in computing resources.
You know, the cost of compute is approaching zero, and so are storage and networking.
But most of this is, you know, on multiple physical machines.
And it's very, very hard for software developers to build software that distributes well.
And so we have things like Hadoop and what we think of as its successor, Spark, as frameworks for doing distributed computing for a specific application, which is data processing.
And I think we'll probably see more of that kind of pattern among other verticals.
And also, at the same time, more infrastructure that helps us, you know, programming languages,
frameworks, et cetera, that help us do this.
So, I don't know, maybe Vijay, if we could start off.
How do you view this?
Yeah, I mean, I think this is the key issue, right?
Because everyone can program the standard C paradigm on one processor core.
Now actually, you know, going to multi-core is not so hard.
But start thinking about multiple boxes, 1,000 boxes, 10,000 boxes.
This will drive you insane, right?
We can't be spending our time programming thinking about each of these things.
We need to have some type of abstraction.
And that seems to be the key.
That's how we've been successful in general with coding.
And Hadoop and its successors are great abstractions for certain things. But you can't do MapReduce with everything. And I think that's going to be
the challenge. And I think what we're going to see is verticals moving in certain directions that
can do different things. It's hard to imagine the magic language that solves all these problems.
And so picking things that can sort of attack certain key domains actually could have a huge
impact. Yeah, I mean, I'd agree. I think, Vijay, when you say you can't use MapReduce for everything, that suggests an even deeper issue, which is that a lot of the approaches people have today for scale and for parallelizability really boil down to problems that are quite easy to scale because they naturally have a simple abstraction, a way of splitting them up.
I think the next few years
are going to be about attacking problems
which are naturally harder to split,
harder to scale.
And how we go about that,
I mean, a whole other layer of reliability
is going to be required as well
for computations that are more challenging
to split up.
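To make that distinction concrete, here is a minimal Python sketch (an illustrative aside, not something from the conversation): a word count splits cleanly into independent chunks in the MapReduce style, while a toy all-pairs update does not, because every element's new state depends on every other element's current state.

```python
from collections import Counter
from multiprocessing import Pool

# Naturally splittable: word counting in the MapReduce style. Each chunk is
# mapped independently and the partial counts are reduced at the end.
def map_chunk(lines):
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def word_count(lines, workers=4):
    chunks = [lines[i::workers] for i in range(workers)]
    with Pool(workers) as pool:
        partials = pool.map(map_chunk, chunks)
    return sum(partials, Counter())

# Harder to split: one step of an all-pairs interaction. Every element's
# update reads every other element's state, so a naive split across machines
# means heavy communication on every single step.
def step_all_pairs(values):
    return [x + 0.01 * sum(y - x for y in values) for x in values]

if __name__ == "__main__":
    print(word_count(["the cool stuff", "only happens at scale", "at scale"]))
    print(step_all_pairs([0.0, 1.0, 4.0]))
```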
And that suggests to me that,
even in terms of the cloud infrastructure
that's available,
there may be completely different
characteristics that are necessary.
So I think there'll be lots of winners
and losers in this space
that no one can yet predict.
And the thing to emphasize here
is that the possible win is huge.
The difference between
what you could do on one box versus 100,000 boxes.
I mean, that's not just about being impatient. That's transformative.
There's a couple things happening, right?
One is, I mean, they always say Moore's Law is ending.
People are saying that now.
I mean, then again, people said that every year for the last 50 years.
So, to the extent that maybe it's slowing down or whatever, the way you're going to get additional compute, the continued effect of Moore's Law, is by going across machines, not with more transistors on a single core. Number one, right?
So you need to.
Number two, as you said, it's not just that you're doing three times as much. If you're doing 100,000 times as much, it unlocks a whole new class of potential applications. Yeah, exactly. So from your own area, in biology, for example? Yeah, in biology, you know, we launched Folding@home in October of 2000, so we're coming up on 15 years now. Can you tell people what that is? Yeah, so Folding@home is a large-scale distributed computing project where people go to our website, folding.stanford.edu, and they download the software. And right now, we have about 40 petaflops worth of performance out of maybe about, you know, 400,000 processors.
It's actually interesting.
This was inspired by SETI@home?
Yeah, inspired by SETI@home, which also was inspired by GIMPS, and there's a few other ones.
I think we were the first to do sort of something in science.
I think, you know, you can debate whether finding aliens is science or not.
But, you know, something where in biology...
Nonfiction science.
Yeah.
In biology, you know, the challenge was, we wanted to tackle problems that would take, let's
say, a million CPU days to do.
You know.
But now, would you still need to take that kind of approach today, or could you do it with cloud computing?
Yeah, I think you could do it in cloud computing, but if you think about it, I think Amazon has roughly maybe 300,000 boxes.
So if we wanted to buy all of Amazon, that would be a little pricey.
Okay.
But, you know, you think about what we do with Folding@home, it's kind of like a time machine
because what we did 10 years ago, people can now do on GPUs.
What we are doing now with 10,000 GPUs, people will probably do in the future with maybe
a small GPU cluster.
But the paradigm, the programming paradigm, the way you think about it, is all the same.
And therefore, we are taking advantage of Moore's law as time goes on.
And so, in the future, let's say the future where you have access to a million boxes or something like that, what does it mean for the applications in biology and health care?
Yeah, I think there's a couple different ways to think about it.
I think there's sort of processing lots of data, and then there's doing calculations; for us, we do a lot of simulation. People usually think about the data side, but the simulation side is actually pretty intriguing too.
It's interesting.
I think all that power, it unlocks so many more problems.
I mean, how do I test my distributed application? And how do I even run, you know, sensible diagnostics on it?
Plus, not to mention the skills shortage, the shortage of people that can really build distributed applications well.
Thinking about that alone, I think,
will start to create a whole movement of people
where this skill set becomes incredibly valuable.
You know, even in terms of languages and tools,
we're not well served today with good abstractions
to think about distributed systems.
Most of the ideas that are being used now, like the actor paradigms, for example, which some people may be familiar with, are from the 80s, right? Nothing's really changed in how we attack distributed systems. So it should be fun to see that revolution.
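For anyone who hasn't met the actor paradigm Herman mentions, here is a tiny, purely illustrative sketch in Python (toy names, no real framework): each actor owns private state and reacts only to messages from its mailbox, never sharing memory, which is what makes it a plausible abstraction for distribution.

```python
import queue
import threading

class CounterActor:
    """A toy actor: private state, a mailbox, and behaviour driven only by
    the messages it receives. Nothing outside the actor touches its state."""

    def __init__(self):
        self.mailbox = queue.Queue()
        self.count = 0
        threading.Thread(target=self._run, daemon=True).start()

    def send(self, message, reply_to=None):
        self.mailbox.put((message, reply_to))

    def _run(self):
        while True:
            message, reply_to = self.mailbox.get()
            if message == "increment":
                self.count += 1
            elif message == "get" and reply_to is not None:
                reply_to.put(self.count)

if __name__ == "__main__":
    actor = CounterActor()
    for _ in range(3):
        actor.send("increment")
    reply = queue.Queue()
    actor.send("get", reply_to=reply)
    print(reply.get())  # prints 3 once the mailbox has been drained
```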
Is there anything promising you see in terms of languages, frameworks, infrastructure, software?
I think right now everyone is sort of rolling their own for their domain. I mean, that's true for us.
And there's these problems that we all have that are hard to handle sort of in a generic way.
Like you have to deal with fault tolerance. You've got a million boxes or even just 10,000 boxes.
Most likely one of them will die or have some problem along the way.
And MPI, which has been the standard in supercomputing, HPC, is not fault tolerant at all.
The whole job will crash and things like that.
And so those paradigms really have to change.
And I think the prominence of companies like Google,
which have all this on the back end,
I think have gotten people thinking about this,
and MapReduce and things like that.
Those abstractions have played a huge role.
I really am looking forward to seeing where people will go.
And I think Hadoop and Spark are a good example,
but I think we need much more.
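A hedged sketch of the kind of fault handling Vijay is pointing at: instead of an MPI-style job that dies when one worker dies, the work is broken into tasks that are simply retried elsewhere on failure. (Illustrative only; real frameworks add checkpointing, speculative execution, data locality, and much more.)

```python
import random

def run_on_worker(task):
    """Stand-in for shipping a task to a remote box that can fail mid-task."""
    if random.random() < 0.2:              # simulate a machine dying
        raise ConnectionError("worker lost")
    return task * task

def run_with_retries(tasks, max_attempts=8):
    """Finish every task even though individual workers keep failing."""
    results = {}
    for task in tasks:
        for _ in range(max_attempts):
            try:
                results[task] = run_on_worker(task)
                break                      # task done, move to the next one
            except ConnectionError:
                continue                   # reschedule on another worker
        else:
            raise RuntimeError(f"task {task} failed {max_attempts} times")
    return results

if __name__ == "__main__":
    print(run_with_retries(range(10)))
```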
Yeah, I mean, these problems are profoundly different
from scaling web services.
And I think another interesting point is that we traditionally assume large tech companies
are going to have a hegemony over large compute problems.
But with this new space, I wonder whether existing infrastructure really will be that much of an advantage.
One pattern we've noticed is that whereas industry, meaning probably Google and maybe Facebook and Amazon, led data center innovation over the last 15 years, we're seeing more and more academia-led work.
So Spark, as an example, came out of Berkeley.
A lot of interesting stuff at Stanford, Berkeley, MIT, kind of the usual suspects.
And I think the theory we have is that that's because, you know, industry is very good at kind of a depth-first search, right?
Like, kind of continuing to iterate on something.
But when you need to go back and fundamentally rethink how you do something, that's probably better done in academia.
Completely.
But I guess industry always needs motivational problems, right? And while the issues within biology are of profound academic and potentially commercial importance, I think if you want to get the mass of hackers and developers out there behind something, we need to start seeing some interesting problems that are kind of solvable, but that through interesting innovation are going to result in new companies. And I think there's tons of stuff out there, be it in gaming or wherever.
I think also there's an interesting intersection between academia and industry. At Stanford, there's this Pervasive Parallelism Lab, which brings together companies in the valley, mostly big companies, but I think startups could certainly play a role there too, because I think academics are interested in these questions, but it's useful to have some grounding for where the big problems are that would really have the biggest impact right now.
Completely. And we see a lot of the academics that we speak to in Cambridge and Oxford using supercomputing methods right now, unaware that actually a distributed
systems approach might be more cost-effective or even easier to think about from that perspective.
So, Herman, your company, Improbable, builds simulations. Can you talk about, you know, why are simulations important? Sure, absolutely. Well, I mean, I guess there are many paths to knowledge,
right? And one of the ones that people are very familiar with now is, I guess, big data:
collecting huge amounts of information about the world and running pattern analysis on top of it.
Another approach, which we're passionate about, is completely recreating a phenomenon from the real world. I mean, I guess this is something Vijay would be familiar with from a biological perspective, but imagine being able to model cities, model power grids, model telecoms networks. Actually achieving any of that, though, you know, involves solving some of the distributed systems problems we just talked about, and also being able to think about simulation in a totally new way. And why would someone want to model this? I mean,
sure, so you can answer questions, right? Answer those what-if questions. What happens if, you know, a disease is released in this crowd? What happens if we shut down this tube station? Questions which governments and companies want to answer, but which, you know, you can't answer just by looking at data, particularly when you're considering situations that have never happened or that you're trying to project or understand.
So it lets you kind of A/B test the real world in a way that you otherwise couldn't do.
Yeah, but I think the problem's even deeper than that, right?
Like the problem, the big problem is complexity.
You know, stepping away even from technology, the problem is how do we make choices
when our world is full of so many interrelated complex systems that no one person can actually
hold in their head.
And that's where simulation comes into its own, right?
It's interesting. So I'll tell you my pet theory. In the same way that, if you go back to the 80s, machine learning was kind of this rebel enclave of AI, right? The mainstream AI view was that you could use rule-based systems. And this rebel enclave was like, no, no, no, you need to use statistics and have machines that learn. And now it turns out, of course, that the enclave became the dominant movement. In fact, AI and machine learning are basically synonymous today. Today, simulations and agent-based kinds of reasoning, it's just kind of the Santa Fe Institute and all these kinds of, quote, eccentric thinkers. The mainstream, if you look at the social sciences, all the mainstream thought leaders use analytic approximations. If you look at macroeconomics, for example, they have their set of equations, and they have a very, very poor track record in predicting the future. There's these rebel enclaves of agent-based thinking who haven't actually been able to really run their models.
I mean, so my own pet theory is that this is a little bit like machine learning in the 80s or something, which is that machine learning couldn't happen for real until you had the infrastructure, right? You needed to have massive amounts of data. You couldn't do those kinds of things. Just take Google Translate, for example. Rule-based systems were better than statistical systems until you had the ability
to scan, you know, corpora of millions of books or something, right? I mean, it's particularly
important when you consider that the times you're going to want to run a simulation, you're often dealing with phenomena that have emergent complexity. The cool stuff only happens at scale, right? So, I mean, for example, we're dealing with a group called the Institute for New Economic Thinking at Oxford. These are some amazing scientists, and they want to model the UK housing economy. Now, with only 10,000 or 20,000 houses and actors, there's only so much you're going to be able to deduce, right? I wonder, Vijay, actually, are there some things in your domain where this emergent complexity property becomes very important? I think those are the things
that we're most excited about because I think, you know, analogous to what Chris was talking about,
there's tons of pencil-and-paper analytic work that's done in physics and chemistry, but it's reaching
its limits. The approximations you have to make really sort of take out a lot of the things
that are the hope for what would be interesting and complicated. So we turn to simulations
to give us new insights that we couldn't get from other things. If anyone listening to
this wants to get an example of emergent complexity, I recommend just Googling Conway's Game of Life.
It's a little cellular automaton that you can play with, but you see so much beauty emerging from such simple rules.
Very simple.
Yeah.
Very simple rules.
Yeah.
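Since the Game of Life is the canonical example here, a compact, self-contained version follows (a standard toy implementation, not anything discussed on the podcast); two rules are enough to produce gliders and the rest of the emergent zoo.

```python
from collections import Counter

def step(live_cells):
    """One Game of Life generation. live_cells is a set of (x, y) pairs."""
    neighbour_counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx in (-1, 0, 1)
        for dy in (-1, 0, 1)
        if (dx, dy) != (0, 0)
    )
    # A cell is alive next step if it has exactly 3 live neighbours, or has 2
    # and is already alive. That is the entire rule set.
    return {
        cell for cell, n in neighbour_counts.items()
        if n == 3 or (n == 2 and cell in live_cells)
    }

if __name__ == "__main__":
    glider = {(1, 0), (2, 1), (0, 2), (1, 2), (2, 2)}
    cells = glider
    for _ in range(4):
        cells = step(cells)
    # After four generations the glider reappears, shifted one cell diagonally.
    print(cells == {(x + 1, y + 1) for (x, y) in glider})  # True
```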
So I think actually in physics and chemistry, simulation is a dominant paradigm. It is actually really intriguing to imagine taking this to social areas, to social science.
Yeah.
And so if today you had your fantasy simulation scenario, say, for example, you know, to model a cell or something, can you explain how that would work and what kind of questions you might be able to answer?
Well, you know, the cell is interesting because a cell is more like New York City than like a sort of a dirt path or something
like that. There's a lot going on. And it's that complexity that leads to all these emergent
properties. And so the hope about simulating a cell is that we'd be able to sort of gain some
understanding that you couldn't get from just doing the experiment alone. Because if you could just
do the experiment, then that would be fine. But there's just so much going on, it's hard to really sort
capture everything. And so I think there's been a lot of excitement in cellular biophysics and
cellular biology on the disease side, because we think diseases are sort of systemic problems, not having to do with any one single point.
I think you can imagine the systemic things are
sort of what's interesting to go after,
and it's not just simulating a cell
and looking at the systemic properties.
It would be simulating a city
where maybe shutting down one bridge
has effects sort of all over
and making small changes even
could have major effects.
And it's those counterintuitive things that I get excited about, because the things that are obvious, a simulation could just verify. That's not very interesting. It's the discoveries of things that you would never think would be connected that are, I think, where the real excitement is. And that's where we've had the biggest wins.
I think it scares people a little bit, particularly in some domains like economics.
I mean, analytical methods, they come with a certain degree of certainty and trust, right?
I can exactly explain to you how and why this works.
But emergent complexity is unpredictable.
It's scary.
The results may not be what you expect, and it has the potential to upset a lot of preconceived notions about what's possible.
The other aspect that I think is interesting is this concept of A/B testing, which you can't do in real life as easily. You know, you can't shut down this bridge during this time and see what would happen. And so the ability to do this type of A/B testing and optimize things before you bring them into the real world is actually also exciting.
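A toy sketch of the simulated A/B test Vijay describes, with made-up numbers and nothing to do with Improbable's actual platform: run the same tiny traffic model twice, once with a bridge open and once with it closed, and compare the outcome.

```python
import random

def simulate_commute(bridge_open, cars=1000, seed=42):
    """Toy counterfactual: send the same population of cars through a city
    with a river crossing either open or closed, and report the mean travel
    time. Time on each route grows with how many cars pile onto it."""
    rng = random.Random(seed)
    routes = {"bridge": 0, "tunnel": 0} if bridge_open else {"tunnel": 0}
    for _ in range(cars):
        routes[rng.choice(list(routes))] += 1     # each driver picks a route
    base = {"bridge": 10.0, "tunnel": 15.0}       # free-flowing minutes
    # Congestion: every extra car on a route adds a small delay for everyone on it.
    total = sum(n * (base[r] + 0.02 * n) for r, n in routes.items())
    return total / cars

if __name__ == "__main__":
    print("bridge open:  ", round(simulate_commute(bridge_open=True), 1), "minutes")
    print("bridge closed:", round(simulate_commute(bridge_open=False), 1), "minutes")
```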
Finally, I think one of the things that we've seen in terms of these types of areas is that it always starts with heuristics.
When people built bridges, people built bridges in Roman times.
You know, they didn't have simulations of bridges or F = ma or anything like that.
But now with the Bay Bridge, you know, that, you know, multi-billion-dollar thing, you wouldn't just do that empirically.
And so I think as the simulations and sort of analytics become better and better, we don't have to use our best guesses, which is what heuristics are. We can actually really just see what's going to happen.
So there's a common preconception that I wonder if Vijay or maybe you, Chris, can attack, and it's an interesting idea that you often hear when thinking about simulation, which is: how do you know the model is right? And can your simulation be useful unless your model is 100% accurate? I mean, what do you think of that? Yeah, I think, you know,
there's a couple of things. One is that there are ways of testing on back data to see if you're right. But you're correct that actually the simulation doesn't have to be perfect to be provocative.
A, there's, you know, the ideal is something where it's perfect and it gives quantitative
predictions, whether of the stock market or of traffic or something like that. But a lot of times
things are useful even if it just gives you an idea or a hypothesis or a new insight that you
wouldn't have gotten just by thinking about it or by doing pencil paper math. And then that
hypothesis can be tested in other ways. So, Herman, your customers are using Improbable to simulate cities. Can you talk about how that might work? Sure. Awesome. Well, to be quite concrete and to give them a little mention, Matthew Ives at Oxford and the ITLC group are very interested in modeling a large city's infrastructure.
So from their perspective, they see cities as interconnected layers of infrastructure
where each layer actually isn't as significant as the sum total of all the layers
working together in interesting ways.
Now, again, the limiting factors in being able to see that emergent complexity and to be able to poke at it are enough scale and enough detail. So what we're hoping to provide is a platform, almost an OS, where they can build those sorts of simulations.
But actually, I think the cool things will happen when we can go a little further than that, and not simply create standalone models, but actually instrument real cities. Imagine a simulation fed by sensor data from millions of sensors placed around a city that actually lets you...
So a car in the simulation actually corresponds to a car with an Internet of Things device sitting on it.
And so maybe half the entities in the simulation are actually real world entities and half could be modeled or something.
And for every event, there are knock-on consequences.
If there's an accident or a traffic jam, it's possible to then extrapolate from that potential other outcomes and scenarios that would be of interest to a wide variety of people.
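A small sketch of the hybrid idea Herman describes, using hypothetical names rather than Improbable's API: entities pinned to a live sensor feed track reality, while the rest are advanced by a simple model that fills in the gaps.

```python
import random

class Car:
    """An entity that is driven either by a real sensor feed or by a model."""

    def __init__(self, position, sensor=None):
        self.position = position
        self.sensor = sensor                 # None means purely simulated

    def step(self, dt):
        if self.sensor is not None:
            self.position = self.sensor()    # pin to the real-world reading
        else:
            self.position += random.gauss(1.0, 0.2) * dt   # simple motion model

def fake_gps_feed():
    """Stand-in for an IoT device reporting a real car's position."""
    fake_gps_feed.position += 1.0
    return fake_gps_feed.position

fake_gps_feed.position = 0.0

if __name__ == "__main__":
    # One real, sensor-backed car plus four modelled ones in the same world.
    world = [Car(0.0, sensor=fake_gps_feed)] + [Car(0.0) for _ in range(4)]
    for _ in range(10):
        for car in world:
            car.step(dt=1.0)
    print([round(car.position, 1) for car in world])
```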
I mean, I guess at Improbable, we don't really believe that a simulation should be this standalone box that knowledge comes out of.
We see it as an operating platform, somewhere that you can actually make decisions
and build applications that consume that simulation.
That's also something that's been missing.
I mean, the community, I think, is very much,
Vijay, correct me if I'm wrong here, I might be jumping out of my purview,
but the community is very much inspired by supercomputing research, which was always about putting in data and getting an answer out. Thinking of simulations in this more, almost Web 2.0 app way, as quite flexible constructs, is a little alien to people in the space at the moment.
And I think we're going to see more of this,
because the desire to have one place
where you can integrate simulation predictions
and sort of experimental data,
whether that's IoT experiments or whatever,
that becomes very powerful.
Because imagine having a million IoT devices.
How do you even, like, understand what's going on
and how do you put that together
and how do you get a picture of what it means?
Yeah, exactly.
I mean, it's quite weird.
For a lot of the customers we've spoken to, just visualizing the current state of a large complex system, let alone simulating it, turns out to be quite a hard challenge.
And then, you know, there are dark spots you can't see, and if simulation can fill in the gaps between them, then suddenly you have a full picture.
Completely.
I mean, there are undoubtedly amazing companies out there that have built massive distributed computing infrastructures that live on their own proprietary hardware.
The question, though, is as we start to think about the kinds of applications that Chris and Vijay are talking about, where problems are not so easy to parallelize, how useful those software infrastructures are going to be in the future. I mean, I think ultimately the companies and solutions that are going to start dominating the space are going to have to redo a lot of stuff at quite low layers in order to make it effective.
So there may be a need to throw out some of the work that's gone before.
So, you know, just on the business side of simulations, one area that people have been interested in
is that increasingly there's pressure on companies, particularly around sort of core infrastructure
like financial services, public safety, et cetera, to do serious disaster recovery planning. And that includes both, let's say, cyber attacks and also physical attacks, like terrorist attacks.
So, you know, if there is a terrorist attack, God forbid, you know, against some financial institutions or something, you know,
we want to make sure that the system as a whole is robust and survives that.
And so there's lots and lots of thought being put into this concept of sort of disaster planning.
And it seems like an area where simulations can be quite useful.
Yeah, I mean, I think even conceptually,
simulations tend to allow you to consider the vulnerabilities in your infrastructure a little bit
more objectively than you would if you were being totally analytical. Because it doesn't
require a human being to maybe focus on what vulnerabilities seem most obvious. For example,
when we look at cascading failures in power grids, which is an area that we've explored with Improbable, how those failures arise can often be the accumulation of many, many disparate, seemingly irrelevant events and slight vulnerabilities, which add up together to cause a big catastrophe. Again, I think this might even relate perhaps to some of this stuff.
These events are the sort of black swan events.
Exactly.
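A minimal sketch of the cascading-failure dynamic Herman mentions (a generic toy model, not the one explored with Improbable): when one node fails, its load is pushed onto the survivors, which can push them over capacity in turn.

```python
def cascade(capacity, load, start):
    """Toy cascading failure: `capacity` and `load` map each node of a grid to
    numbers. Failing node `start` dumps its load onto the survivors; any node
    pushed past its capacity fails too and the process repeats."""
    failed = {start}
    frontier = [start]
    while frontier:
        node = frontier.pop()
        survivors = [n for n in load if n not in failed]
        if not survivors:
            break
        share = load[node] / len(survivors)    # spread the lost load evenly
        for n in survivors:
            load[n] += share
            if load[n] > capacity[n]:
                failed.add(n)
                frontier.append(n)
    return failed

if __name__ == "__main__":
    capacity = {"a": 12, "b": 11, "c": 15, "d": 50}
    load = {"a": 10, "b": 10, "c": 10, "d": 10}
    # One seemingly small failure at "a" ends up taking out "b" and "c" as well.
    print(sorted(cascade(capacity, load, "a")))
```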
If you look at the airplane industry as an example, air travel has gotten much safer over time. Unfortunately, they had a number of crashes to learn from, which of course is tragic, but is also, from a disaster recovery planning point of view, a positive, because they had many data points, right? When you talk about things like massive terrorist attacks, you have one or two data points. So you have no historical pattern to train from.
Even like a 9.0 or greater earthquake, let's say, in the Bay Area, what do you tell people to do? First, it would be really powerful to be able to simulate different possibilities. And the second one is, in the moment, having IoT-like information about what people are doing to be able to feed into making predictions for where to go from here.
Instant decision making.
Yeah, that combination is, you can imagine the team has seen this 100 times because the
simulations have been running over the last like two years. And then in the moment, they're
sort of ready for the game plan. They're getting information from IoT and they're sort of
in real time deciding what to do based on what they've already run or what the simulation would
even predict would be the best thing. Indeed. Or even considering how a situation like a riot or
civil unrest or a problem might evolve over time, given its current situation and given the
mechanics of that group of people that the simulation is able to model and explore. I mean,
these are all things that people don't even dream of doing today. I mean, and to make them possible
and usable over so many different domains, I think is an immense challenge. And it's not something
where you can just have a bunch of guys and say, oh, I think this is what
we should do. I mean, to be
data driven and to really use
the data on the ground in a way that no human could wrap
their head around could really be something
fantastic and could be the difference
between life or death for many people. And it's another
area where you can't just simulate one thing,
right? It's not just about the
physical effects of the earthquake on building integrity.
It's about everything together.
It's about, you know, social conditions.
It's about weather. And the power's out here
but not there and so on.
Exactly. And it's often the little details
that slowly accumulate and add up and make a simulation meaningful.
I mean, I don't know how particularly relevant this would be, but I'm always reminded of the prisoner's dilemma simulations that were done a long time ago, very simple simulations. But as you add more detail, the results become completely different. You know, when you just run prisoner's-dilemma-style games between participants, okay, you get one outcome, but when you start to introduce geographic components to those simulations, suddenly it all changes again.
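A compact illustration of the effect Herman recalls, in the spirit of classic spatial prisoner's dilemma experiments (a generic sketch with assumed payoff values, not any specific study): the same game, but players sit on a grid and only interact with, and imitate, their neighbours, which is exactly the kind of geographic detail that can change the outcome.

```python
import random

# Standard prisoner's dilemma payoffs, keyed by (my move, their move).
PAYOFF = {("C", "C"): 3, ("C", "D"): 0, ("D", "C"): 5, ("D", "D"): 1}

def spatial_round(grid):
    """One round on a wrap-around grid: every cell plays its four neighbours,
    then copies the strategy of whichever nearby cell (itself included) scored
    best. Geography is the only ingredient added to the plain repeated game."""
    size = len(grid)

    def neighbours(x, y):
        return [((x + dx) % size, (y + dy) % size)
                for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1))]

    score = {}
    for x in range(size):
        for y in range(size):
            score[(x, y)] = sum(PAYOFF[(grid[x][y], grid[nx][ny])]
                                for nx, ny in neighbours(x, y))

    new_grid = [[None] * size for _ in range(size)]
    for x in range(size):
        for y in range(size):
            best_x, best_y = max(neighbours(x, y) + [(x, y)], key=score.get)
            new_grid[x][y] = grid[best_x][best_y]
    return new_grid

if __name__ == "__main__":
    random.seed(1)
    size = 20
    grid = [[random.choice("CD") for _ in range(size)] for _ in range(size)]
    for _ in range(30):
        grid = spatial_round(grid)
    cooperators = sum(row.count("C") for row in grid)
    print(f"cooperators left on the grid after 30 rounds: {cooperators}/{size * size}")
```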
I mean, that's why we need a system, something
that will let people introduce detail
at pretty much arbitrary scales
in order to really get more and more accurate models.
Okay, thanks guys.