a16z Podcast: It's Complicated
Episode Date: August 30, 2016
For better or worse, most of the computing systems that run much of our lives (whether invisibly or visibly) have become increasingly complex -- they're not fully engineered; they're almost grown. And... with that we enter a brave new world of a "biological" (as opposed to a more "physics") mindset applied to computing. It's more like evolution, horns and all. This isn't just abstract or backend-only stuff. Complex system design affects everything from datacenters and SaaS to word processors and cars, touching human lives in very tangible ways. So how do you solve problems in such systems? How do you even begin to understand "the system" in the first place? And is there anything out there yet that lets us test and verify the output of these systems? (Inquiring minds want to know!) All this and more in this episode of the a16z Podcast, a riff on the theme of "complicated" with complexity scientist Samuel Arbesman, author of the new book Overcomplicated. Also joining the conversation (with Sonal Chokshi) are a16z board partner Steven Sinofsky and research and deal team head Frank Chen.
image: brewbooks / Flickr
Transcript
Hi everyone, welcome to the a16z Podcast. I'm Sonal. Today's episode is one of our hallway conversations where we just riff on a topic for a bit. And the topic we're talking about today is the theme of complicated. And to give you more context for this, we have a16z board partner, Steven Sinofsky, who has written in the past about systems where the back end is really complicated. And the front end is deceptively simple. And this tension is also a common theme in design. We have a16z research and deal team head, Frank Chen, who has talked a lot about AI and
deep learning, and that's relevant here because those are complex systems that learn. And finally,
we have Sam Arbesman, who is a complexity scientist and who also got his PhD in computational
biology. And he has a new book out called Overcomplicated. So it all fits together. All right,
guys, let's just get started. I'm excited to talk about this topic. So Sam, I was reading the
book. And one of the first things that occurred to me is I wanted to ask you my favorite product
manager interview question of all time. My question is, so how do phones work? Like an iPhone, a smartphone, any phone? You pick any phone, even the simplest landline, and
tell me how it works. Oh, boy. I'm going to show my ignorance probably really, really quickly.
And yeah, I know you dial. And then actually, I was going to say there's some sort of packet switching
thing. I guess it really depends if you're using kind of an IP phone or not. Yeah, and frankly,
I don't know. Well, and I think you hit on it. And frankly, I think most people do not know. And we've all been shielded from that complexity.
The reason I ask is because that's what really jumped out at me when I was reading the book,
which is like we create systems that nobody understands.
And so it turns out like you can ask a million product managers how does a phone work.
Some actually do exactly what you did.
Well, you dial it.
And then the next question is, well, tell me about the electromagnetic stuff behind dialing and what is that or touchtones.
How do you generate those frequencies?
And then you leaped immediately to packet switching, which of course skipped the whole...
Oh, totally. Yeah. No, it's like, yeah, I was jumping to a couple things that I was vaguely familiar with. And that was right. And then like, how does your voice turn into one of those things to begin with? Yeah. My question to product managers was, okay, when you go home after this interview, you're going to send me a nice email to thank me. How does that email get to me? Right. And the same exact thing, right, this cascade of technology that's layer upon layer. And so what you're looking for, if you're trying to find a technical one, is how deep down the stack can you go in answering that question?
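For the curious, the touchtone piece of that stack is small enough to sketch. Each keypress plays two fixed sine frequencies at once, the standard DTMF pairs, and the switch decodes the pair back into a digit. A minimal illustration in Python; the duration and sample rate are arbitrary choices, not tied to any real telephony stack:

```python
# Sketch: generating a DTMF touchtone, the frequencies behind dialing.
import numpy as np

# Standard DTMF frequency pairs (Hz): one row tone plus one column tone per key.
DTMF = {
    "1": (697, 1209), "2": (697, 1336), "3": (697, 1477),
    "4": (770, 1209), "5": (770, 1336), "6": (770, 1477),
    "7": (852, 1209), "8": (852, 1336), "9": (852, 1477),
    "*": (941, 1209), "0": (941, 1336), "#": (941, 1477),
}

def dtmf_samples(key, duration=0.2, sample_rate=8000):
    """Return audio samples for one keypress: two sine waves added together."""
    low, high = DTMF[key]
    t = np.arange(int(duration * sample_rate)) / sample_rate
    return 0.5 * np.sin(2 * np.pi * low * t) + 0.5 * np.sin(2 * np.pi * high * t)

samples = dtmf_samples("5")  # feed these to any audio output to hear a "5"
```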
Oh my God, that's so funny. You guys literally just ask the exact same question, but in different forms.
Steven and I are actually, you know, twins. We share a mother, but, you know, he's the Jewish version. I'm the Asian version.
Why does it even matter to know these things? I mean, okay, beyond being a product manager, or interviewing one, you know, trying to find out their skills in the enterprise.
Does it really matter for us as users, as consumers, to really know how our systems work? I only care that things are working.
So I think, for the most part, a user can just use things and often be blissfully unaware, and it seems like it's fine. I think the major problem,
though, is that it's one thing to say that, oh, there are some experts somewhere who can understand the system in its entirety and really know what's going on, people we can kind of outsource our understanding to. But more and more, there's really no one who can understand it. It's exactly
the point that's being made here. And so really, when no one fully understands it, it's incumbent
upon each of us to at least have some way of thinking about these systems, at least some sort
of like glimpse into what's happening, sometimes underneath the kind of fairly
simple interface into the underlying complexity. Because oftentimes we think we understand the system
and then we're confronted with a bug or some other kind of unexpected behavior. And then we realize
there's a gap between how we thought it would work and how it actually does work.
One of the things that I think is so interesting is not being able to understand it has become
almost a cool thing, like the one person who understands this one part of the system.
You know, and even the words that we use like hack and kludge and stuff, they're now like cool.
Like hack has gone from, like, a problem to, like, we now celebrate it with hackathons.
And so I'm trying to understand or think about, you know, why is it good to embrace the complicated nature of things or the complex nature of things?
And when is it detrimental to society to do that?
Like when is a hack like, wow, and when is it not? Like, I don't want my CAT scan machine to be hacked, but I'm okay if, like, a word processor is hacked.
Yeah, and I think it's more about recognizing that too often this is just the way of the world, that they're just all around us.
Certainly, in the way, when you're confronted with a large system, some large technological
system, like a piece of software or whatever, oftentimes the only way to change it is through
those kinds of kludges or hacks, because it's this kind of iterative tinkering at the edges
approach, which ends up meaning that you add something to it, it's not pretty, it gets the job done.
The downside of that, of course, is you don't fully understand what's going on as more and more
of these accrete, then suddenly you're left with this, like, impenetrable mess.
I do think, though, ideally, we should be deliberate in how we grow these systems and change
them over time. Certainly, if we're building something from scratch, we should try to be as logical as possible and drift away from the kludge, and kind of the kludgey approach. At the same time, though,
these systems, they're not always fully engineered. They're almost grown, and then they're evolved. And then you often get all the terminology from evolution: kind of like an evolved feature, or repurposing some other kind of phenotypic thing, or there's, like, a whole bunch of obsolete code in there. And I think then you kind of realize, oh, actually these systems, when they get big enough, they end up looking almost biological.
Well, thinking about websites or mobile apps getting very, very big, they are almost all biological now, because it's impossible for any single person to understand. I mean, you have a CTO and you have an architect. But if you look at what happens inside companies as these complicated sites are actually being built, what happens when you have a very complicated change is you have this entity called the Change Review Board convene. And it's 12 people in a room, one representing the network, one representing storage,
one representing servers, one representing application development. And you have to sort of review
every change and basically say it out loud and say, oh, have I not thought through what this
change is going to mean in your world? And so you have all of these people who need to convene
to vet changes before they actually get pushed into production. And then the reverse happens
when problems occur. So when you have an outage, right, and I can tell you this is probably happening inside Niantic on a daily basis right now as Pokémon Go explodes in popularity,
you have those exact same people convened to try to figure out what is causing it, right?
When people can't log in, what's causing that? And is it a network problem? Is it a storage
problem? Is it a database problem? No single person can understand it. And so you need to have
groups of people to try to figure things out. And it's not just back-end things. I mean,
I'm thinking of examples where it can touch our lives in very personal concrete ways.
And the classic example that comes to mind from you whenever we talk about this topic is,
self-driving cars and the decisions the algorithm makes. I mean, that's a case where you can
certainly code into it certain principles, like it should behave in this way under certain
conditions. But as it learns, as a system learns, and we're not aware of exactly what it's
learning and how it's learning and it gets increasingly more complicated, that's something that can
affect us in very tangible ways. Yeah, I think this is one of the fascinating changes to the
way computers are being programmed increasingly, right? So for basically up until this point in time,
programming has been functional and procedural, which is, I have if statements and else statements, and I tell it what to do. And what I'm trying to do is predict enough state so that the computer can make
the right decision, if this do that, else do this. With the introduction of deep learning,
what you have baked into these computer systems is a probabilistic reasoning system,
which is, if I see this input, I think I should do X. And how are we going to marry these two worlds: the procedural, where the computer programmer tells you explicitly what to do in every case, and this probabilistic reasoning, which is, well, I've seen this road before and I think the right thing to do is turn right.
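To make that contrast concrete, here's a toy sketch in Python. The road-scene features, the labels, and the tiny nearest-neighbour vote standing in for a real learned model are all made up for illustration; this is not any real self-driving stack.

```python
import math

# Procedural mode: the programmer enumerates the states up front.
def procedural_steer(light_is_red, obstacle_ahead):
    if obstacle_ahead or light_is_red:
        return "brake"
    return "go"

# Probabilistic mode: behavior comes from labeled examples, and the output
# is a belief, not a rule. A tiny k-nearest-neighbour vote stands in for a
# real learned model here.
EXAMPLES = [  # (made-up road-scene features, action a human driver took)
    ((0.9, 0.1), "turn_right"), ((0.8, 0.2), "turn_right"),
    ((0.1, 0.9), "go_straight"), ((0.2, 0.8), "go_straight"),
]

def probabilistic_steer(scene, k=3):
    nearest = sorted(EXAMPLES, key=lambda ex: math.dist(ex[0], scene))[:k]
    votes = [action for _, action in nearest]
    best = max(set(votes), key=votes.count)
    # "I've seen this road before, and I think the right thing is to turn right."
    return best, votes.count(best) / k

print(procedural_steer(light_is_red=True, obstacle_ahead=False))  # brake
print(probabilistic_steer((0.85, 0.15)))  # ('turn_right', 0.66...)
```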
So picking up on that, I'm curious how, in a sociological sense,
like, because there have been, like, say, 75 years of computers being these exact, precise things,
and you know, you use the analogy of physics and biology in the book, and, like, what needs to happen for people to think that computers are biological, that, like,
hey, it's okay if it has this goofy horn growing out the side of it.
Evolution will eventually get rid of it.
Because my experience has been that people have like a pretty low tolerance for error
with anything that comes out of a computer.
Like you use an example in the book that really hit home for me.
When you're working with an advanced piece of software, such as our gargantuan,
which I'll assume you meant as a positive, a gargantuan word processing tool.
And the end notes in your document go.
and I'll quote haywire, don't panic. Instead, look at what went wrong. And I have to tell you,
I've been on a lot of support calls with people with problems with Word. And trying to say don't panic hasn't really worked for me personally. It didn't work. Oh, it can be very, kind of... And you're a writer, I'm guessing. And actually, I've been on calls with super famous writers. And don't panic just never worked, ever. And I tried. You know, oh, look, it's biological. Don't fret, the next evolution, you know, Darwin will take care of it.
Perhaps that advice is a little bit more theoretical than practical at this point.
One thing is that even when we're in the realm of like more traditional iterative
and procedural kind of functional programming,
once you deal with like huge numbers of edge cases,
you can actually still quite easily build systems you don't fully understand,
but especially as we move more into this world of like new types of machine learning and deep learning,
I think we need to kind of think more consciously about approaching them biologically.
And I think we can see some of these kinds of hints happening.
So, like, for example, Netflix, they have this Chaos Monkey suite of tools, where the tool
will periodically take subsystems out of commission and see how the overall system responds live.
It'll just kind of knock out portions of Netflix and see how it responds.
And the idea is to lower the gap between how they assume the system works and how it actually
does work in order to make it as robust as possible.
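Here's a minimal chaos-monkey-style sketch of that idea (not Netflix's actual tool; the service names and the health check are invented): knock a random subsystem out of commission, check whether the overall system still responds, then restore it.

```python
import random
import time

services = {"frontend": True, "recommendations": True, "billing": True}

def overall_system_healthy():
    # Stand-in health check: here, the product survives if the frontend is up.
    return services["frontend"]

def chaos_step():
    victim = random.choice([name for name, up in services.items() if up])
    services[victim] = False  # take the subsystem out of commission
    print(f"killed {victim}; system healthy: {overall_system_healthy()}")
    services[victim] = True   # restore it before the next round

for _ in range(3):
    chaos_step()
    time.sleep(0.1)  # the real thing runs continuously, against live traffic
```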
And it turns out in biology, this is actually one of the ways you learn about a living thing.
So, for example, let's say you have one type of bacteria, and you want to really understand how the genes interact, what genes are important for which different kinds of things.
You can actively try to mutate it, irradiate it, or subject it to some sort of chemical, and thereby see how, as you knock out certain parts of the genome, it actually affects it.
And I think people are beginning to use these more biological techniques to really understand their systems.
Now, of course, it's one thing to do that when you're building the system.
It's another thing to say, don't panic, just start tinkering with your word processor, and you'll be fine when you've lost all your endnotes.
It's a lot easier to just freak out and kind of go crazy.
It'll take some time.
We'll get there slowly but surely, hopefully.
Yeah, Chaos Monkey is a great example of this big shift that's happened inside data centers
precisely because we had to introduce biological thinking rather than physics thinking
into even designing and troubleshooting these systems.
The way I'm using the terms physics and biological thinking kind of as two different modes,
and of course it's an oversimplification, is the physics mindset might be to write a single
equation that explains a good fraction of what's going on. So it might maybe explain like 60%
of what's happening within a system. The biological thinking approach says, well, these things
are very complex. They've evolved over time. There's sort of this
organic messiness. We actually need to focus much more on the details of the system, maybe
understanding subsystems or kind of different components of what's happening within a living
organism in the hopes that eventually you create this broader picture. Because in this
biological mindset is the idea that the details really matter.
It's wonderful if you can write an equation that explains 60% of what's going on,
but it turns out the remaining 40% is really, really important when you're trying to make sure something works
and really works properly, especially when it comes to technology.
Now, of course, I mean, there are many physicists who dwell in details,
and there are many biologists who have grand theories and computational models.
So it's not a perfect way of describing the two different groups of scientists,
but they're kind of two different mindsets in how we approach the natural world.
But increasingly, it's also a really good framework for thinking about how we approach the built world.
And I think we need to kind of import some of that biological thinking that recognizes the details and kind of this iterative tinkering approach to understanding a technology to actually understand it fully, or at least partway, as we continue to build them bigger and bigger.
So when I read that analogy, what leapt to mind for me is, in the data center over the last 20 years, we've done a big transition in whose data center you want to look like.
And that transition went from a Wall Street bank to Facebook or Netflix.
And I would argue that the Wall Street banks built physics thinking into their data centers,
which is you had these massive Sun servers and EMC arrays and Oracle databases,
and you paid attention to every single one of them because if one of them went down, you were screwed.
But the benefit was that when one of these things went down, you knew where to look.
Right. And then if you look at the Netflix or Facebook data center, they sort of took the exact opposite view, which is any server, any disk drive, any process could die at any single time. But we still want the Netflix feed to work and we want the news feed to work. And the system needs to survive any given failure. And that's sort of the big change. And so I would argue that most modern data centers, which are built on microservices architectures, scale-out architectures, are designed with sort of this biological thinking in mind, which is any single instance
or disk drive or server can vanish, but we need to make sure that the entire service doesn't grind to a halt.
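A small sketch of that "design for failure" mindset, with hypothetical hostnames and a stand-in RPC that fails at random: no single replica matters, only whether some replica answers.

```python
import random

REPLICAS = ["feed-1.internal", "feed-2.internal", "feed-3.internal"]

def request(host, user_id):
    # Stand-in RPC: any individual instance can vanish at any time.
    if random.random() < 0.3:
        raise ConnectionError(f"{host} is down")
    return f"feed for user {user_id} from {host}"

def fetch_feed(user_id):
    # Try the replicas in a random order; losing any one of them is normal.
    for host in random.sample(REPLICAS, len(REPLICAS)):
        try:
            return request(host, user_id)
        except ConnectionError:
            continue  # that instance died; move on to the next one
    raise RuntimeError("all replicas down")  # only total failure surfaces

print(fetch_feed(42))
```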
When you look at the types of terms used
to describe those types of data centers,
like resilience or robustness,
like these are the types of terms
that are often used
when thinking about an ecosystem
or living organism.
And I think that is very symptomatic
of the idea that they are much more in line
with kind of biological modes of thought
than physics modes.
That in itself was like a major evolutionary point
in the delivery of computing to people.
I mean, I remember at my old job, we were working on, like, a Netflix, basically.
It was the way to distribute video.
And we talked about, like, having data center employees, like literally on roller skates who were going to run around swapping out disk drives.
And the whole system actually couldn't work because they started doing the math on how quickly they would need to replace disk drives.
And then all along comes Google.
And they basically pioneered this whole notion that, like, all the disk drives, it's not that they're likely to fail, it's that they will fail.
And so it was designing a whole system on the presumption of continuous failure,
which was like a complete inversion from all the other systems that had been designed.
In a sense, I think that the whole software-as-a-service notion has made the back end of the services
sort of designed in a biological way.
But I'm still fascinated by the fact that the people at the end of the services still think of them as physics.
I just don't see a tolerance for failure because what happens is immediately
people start thinking, well, fine, it's cool if it's Gmail and it's down for 18 minutes. I guess
I could survive. But like that same thought in an airplane kind of freaks me out.
One of the big innovations that I'm looking for as we switch from this deterministic to more
probabilistic population-based is the way that we design, test, verify, monitor, and recover
from failures has got to change. And we're in the midst of that transition right now, which is if you
look at monitoring tools, they're going from, you know, sort of HP OpenView to things like SignalFx, which is, you're looking at populations of servers rather than individual servers.
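To make the population idea concrete, a toy sketch with made-up numbers (nothing here reflects how SignalFx actually works): alert on an aggregate over the fleet, not on any one named server.

```python
import statistics

# Hypothetical per-server request latencies (ms); one instance is sick.
latencies_ms = {
    "web-01": 40, "web-02": 45, "web-03": 900,
    "web-04": 42, "web-05": 44, "web-06": 41,
}

fleet_median = statistics.median(latencies_ms.values())
if fleet_median > 200:  # illustrative threshold
    print("page someone: the *population* is unhealthy")
else:
    # A single bad instance pages no one; it just gets replaced.
    print(f"fleet healthy (median {fleet_median} ms); outliers tolerated")
```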
So one of the things I've been wondering is what's the big breakthrough that we need to
verify the output of deep learning systems, right? Which is if these things are inherently
probabilistic, how do we test them? How do we give people the assurance that it feels like
physics at the end, right? But inside it's biology. And frankly, when it comes to biology,
we as humans are actually conditioned to accept this inherent complexity.
I mean, you go to the doctor, they can't figure out what's wrong with you.
You go to another doctor and you keep doing that.
And you hear this narrative, you know, and even though it's very frustrating, it's almost accepted.
And I wonder if we'd ever get to the same point with our computing systems in terms of expectations.
I mean, it'll definitely take a new mindset.
The perspective I think people are eventually going to need to embrace, to a certain degree, is, I would say, almost like a humility in the face of technology. And I think oftentimes we kind of tend towards two extremes, like, when we don't fully understand a system, when we maybe are confronted
with kind of the biological messiness, we either freak out, or we say this is like so incredibly
complicated that there's like this like reverential awe, almost religious sense of the system of
like, it's so beautiful, it's so wonderful, we're never going to fully understand it. And I think
both extremes, they end up cutting off questioning and like trying to actually understand
the system, even if we can never fully understand it, whether or not you're the designer or even
just the user, I think we need to kind of recognize that there's going to be this almost like humble
approach to our technological systems where it's going to be okay if we don't fully understand
these things and if they do occasionally fail because ultimately those failures lead us towards
better understanding. So that's a good thing. But there's just going to be this constant
iterative process of trying to understand these systems. We might never get there,
but there's something exciting about actually trying to fully understand it and recognizing
that these things are messy and complex, but still also something that we actually created.
Well, also when it comes to something we created, we also have to think about
the very combinatorial nature of that creation. And one of my favorite books here is The Nature of Technology and How It Evolves, by Brian Arthur. And what struck me most, I mean, there's a lot of
things I love about that book. But what struck me most when I was reading it, and it even applies to how you guys opened this conversation with your question, is the narrative around creation and who invented what. And we tend to talk about it in a very linear way, but it's a very nonlinear, iterative
thing where people build on each other's ideas. And it's very messy and complex. And I have always thought that when we tell these stories, we need to do a better job of acknowledging all of that
complexity and messiness. And only now the systems are even more complex.
We now build systems that no one understands. And there's that classic: if I could get a time machine and go back to, like, 1950, I'd invent whatever your favorite product is now. And then you realize
you couldn't come up with an iPhone in 1950. Oh, yeah, it's totally impossible to do that.
You just don't have the knowledge that, yeah, you don't have the expertise of other people to build
upon. It's simply impossible. There's all these things interacting. And you have to be
mindful of every single one and no one can actually be mindful of every single one.
Okay. Well, thank you guys. That's all we have time for. And that's another episode of the a16z Podcast. Thank you. Thank you.