Software Misadventures - Early Twitter's fail-whale wars | Dmitriy Ryaboy

Episode Date: August 13, 2024

A veteran of early Twitter's fail whale wars, Dmitriy joins the show to chat about the time when 70% of the Hadoop cluster got accidentally deleted, the financial reality of writing a book, and how to... navigate acquisitions.

Segments:
(00:00:00) The Infamous Hadoop Outage
(00:02:36) War Stories from Twitter's Early Days
(00:04:47) The Fail Whale Era
(00:06:48) The Hadoop Cluster Shutdown
(00:12:20) “First Restore the Service Then Fix the Problem. Not the Other Way Around.”
(00:14:10) War Rooms and Organic Decision-Making
(00:16:16) The Importance of Communication in Incident Management
(00:19:07) That Time When the Data Center Caught Fire
(00:21:45) The "Best Email Ever" at Twitter
(00:25:34) The Importance of Failing
(00:27:17) Distributed Systems and Error Handling
(00:29:49) The Missing README
(00:33:13) Agile and Scrum
(00:38:44) The Financial Reality of Writing a Book
(00:43:23) Collaborative Writing Is Like Open-Source Coding
(00:44:41) Finding a Publisher and the Role of Editors
(00:50:33) Defining the Tone and Voice of the Book
(00:54:23) Acquisitions from an Engineer's Perspective
(00:56:00) Integrating Acquired Teams
(01:02:47) Technical Due Diligence
(01:04:31) The Reality of System Implementation
(01:06:11) Integration Challenges and Gotchas

Show Notes:
- Dmitriy Ryaboy on Twitter: https://x.com/squarecog
- The Missing README: https://www.amazon.com/Missing-README-Guide-Software-Engineer/dp/1718501838
- Chris Riccomini on how to write a technical book: https://cnr.sh/essays/how-to-write-a-technical-book

Stay in touch:
- Make Ronak's day by signing up for our newsletter to get our favorite parts of the convo straight to your inbox every week :D https://softwaremisadventures.com/

Music: Vlad Gluschenko - Forest
License: Creative Commons Attribution 3.0 Unported: https://creativecommons.org/licenses/by/3.0/deed.en

Transcript
Starting point is 00:00:00 There was one outage that I distinctly remember. Long story short and without blaming anybody, a bunch of data in Hadoop got deleted. Like 70%, no backups, it got deleted, just gone. And the two emails I sent, one to Eng and one to the whole company, are I believe still linked at Twitter. Twitter has a Go link service. So if any current Twitter employees want to go to go slash best email ever
Starting point is 00:00:24 and go slash second best email ever, those are both mine. And it was like, you know, good news. We have lots of space in Hadoop. Bad news. It was totally an email I wrote with a sort of, this might be the last email I write to this company because I might get fired after this because we lost all the freaking data. My job was data. It's sort of pressure makes diamonds. Later, I heard people say, you know, the joke was that like, I love hiring ex-Twitter people because no matter how much everything is exploding, they just go like, eh, I've seen worse. Because there was stuff that was really, really bad, but also sometimes like the worst times are the best times.
Starting point is 00:01:14 One thing you mentioned from the acquirer side, like doing the technical due diligence, which is something you were involved in once you joined Ginkgo. So what does technical due diligence look like from the acquirer's standpoint? At what point do you feel like, okay, I'm satisfied enough that this looks okay? You're looking for deal breakers, right? If you're doing technical due diligence, unless it's specifically like
Starting point is 00:01:36 we're acquiring the magical technology that's going to be magical. And if it's not magical, the deal is not worth it, right? You're usually acquiring for some other reason. You're probably acquiring for a combination of there's some code and really good talent. And, you know, it positions us well for whatever like strategic reasons, right? So if you're at a point where you're doing technical due diligence, you're looking for deal killers, not for like, I wouldn't have done it quite that way. Right?
Starting point is 00:02:01 Like that's not a deal killer. That just adds to your integration estimate. [...] interested in not just the technologies, but the people and the stories behind them. So on this show, we try to scratch our own itch by sitting down with engineers, founders, and investors to chat about their path, lessons they've learned, and of course, the misadventures along the way. Welcome back to the show, Dmitriy. A fun place that I thought we could continue this conversation at is your LinkedIn profile. You mention being a veteran of early Twitter's fail whale wars. Maybe you can take us back a little bit and talk about that. Yeah. So I joined Twitter on whatever the first working day was, January 2nd, January 3rd of 2010. And Twitter was a bit north of 100 people
Starting point is 00:03:06 and it was sort of very popular, but not nearly as popular as it was going to get eventually. And already in the throes of the infamous rewrite from Ruby to Scala. The fail whale was the error page. So whenever there was an internal server error, what the users got was like a cute cartoon of a whale that was then christened the fail whale [...] while also rewriting it from the original Ruby on Rails app into kind of a microservices architecture on Scala,
Starting point is 00:03:50 and trying to keep up with the user growth and all of that. And that was a really kind of stressful and fun environment at the same time. A lot of very, very rapid learning as the whole company grew, had to grow very rapidly. It was kind of the hyperscaler situation. When I joined, we were in a colo. Cloud services existed but weren't as popular. We essentially were renting servers in a co-located facility. And it was kind of a virtualized situation. So we didn't actually have good visibility into where the servers were in relation to each other.
Starting point is 00:04:26 And by the time I left, we were running in several data centers that we were running ourselves. So it was kind of a major migration. And that's just one aspect of the scale and the growth. And I think for me, those were like personal hyper-growth years as an engineer, just because of the exposure to so many people working on these problems, and so many of these problems, and kind of how they were emerging. It was really fun. It was really stressful. There was a lot of, uh, I really learned how to be
Starting point is 00:04:56 on call well and how to respond to incidents and, you know, firefight without losing your mind. This was the data, this was like the data platform team, right? Like, did you guys have like dedicated? I was on the data platform team, but like, first of all, data platform wound up getting involved in surprising ways. But second, when I started off, the company was small enough that I just sort of knew the other teams pretty well too. And like,
Starting point is 00:05:22 I would get pulled in just to like figure stuff out. But to give you a sense of how the data platform would occasionally get looped into total site outages: in 2010, that was the first year I was there, we were still in the colo, and we were extremely constrained by the network bandwidth within the data center between the hosts. And so we were getting a lot of errors because the web servers couldn't access the memcache servers. So there was like a big memcache cluster, and there were the web servers that were like very, very light servers, which were meant to be stateless. And their requests
Starting point is 00:06:05 were timing out, like they just couldn't get through to the cache servers, which is basically where all the timelines are on Twitter, or at least were at the time. The way the whole thing worked was you post a tweet, there's a background daemon that sees that a tweet happened, and it materializes the timeline for everybody who follows you and updates it and sticks that in the cache. So when you actually read stuff, it's not coming from a database. It's kind of pre-built for you. Right. And I'm eliding a bunch of technical detail, but that's the big picture.
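The write path described above (a tweet triggers a background job that pre-builds each follower's timeline in cache, so reads never touch a database) is often called fan-out on write. Here is a minimal in-memory sketch of that idea. The class name, method names, and the 800-entry cap are all assumptions made for illustration; nothing here reflects Twitter's actual code.

```python
from collections import defaultdict, deque


class TimelineCache:
    """Toy fan-out-on-write timeline store (illustrative names only)."""

    def __init__(self, max_len=800):
        # followee -> set of followers
        self.followers = defaultdict(set)
        # user -> newest-first tweet ids, capped at max_len entries
        self.timelines = defaultdict(lambda: deque(maxlen=max_len))

    def follow(self, follower, followee):
        self.followers[followee].add(follower)

    def fan_out(self, author, tweet_id):
        # Work happens at write time: push the tweet onto every follower's
        # pre-built timeline. A full deque drops its oldest entry.
        for follower in self.followers[author]:
            self.timelines[follower].appendleft(tweet_id)

    def read(self, user, n=20):
        # Reads are a plain cache lookup; no query is assembled on demand.
        return list(self.timelines[user])[:n]


cache = TimelineCache()
cache.follow("bob", "alice")
cache.fan_out("alice", 1)
cache.fan_out("alice", 2)
print(cache.read("bob"))
```

The trade-off is visible in the sketch: reads are cheap, but every read depends on the cache being reachable, which is why a saturated network between web servers and cache servers took the site down.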
Starting point is 00:06:34 And so it's a very big problem when your web server can't hit the cache, right? Because that means that you can't get anything. Right. And long story short, we were like, we have to kill anything that uses any bandwidth whatsoever. So, for example, we turned off the Hadoop cluster. Like in the particular moment when things got particularly bad, because everything was sitting on top of each other, we were like, okay, when the web server logs are being logged to Hadoop, to HDFS, that's taking up too much bandwidth. And if, God forbid, HDFS decides to rebalance and shifts a bunch of things between the data nodes, that will just flood the network. We can't have that. So we just shut down the whole Hadoop
Starting point is 00:07:16 cluster. And the day we did that was the day that Jimmy Lin joined Twitter. He was a professor at the University of Maryland. He is now in Toronto, I believe. And he literally joined because he was like, I'm going to, he was analyzing like graph data, social networks. He's like, man, I'm going to use their awesome Hadoop cluster. Sorry, there is like literally no compute for you.
Starting point is 00:07:40 Let's just whiteboard some stuff. And the issue turned out, it was like, it just didn't make sense. But the issue turned out to be that the virtualized network our colo provided to us hid from us how they actually grew Twitter's footprint. And so you start off with Twitter a certain size and it grants whatever, N servers, right, of different configurations. And they're sort of physically in the same pod somewhere in that data center. And so you've got your memcache servers, you've got your web servers,
Starting point is 00:08:11 you've got your Hadoop servers that are like different configurations, right? Hadoop servers, massive disk, memcache, lots of RAM, web servers, very thin, but like you have lots of them because you want lots of parallel cores. And then you want to grow your web servers, or maybe you want to grow your Hadoop cluster, maybe you want to grow something else. And there's no physical room in the pod anymore, so they get the next pod, right?
Starting point is 00:08:36 And they start allocating your things there. But to you, it looks like a flat network, right? So now what's going on is eventually you get to a situation where all of your cache, like two-thirds of your cache are over here on the left side, and your web servers are on the right side because maybe you didn't need to scale them as fast. And so they have to go through the interconnect between the pods, which is like really thin. And your data nodes for Hadoop are spread across all your pods.
Starting point is 00:09:03 And so whenever you do any MapReduce job and it does a shuffle, it just completely saturates the network of your web servers. And any server, any call between your web server and your cache server is trying to get through like MapReduce trying to shuffle 100 petabytes. It was probably 100 terabytes back then.
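To make the topology problem concrete, here is a small sketch that takes a host-to-pod placement and a list of flows and adds up the bytes that have to cross the link between pods. The host names, pod names, and byte counts are invented for the example; the only point is that a network that looks flat to the tenant can route most of its traffic over one thin interconnect.

```python
from collections import Counter

# Hypothetical placement: the tenant sees a flat network, but hosts
# actually live in two physical pods joined by a thin interconnect.
POD_OF = {
    "web-1": "pod-b", "web-2": "pod-b",
    "cache-1": "pod-a", "cache-2": "pod-a",
    "hadoop-1": "pod-a", "hadoop-2": "pod-b",
}


def interconnect_load(flows, pod_of):
    """Sum bytes per traffic kind for flows whose endpoints sit in different pods."""
    crossing = Counter()
    for src, dst, kind, nbytes in flows:
        if pod_of[src] != pod_of[dst]:
            crossing[kind] += nbytes
    return crossing


# Made-up flows: (source host, destination host, traffic kind, bytes)
flows = [
    ("web-1", "cache-1", "cache_read", 2_000),
    ("web-2", "cache-2", "cache_read", 2_000),
    ("hadoop-1", "hadoop-2", "shuffle", 5_000_000),
    ("cache-1", "cache-2", "replication", 1_000),  # same pod, never crosses
]

load = interconnect_load(flows, POD_OF)
shuffle_share = load["shuffle"] / sum(load.values())
print(load, round(shuffle_share, 4))
```

In this toy placement every cache read crosses the interconnect, and a single shuffle accounts for nearly all of the bytes on it, which mirrors the failure mode described above.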
Starting point is 00:09:24 But, and like, we had no idea of that network topology. One of our first sort of data viz hires actually spent a while working with the network engineers and the ops folks to do like PCAP captures of network packets and figure out which protocol is talking to which IP and sort of create a map so that we kind of figure out that like, okay, all of these servers are over here and all of those servers are over there. And this is talking to that on the MySQL protocol and that's talking to this on the TCP protocol. So now we know like who's doing what because we needed to reverse engineer that. So when you ask like what the hell was data doing to, you know, involved in the
Starting point is 00:10:05 fail whale stuff, like you think it's adjusting settings on your web server or like dealing with timeouts, it's that kind of stuff, right? Because the world was bananas. So that was one. There's also one where like our first move to a data center was hilariously disastrous and there was both fire and rain. Sorry, sorry, before going into that. Wait, so, but basically shutting down Hadoop, so poor Jimmy, but then that did solve the problem. It's like from that decision, then you guys were like, okay, well, now the problem's gone,
Starting point is 00:10:34 so obviously it's the [unclear], or like, how did you... No, no, no, that was just to get through like World Cup or something, because it's spiky. Like, let's shut down Hadoop for today. One story leads to another, because then that was like, obviously. So first we were like, we'll run the HDFS, but we don't want to lose logs.
Starting point is 00:10:57 Logs were writing, you know, Scribe-ing to the, to the distributed system. We'll just not run any MapReduce jobs. And then we discovered that the occasional rebalance of the data nodes making extra copies on other nodes and not knowing about the actual layout, even that would cause enough chatter that it would create a spike of errors on the Twitter website. So then we had to shut the whole thing completely down. We just ate not having logs, which is hilarious when you're trying to troubleshoot a problem
Starting point is 00:11:30 on your first hand. Right, that seems a bit drastic. I don't recommend this. This was definitely a kind of last-ditch thing. That story just comes to mind because it is so particularly egregious. Most of the time, it wasn't stuff like that. But yeah, there was a bunch of yelling at our hosting provider to get them to move servers around, to get them to add network. That's how you actually solve it.
Starting point is 00:11:57 And also, try to... I think we did some stuff where we added some small caches on the web servers so that they could be more user pinning. So there was like, we started getting clever, but the first thing is just address the immediate pain and then re-architect. Don't re-architect and hope that eventually it will get fixed. Right, right. That was one of the lessons: first, restore the service. Second, fix the problem. Not the other way around. The engineers really want to go, but why is it happening? Hold on, like, let's really
Starting point is 00:12:31 observe it. No, don't really observe it. Like the site is failing. It's ended up. I literally said that this morning to a colleague of mine, was like, mitigate first and then figure out why it's happening. Yeah, like capture. And that's where, like, as you get experienced, you also learn how to
Starting point is 00:12:48 capture the right telemetry, right? Like get your core dumps, like capture all the possible state that you can capture. And then like, if the solution is reboot it and it mysteriously works, like reboot it. Yeah. For now. And that's how you reproduce the problem on a staging server. Right? Yes.
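The "capture state, then mitigate" habit can be sketched as a small helper: run whatever diagnostic collectors you have, save what they return, and apply the mitigation no matter what, so that a broken collector never delays restoring service. The function name, the collector interface, and the snapshot file name are assumptions for this sketch, not a description of any real tooling.

```python
import json
import tempfile
import time
from pathlib import Path


def capture_then_mitigate(collectors, mitigate, out_dir):
    """Save best-effort diagnostics to disk, then run the mitigation regardless."""
    snapshot = {"captured_at": time.time(), "state": {}, "collector_errors": {}}
    for name, collect in collectors.items():
        try:
            snapshot["state"][name] = collect()
        except Exception as exc:
            # A failing collector is recorded, never allowed to block the fix.
            snapshot["collector_errors"][name] = repr(exc)
    path = Path(out_dir) / "incident_snapshot.json"
    try:
        path.write_text(json.dumps(snapshot, indent=2, default=str))
    finally:
        # Restore service even if writing the snapshot failed.
        mitigate()
    return path


def broken_probe():
    raise RuntimeError("probe timed out")


restarted = []
with tempfile.TemporaryDirectory() as tmp:
    saved = capture_then_mitigate(
        {"queue_depth": lambda: 1234, "broken": broken_probe},
        lambda: restarted.append(True),  # stand-in for "reboot it"
        tmp,
    )
    data = json.loads(saved.read_text())
print(data["state"], list(data["collector_errors"]), restarted)
```

The saved snapshot is what you would later take to a staging server to try to reproduce the problem.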
Starting point is 00:13:05 With a decision that big to be like, okay, let's be done with this Hadoop cluster, how do you go about even escalating that kind of decision? Because there's multiple teams, not just Jimmy, that are impacted by this, right? Yeah. I mean, it was a very small... Now people are like, well, how would you do it at Twitter? Right now, I guess Elon would just say shut it down. But it was a very engineer-driven culture. And so, you know, when there was a big problem,
Starting point is 00:13:39 I think, I mean, we had a CTO, but I couldn't tell you who was between the CTO and my boss. And I was an engineer at that point. I wasn't managing yet. I don't know that we had VPs or directors or anything. It was the people who are sort of like, well, there's the guy who knows about the web services and, you know, the woman who is like queen of cache and like an ops guy who knows all the ops things. And they're like, Oh shit, everything's on fire. Hey,
Starting point is 00:14:10 what do you think about shutting down the Hadoop cluster? Well, we have to ask Dmitriy about that. And I go like, well, I want the site to work. So yeah, let's shut down the Hadoop cluster. I'll tell everybody. That's kind of it. It was very organic, a lot of war rooms, you know. What were you saying? I was just saying it sounds like a lot more fun environment. There's a different kind of fun in being in this environment where you're just going through a bunch of fires and learning things together. Um, yeah, yeah, absolutely. It's sort of pressure makes diamonds. Later, I heard people say, you know,
Starting point is 00:14:46 the joke was that like, I love hiring ex-Twitter people because no matter how much everything is exploding, they just go like, I've seen worse. Because there was stuff that was really, really bad, but also sometimes like the worst times are the best times. Oh yeah, for sure. I think like this was an experience that a lot of engineers don't necessarily go through unless they worked on the ops side at some point in their careers. And this is something I see pretty regularly when it comes to incident management.
Starting point is 00:15:15 Like you see a bunch of, let's say, for example, folks who have been SREs in their past lives or are SREs today, no matter how big the outage is, like you see the calmness and how they're dealing with the incident. You see how they are able to walk through the problem, able to mitigate it in time instead of like, what the fuck's happening?
Starting point is 00:15:34 Site's on fire. Yeah. Yeah. Well, there's definitely like not freaking out helps. And also there's a lot, I think I put some of that in the book, or maybe I didn't and just meant to, but there's a set of skills you learn for dealing with an incident. Like, it sounds simple when I say it, but like, so many people don't do it: giving updates, right? You know, the thing is broken, you're working on it. Even just saying,
Starting point is 00:16:03 I'm still looking at it, or I am looking at this particular area. Is anybody else looking at another area? Hey, how's it going with looking at that thing? Dropping in observations in the Slack, nowadays it's Slack threads, right? In the Slack thread of just like screenshots and, oh, this thing looks funny or whatever. Just like being vocal, being loud in a sort of dedicated space for a serious incident, not for just some minor troubleshooting. It is extremely helpful because then you can get multiple eyes on it and people know what's going on.
Starting point is 00:16:33 There are so many even very experienced and smart engineers who, when faced with something failing, they kind of go dark, and like they might show up at some point, you have no idea, right? They might be working feverishly hard and just like going all out, but because nobody knows anything, it's very hard to manage the incident. It's very hard for your support people to like support you and support the users. It's hard for managers to answer, you know, CEO emails that are like, what the hell? And a lot of it is just about communication. You can be doing the same exact things, but with good communication, the incident will go so much smoother and probably be resolved so much faster.
Starting point is 00:17:13 Yeah, I remember a lot of the feedback that I've gotten from like the second manager I had pretty early was like, you need to bring the team along, which at the time, right, I was very young and foolish, well, still foolish today. I was like, what do you mean, man? I'm like trying to do all this, like, you know, fixing all these things, like I ain't got time for that. But now that, you know, I'm on the other side, like, holy, yeah, it makes such a huge difference in terms of actually creating a sense of like, oh yeah, we have this under control. Like there's a process to it rather than, yeah, just having things being very chaotic. And when a lot of people are involved, because like maybe there is a problem where it's just
Starting point is 00:17:56 unclear what exactly is going on, right? Like we're seeing these errors, but, you know, we live in a microservices world, right? Like things propagate in weird ways. Back pressure builds up. Who knows, right? What you're seeing is, well, the site's erroring and MySQL has errors. Are the MySQL errors at all related to the site erroring? We don't actually know. Where else can we look? There's 5,000 Grafana dashboards, right?
Starting point is 00:18:21 It's just like, there's a lot to do when it's not clear what's going on and the system's complex. And now I'm thinking more like Twitter circa 2015, right? I mean, it's just, there's a lot and there's a lot of instrumentation and it becomes its own problem. Having experienced people who just take on the role of incident manager, not just from a like, sometimes incident manager is like the person who does the reporting who says, we will update you in another 15 minutes. But just sort of coordinating, traffic cop, right? Okay, we're exploring these three hypotheses about what's going on.
Starting point is 00:18:55 Here's where we are on this. Here's who's working on that. If anybody has any new ideas, right, like funnel them through me so we can keep everything organized. Right? And just kind of keeping everybody coordinated on that is also super useful. And so there was a lot of stuff like that, that we either invented or reinvented or learned in those first few years of Twitter going bananas. Yeah, it was really fun. There was a data center that caught fire. And after it caught fire... That's a great start to any story.
Starting point is 00:19:30 It never ends. This podcast could go for a very long time. Very entertaining is what I would say. Okay, so the data center that caught fire. Yes, please go ahead. You know, we moved in a bit early. The data folks were the first people to move into this data center that was being built out. And it was sort of our colo host couldn't keep giving us space. So we needed to move somebody out. And it was like, well, the offline
Starting point is 00:19:58 data doesn't need to be as physically close. And if the connection drops or whatever, we can survive that. Nobody wants anybody to be out. But offline data processing, it's both big and separable. So it makes sense for us to be the first through the breach. But they were still building the data centers, it turned out, when our racks got installed in there. So at some point there were like guys welding something on the roof and they didn't protect it properly. And I guess there were sparks. And so. Spark in this context usually means something else. No, no, no. Literal, literal fire, fire through the roof. And then, so there was a small conflagration and then the sprinklers turned on
Starting point is 00:20:46 right? And so it's flooding. So first you have fire, then you have flooding, all the while like the servers are supposed to be running. That one was fun. There were, yeah, there was a bunch of things. There was one outage that I distinctly remember. Long story short and without blaming anybody, a bunch of data in Hadoop got deleted. Like 70%, no backups, it got deleted, just gone. It was sort of a combination of misconfiguration of a tool, a tool that allowed you to do that sort of thing in the first place, an intense environmental pressure that caused kind of fairly extreme usages of the tool to be sort of normal in the first place, because normally you wouldn't do it, but we had to. And then in that case, like a slight misconfiguration was catastrophic, right? As usual, this happens over the weekend.
Starting point is 00:21:36 You know, the person who ran the fated command reported to me. We figured out this was happening. We're like in the office, the small office, literally sitting back to back. He's trying to fix the thing. I'm trying to recover what we can recover and update the whole company about what happened. So there were several things out of that. One is I wrote several emails updating the whole company. It was just all@ or something, whatever the handle was at the time. And the two emails I sent, one to Eng and one to the whole company, are, I believe, still linked at Twitter. Twitter has a Go link service. So if any current Twitter employees
Starting point is 00:22:12 want to go to go slash best email ever and go slash second best email ever, those are both mine. And it was like, you know, good news. We have lots of space in Hadoop. Bad news. It was totally an email I wrote with a sort of, this might be the last email I write to this company
Starting point is 00:22:37 because I might get fired after this because we lost all the freaking data, and my job was data. And, you know, the engineer was mortified. I don't even know how many people still know who that was because I just kind of wrote it all from first person and was like, this happened. If you have any questions, ask me, blah, blah. Kind of tried to give him space and cover. And I remember my boss at the time basically was like, one, you know, this happens, don't worry about it, I got you. And two, eventually this was gonna happen. It doesn't feel like it right now, but, it was like that first year, it was either 2010 or 2011, like it is gonna be such a relief that this happened now and won't happen anymore in this company than if this happened three or four years from now.
Starting point is 00:23:32 And sure enough, the amount of data we lost, it seemed so massive. It was like everything. We were generating that much data in a single day, three or four years later. Losing that data would be like, eh, it's a day's work. It sucks. And when public reporting is based on data in Hadoop, when all the machine learning recommendations, just like everything is so tied into the data platform that we built that would be
Starting point is 00:24:07 so much more impactful and so much more hurtful. That first year it was like, okay, well, some of the operations we used to do we can't do, and some we can recover, but it'll take us a few days. And it felt huge at the time. And in retrospect, it just didn't matter that much. It didn't affect the trajectory of the company at all. In the big picture, it didn't matter. You don't want those things to happen, but also it's useful to have this sort of perspective of how bad is it now versus where are we going to be
Starting point is 00:24:43 in a couple more years, right? And obviously doing the right thing when it happens, right? Like when it happens, making sure it can't happen anymore. It's so wise what your manager said, right? In terms of like, oh, this is like bound to happen in that. Okay, now when I say it, it sounds a bit cliche, but where has he worked before that gave him sort of this perspective? So this was Kaylee Torgerson and he had his own sort of data consulting practice before
Starting point is 00:25:12 Twitter convinced him. He was consulting with Twitter for like setting everything up and then they convinced him to join full-time. I think before that he knew some of the folks who wound up being at Twitter through maybe his work at or with Yahoo, something like that. Remember when Yahoo was a super legit tech company? Back in the day. It's interesting. I've heard this sort of argument before, right, where if you're having a lot of issues really trying to fight for this idea that's really about prevention and that's super important and
Starting point is 00:25:51 sometimes the answer is just, hey, just let it drop, right? Like let it actually, you know, catch on fire, and then people like actually look at what happens, instead of you wasting all this time trying to, you know, advocate for it. Is that something that you've like, you, I guess, agree with, or you've seen it done well? I think it depends on the specific thing that we're talking about failing, right? There are some things that can't fail live, right? I don't know, missile control systems, right? Like things involved with people's lives, operations that are regulated, right? Like you can't do fraudulent transactions
Starting point is 00:26:37 and things of that nature. But there's a lot more that can fail than people think. I'm trying to remember the expression. It's something like, we all test in prod. Some people also have a staging environment. And I think there's less talk of it now. A few years ago, people talked about it a lot, but the notion of testing in prod
Starting point is 00:27:05 and if you know the book Accelerate, you know, they sort of lean on that stuff, where the idea isn't that you only test in prod. The idea is that you know that stuff is going to fail in prod, and if you trust your staging, your testing, your QA and acceptance and everything else so much, then you're gonna fall really hard when things fail in prod. Nothing tests your system like real life and real users. So if you start from the assumptions that things will fail and you need to have the observability, the ability to debug, the ability to capture all the data you need to understand things,
Starting point is 00:27:42 and the ability to recover from errors. So save the appropriate state so you can do replay, expect that a transaction might happen several times and you need that idempotency, things of that nature. You'll be in a much better place, right? So it's less, I think, about just let it fail, but more like expect that something will fail that you didn't expect.
Starting point is 00:28:03 Think about how you're going to recover. So some of us are lucky enough to have been actually taught something about distributed systems. All of us who are working on the web are writing distributed systems. It's just like some folks haven't been told that. Anytime your web server makes a call to a database, congratulations, you're in a distributed system now. If you made a write to a database, the database might be able to understand transactionality, but you're outside the thing. If you have multiple web servers trying to make the same write, they might write it three times and the database won't know it's the same thing unless you thought about it ahead of time.
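One common way to "think about it ahead of time" is an idempotency key: the caller attaches a unique request id to each logical write, and the database refuses to store the same id twice. The sketch below uses Python's built-in sqlite3 module with an in-memory database. The table layout and function names are hypothetical, and this is one possible technique rather than anything the speaker says Twitter used.

```python
import sqlite3


def open_db():
    conn = sqlite3.connect(":memory:")
    # The primary key on request_id is what lets the database recognize
    # "the same write" when a retry or a second web server sends it again.
    conn.execute(
        "CREATE TABLE orders (request_id TEXT PRIMARY KEY, item TEXT NOT NULL)"
    )
    return conn


def place_order(conn, request_id, item):
    """Insert once per request_id; return True only if this call stored the row."""
    with conn:  # commits on success, rolls back on error
        cur = conn.execute(
            "INSERT OR IGNORE INTO orders (request_id, item) VALUES (?, ?)",
            (request_id, item),
        )
    return cur.rowcount == 1


conn = open_db()
first = place_order(conn, "req-1", "book")
retry = place_order(conn, "req-1", "book")  # duplicate delivery of the same request
total = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
print(first, retry, total)
```

The retry is harmless: the second call reports that nothing new was stored and the table still holds exactly one row.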
Starting point is 00:28:39 It's a distributed system, and a lot of folks are just sort of, I make the call and then the thing happens. And well, what if you make the call twice? What if you need to undo the call because it's actually part of a, you know, the classic example, shopping carts, right? I tested it. I tested it with my unit tests. How would this happen? Yeah. Or you create these crazy elaborate systems that just take, in some cases they're a very good investment, but not everybody can make that investment. And sort of being prepared for these kinds of errors and being able to detect
Starting point is 00:29:11 the errors, being able to recover from the errors is hugely important. And I think that goes, I mean, that's a data engineering thing, but it's also any kind of service engineering thing, right? And operating a 24-7 web service or web-based service, you need to know how, you need to think about those things and not just rely on sort of testing. And testing is good. Definitely test. Test as much as you can. So some of the things you're mentioning are also in the book that you have,
Starting point is 00:29:41 which is The Missing README. We'll definitely link it in the show notes and highly encourage people to go check it out. I get like 10 cents per copy. Please do. We also encourage people to buy the book, not just check it out. What prompted you to write the book in the first place? I think it was running into situations multiple times where folks who were new grads or kind of a couple years out would be good programmers and very capable of
Starting point is 00:30:13 doing whatever we needed them to do, but wouldn't know these things that a lot of us take for granted that are kind of unwritten. Once you're in the industry for a while, you kind of pick up how the industry does things. And because we sort of just picked it up from our peers, we forget that it's knowledge you have to acquire. And then the new person comes along and they're sort of struggling for a while. And then maybe they pick it up. It doesn't have to be that way, right?
Starting point is 00:30:45 And a lot of problems can be avoided if you just explain why things are the way they are. And so after yet another round of sort of explaining to somebody who is very bright, intelligent, capable, but is doing things kind of the wrong way, just because nobody ever told them what observability is, or, like, you know, logs are not just print to standard error or something, right? Or like what goes
Starting point is 00:31:14 into a log, right? It's just stuff like that. It's not rocket science; it's just stuff that you need to know and you can eventually learn, right? Why don't we just write it down in one place so that people can... My vision for it was a tech lead has a stack of these things on their desk and they get their new batch of new grad hires or interns or whatever, and you're just like, read this over the next three weeks. If stuff is weird, do you understand why we're doing a standup or something, right? There's answers probably in there, but I can also, I'm happy to help you, but like here's an answer so that you can just read that and we'll be like 80% there.
Starting point is 00:31:51 And so I shared that observation with somebody I knew from Twitter, actually, like on Twitter, not from the company. Chris Riccomini, who was also tweeting about something along those lines and sort of we decided that it would be fun to actually write a book. We played around with ideas of courses and other things, but we settled on a book. It was really fun.
Starting point is 00:32:14 And it was surprisingly hard. Sort of write down the basic stuff. Oh, how does the basic stuff actually work? In that process, did you learn new things that, even though, right, like you said, these concepts are fundamental, but did you learn new things in the process of writing it? I think so. There were definitely some concepts that I knew, but I never examined and tried to explain that trying to write down caused me to re-examine. I also read a lot of sort of different approaches to how to build
Starting point is 00:32:47 evolvable systems or what kind of log goes where or how to handle exceptions, you know, dealing with null pointers. And at some point you sort of just have to decide which of the sets of ideas you go with. But that part was really interesting. Like most of the time writing the book isn't the actual writing, it's reading and trying to synthesize like, what am I trying to say here? And what do I actually believe? And I read this argument,
Starting point is 00:33:10 do I actually believe that argument? So there was a fair amount of that. We also have a chapter there about Scrum and Agile. Even though I'm not a particularly strong believer in capital-S Scrum and capital-A Agile, it is hilarious to me that the Agile manifesto, the first thing it says is something like people over process or getting things done over process, something like that. And then we codify these super elaborate processes, and there's the retros and the stand-ups and the planning poker and this and that.
Starting point is 00:33:47 There's just like so much of it. And people talk shit about Agile and Scrum because what they're exposed to is the process and the process doesn't work for them, right? So I spent a bunch of time in the book kind of explaining what is the process trying to do and trying to say that, understand why it's there, and then you'll be able to use it or toss parts of it out.
Starting point is 00:34:09 What you can't do is either just say, this process is stupid, I'm not going to do it, and then just have no process, because you're probably going to fail. These things emerge for a reason. Or follow it in a sort of cargo cult way, right? Just sort of, well, we're supposed to write stories as a blank, I want to blank. So I'm going to write a user story that says,
Starting point is 00:34:32 as a sysadmin, I need, you know, the package version to be updated to 1.3.7. And it's like, no, that's not a user story. That is just a waste of words. If that's all you want, just write that down. I remember writing exactly that and being like, this seems really stupid. I mean, you know what's funny? Like the number of hours I've seen spent on how story points should be, should they be based on number of days, number of hours,
Starting point is 00:35:07 some random thing you make up and you say, well, it's logically this thing or relative to how much time other things take. And so the reason I was laughing so hard through the process is because I had a colleague on my team who was a scrum master. I love working with that colleague. Anytime he would have a stand-up, he would basically say, I'm going to wear my scrum master hat. And I would always laugh out loud when he would say that, because I'm like, hey, man, I care the least about your scrum master hat. Anyway, everything that you were saying was just reminding me of all those conversations and how many hours people burn with the process and to plan to get stuff done. And sometimes, like, the amount of time it
Starting point is 00:35:45 would take to get that done is less than the amount of time it takes to just put the process behind it. Blind following of the process is very bad. And the guy who invented points is now like, I would like to take that back, because people did not get what I was trying to do here. Whatever it was meant to be, the agile consultants have taken over. But there is merit to trying to say, these tasks are different sizes. How much can I actually take on? Having some sort of methodology to not overcommit.
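The idea of sizing tasks and checking how much a team can actually take on can be sketched as a small calculation: compare what was committed with what was completed in past time boxes, and scale the next commitment by that ratio. This is only an illustrative sketch. The `Sprint` type, the function names, and the sample numbers are hypothetical and are not taken from the episode or from any Scrum standard.

```python
from dataclasses import dataclass


@dataclass
class Sprint:
    committed: int  # points the team committed to at planning
    completed: int  # points actually finished by the end of the time box


def completion_ratio(sprints: list[Sprint]) -> float:
    """Fraction of committed points that historically got done."""
    total_committed = sum(s.committed for s in sprints)
    if total_committed == 0:
        raise ValueError("no committed points in history")
    return sum(s.completed for s in sprints) / total_committed


def suggested_commitment(sprints: list[Sprint], raw_estimate: int) -> int:
    """Scale a raw plan by the historical ratio, rounding down to whole points."""
    return int(raw_estimate * completion_ratio(sprints))


if __name__ == "__main__":
    # Hypothetical history: the team finishes about half of what it commits to.
    history = [Sprint(40, 20), Sprint(30, 18), Sprint(30, 12)]
    print(completion_ratio(history))
    print(suggested_commitment(history, 36))
```

Whether a point means a day or a step in a Fibonacci sequence matters less here than measuring the gap between plan and outcome consistently, so the error can be seen and reduced over time.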
Starting point is 00:36:22 Because without that stuff, every single team I've seen, you commit to a bunch of stuff and at best half of it is done at the end of your time box. Because you're kind of bad at estimating. We're all bad at estimating. Let's acknowledge the fact that we're bad at estimating and bake it in. How do we get better at estimating? Well, if we had some sort of projection, we could see how big our error is and then we could drive it down. So, like, the logic makes sense. It's there. Like, oh, is a point a day? Should it be a Fibonacci sequence? And there are reasons for making it a day, there are reasons for making it a Fibonacci sequence. But if you take it as religion and dogma, it doesn't do anything. Like, it
Starting point is 00:37:00 fails when you kind of take it as dogma. It works when you understand why it's there. And just like with any kind of, you know, like when you're in high school and they teach you the essay structure and it's very stilted and they insist that you write it that way. And you're like, but when I read, you know, Joan Didion or somebody like who's an excellent writer, they never do any of this. And that's just like, that's because they know the rules.
Starting point is 00:37:26 When you know the rules, you know which rules to break and which rules to stretch. When you don't know the rules, you just write nonsense that is impossible to follow. So learn the rules, understand what's there.
Starting point is 00:37:37 Once you understand what's there, absolutely toss it. And don't follow the thing without understanding why it's there. I did not expect this to be an Agile discussion. I was hoping to hear about data engineering. I think every engineer, when you talk to them about Agile, has some strong emotion associated with it. Either they hate it or they love it, or hate to love it, or love to hate it. All of that. So in terms of writing the book, like you mentioned,
Starting point is 00:38:06 it was a very hard thing to get through. And some other folks have also mentioned that writing a book is not just a lot of hard work, but it might not be as lucrative as one might think. Did you know, like what, no, well, one, I want to know your opinion, whether that's true or not. And again, I know the measure of lucrative can be very different.
Starting point is 00:38:28 I'm giggling because Damesh is giggling. That's why. I don't know what's going on. But I'm trying to validate what I heard and whether that's true or not. The other part is like, did you already know going in that this is what would happen? So first off, unless you have some sort of ridiculous hit on your hands, and there are a few, chances are you're going to make like pennies per hour for the amount of time you spend writing a book. So from a financial perspective, it doesn't make sense. For some people, it makes sense if they're using it to advance their consultancy, right? Or kind of create a brand and part of the brand is an author of technical books. Or there are some people who write these runaway hits and they, and they probably do make
Starting point is 00:39:25 decent money from it. Although I suspect the ones that I'm thinking of were not setting out to make money. They were just like, this book needs to be written. So I'm going to write it. And then if I make some money from it. For example, you know, in the data space, Martin Kleppmann's book, right. Designing Data-Intensive Applications, it's like 10 years old now. And it's an absolute classic that everybody gets as their first recommendation. If you're involved in data or distributed systems at all, you have to buy this book.
Starting point is 00:39:51 By the way, you have to buy this book. So he's probably made some decent money out of it. I highly doubt he wrote it because he was expecting to cash in, right? Like, he wrote it because he felt that this book needed to exist.
Starting point is 00:40:08 And he's very good at explaining things. And he has a very encyclopedic and broad knowledge of the space. So, yeah, I knew that going in. I know a few folks who have written books before, and they all said that. I think Josh Wills showed up on your podcast, so him and a bunch of others. I was pretty lucky to work with folks who were very knowledgeable and published a bunch of O'Reilly stuff. It was the sort of thing that I felt is a service.
Starting point is 00:40:34 I was hoping, like, the vision wasn't piles of money in my bank account. It was stacks of books on tech leads' desks, like the vision I described before. And I was like, that would be awesome and cool, and I want to make that happen so we can just generally lift up the level a couple inches. Yeah. You know, so that was it. What is the process of writing like, especially when you're working with a co-author? So do you want to know like the actual writing or sort of the pitching and everything else? Actually both. So one aspect is like both of you, like you saw the co-author on Twitter and you're like, hey, seems like there's some common topics we are talking about. So let's get together and write the book.
Starting point is 00:41:17 But then when you come together, like what does the process of writing a book look like? Because and the reason I'm asking this is we've spoken with folks about writing in general before. It's something that is a topic that both of us are interested in. And we've also heard a lot of our listeners interested in like how to get better at writing. And one aspect is like, it's hard enough to write a blog post,
Starting point is 00:41:37 let alone write a book. So what does that process look like going from outlines to chapters to the final book? And I would also love to know some aspects of like finding a publisher, for example, like how do you go about doing that and things like that? So I guess going backwards a little bit in terms of the actual writing, when you have sort of an outline or at least a general idea of what the different chapters are, I don't know how it works for other co-authors. For me and Chris, it worked pretty well because we're both pretty experienced open source
Starting point is 00:42:11 contributors. And we're very used to this general idea of you say, I'm going to do a thing. There are some people weigh in on the design and what to watch out for. Maybe you have a first draft of it and they say, oh, don't do it that way. Do it a different way. And you rewrite it. And maybe they put a patch on your patch to sort of adjust some things. And then some third person says, hey, you should add a test here or there.
Starting point is 00:42:35 And you just have this very iterative, heavily reviewed, collaborative development process. And both of us sort of marinated in that kind of environment. Like both of us are Apache committers and on Apache PMCs. So that part of the collaboration came very naturally, right? I wrote a thing. I half hate it, but there's some stuff here. Please take a look. He does a review, edits it with the edits visible in a Google Doc, accept, accept, accept. Oh, let's talk about this gigantic sidebar
Starting point is 00:43:12 discussion in comments, find the resolution and so on. But it felt very natural. It felt like writing code except in English. So that part was great. We had some hijinks in like going between Google Docs and trying to use Git and trying to use Microsoft Word Live because of like publisher constraints. That part wasn't so fun, but the actual sort of exchanging comments
Starting point is 00:43:40 and figuring out what we're trying to do, I think we had like maybe three calls slash video calls through the whole time. The rest of it was just back and forth on the docs. Wow. I want to say, until like a month ago, Chris and I hadn't met. We've known each other since like 2011.
Starting point is 00:43:55 We hadn't met in person until a month ago. So you wrote a book together, but you never met the person. Wow. Yeah. And we live like an hour and a half away from each other. But it was like pandemic kind of, you know. But yeah, that part was great. And the first step of that was writing the pitch and sort of making sure that
Starting point is 00:44:18 we're on the same page and talking through like, what do you actually want from the book? What should we include? What shouldn't we include? Why is this important, right? And just making sure we're on the same page. And I think both of us kind of, there's enough mutual respect, you know, that we don't feel it's a loss of face to give on something or, and we trust the other person,
Starting point is 00:44:38 like if they insist on something that like they know what they're talking about, right? So that was good. In terms of finding a publisher, most tech publishers have very clear sort of proposal guidelines. O'Reilly has a Google Doc that you can copy that basically you fill out. And it asks, you know, what is this book about? Why are you the right person to write this book?
Starting point is 00:45:04 You know, give us an outline. Are there other books about this? How is yours different? Those kind of questions, right? And you just, like, email workwithus@oreilly.com and they get back to you. And then for The Missing README, we actually wound up going with No Starch,
Starting point is 00:45:20 which is a different publisher because we thought that they got what we were trying to do a little bit better, because The Missing README is kind of a weird book in terms of defining its audience, because it's a technical book, but it's not about code, but it's not a, it kind of straddles a bunch of things.
Starting point is 00:45:38 And for that one, we also had a draft chapter. It was chapter six. The first thing we wrote was a chapter on testing, because that's what everybody loves, right? And we asked for a sample edit from the publishers who were like, we would like to publish the book. We said, okay, we have several publishers who said they would like to do that. We want to go with somebody who gets what we're trying to do. Can we see what an edit looks like? Because we hadn't ever worked with an editor.
Starting point is 00:46:08 Wait, so this is before you said yes to the publisher, they would edit the chapter that you wrote? Yeah. I don't think that's particularly common, but we were in a situation where we were choosing between publishers and they were pretty involved and both of them were sort of pushing the book in different directions, and we're like, well, how do we decide? I think the real work will be like what
Starting point is 00:46:32 is the actual value of working with the publisher versus self-publishing online, right? It's a professional editor and the distribution network, right? And so, we understand what the distribution network does. We don't understand what the professional editor does because we hadn't worked with one. So we wanted to see what that feels like. So we got notes back. Was it super different? Like the, from like the different editors?
Starting point is 00:46:58 Like the tone and the... Yeah. Yeah, it was. Now it's been years, so I don't quite remember, but I do remember that both the kind of comments we got and the volume of comments we got was quite different. And I think we actually went with the ones who gave us more, you know, reading, because we wanted the tough love, and they were very helpful. Financially, self-publishing probably would have gotten us more money but fewer readers, because you take a much bigger cut
Starting point is 00:47:31 when you publish on Amazon or something. You don't get as much distribution, not as much visibility. But I think the editor contributed a lot and sort of pushed us to stick to some conventions and structure that we were deviating from, being sort of newbie writers. They really helped us also clarify things, be more succinct. I would sometimes go on, as I do in this podcast, and have like flowery
Starting point is 00:48:03 language. And they're just like, cut it out ruthlessly, just make it very easy to read, you know, like appeal to a broad audience, not all of whom are native or very fluent English speakers, things like that. So I think it was very helpful. So there's one interesting thing. Like some of the blog posts that I like reading are people kind of describing things in a very organic way, as if, like, if you're reading their blog post, it's as if the person is talking to you. One example that comes to mind is Tim Urban. I was drafting a blog post and I was thinking about this myself, and I'm like, okay, I can make it super succinct and cut it and make it crisp. On the other hand, I can just write as if I was talking.
Starting point is 00:49:00 And I liked the latter part. In terms of when I read stuff in general, I tend to fully read those blog posts, as opposed to the ones which are very formally written, because it's just easy to read that. So when you're trying to... Can I ask you a question? You ever look up recipes online? Recipes online? Yeah, yeah, all the time. Yeah. So do you read the, like, I was going out for a walk with my dog and I was watching the fall leaves? No, I just don't. No, no, no.
Starting point is 00:49:30 It's just like the bullet points, like step one, step two, step three. Yeah. Okay, it makes sense. What you're saying is it depends on the content of the book, of the book in this case. And how much you vibe with it, right?
Starting point is 00:49:41 There are actually some recipe writers where, like, I enjoy the essay enough. I'm like, yeah, this is actually, I'm kind of enjoying this, because it's creative writing, right? Like they just write an interesting essay about, like, whatever, what the smell of the soup evokes for them.
Starting point is 00:49:56 And it's engaging. And some of it is just like, it's trite, like you're just padding the page so you can shove more ads in front of me. Like, just give me the ingredients, right? I think it really depends on the writer and the audience. Egg tarts and, uh, stay for the writing. No, no, where I was trying to go with this is: is it you who defines the tone of what the content should be? Or is it up to the editor to push you in a direction and abide by the voice that they want you to have in the book?
Starting point is 00:50:33 I think the publisher reserves the right to not publish your book if they think it's bad. Oh, okay. But you don't have to take the edits. Ultimately, you're the writer. The editor is making suggestions. Interesting. Yeah. So like on my personal blog and Chris on his blog have much more of a sort of individual voice maybe.
Starting point is 00:50:54 But yeah, I guess it was a little bit more sort of clear, succinct. The more you insert asides and stories and other things of that nature, the more sort of you stand the chance of losing the reader. Yeah, that's fair. In this case, for folks who might noodle with the idea of writing a book, would you recommend it? I'd say know why you're doing it, because it'll take longer than you think, it'll be more work than you think, and it can be highly rewarding. You know, I hear from people who've read the book and I get such nice feedback. It is very rewarding to know that it's out there and that it's actually helped some
Starting point is 00:51:39 people definitely don't do it for like money or fame, you know? But if you're driven to sort of like, I know this book needs to exist and I don't know who else will write it, so I might as well do it, right? And you think you can actually go through it. In the end, like any large project that you finish and are proud of, right, has its own reward.
Starting point is 00:52:04 But that's how I view it, right? I got some money out of it. It paid for that stupid Microsoft Office Live license. It was more than that. It was more than that. But it's definitely not, you know, I could earn way more money with that time doing like, I don't know, expert interviews or something.
Starting point is 00:52:24 In terms of roughly number of hours, I know it's super hard to quantify this. How long do you think this took? I don't know. It was so spread out over the pandemic year and I didn't keep track, but easily in the hundreds, like low hundreds, maybe 200. I see. I think Chris published, Chris Riccomini, my co-author, published a blog post about writing a technical book when it came out, and he might have an estimate for how much he spent.
Starting point is 00:52:56 True, true, true. Because he's a co-author. People could Google it. Yeah. You will find that linked in the show notes. Go ahead, go on. Do you think you would have done it if it was just you? I'm asking because for me, for this podcast, for example,
Starting point is 00:53:10 Ronak has been huge in terms of kicking my ass, being like, yo, Guang, where's this thing you promised to do two weeks ago? Go, go, you know, just kidding, just kidding. But did that kind of play into a factor in terms of like getting the thing actually out the door? I think definitely having a commitment to somebody else and knowing that they depend on me to do my part definitely helped me actually finish it. I probably wouldn't have finished it if it was just me being like, I could sleep an extra hour or I could get up and go, right? Like, I'll go for sleep.
Starting point is 00:53:43 I'm also very grateful for the product. Thank you. Thank you. Oh, same, same one. These days Ronak has been pinging me every day about what things I need to do. Excellent. Anyway. Um, cool.
Starting point is 00:53:55 Yeah. So maybe doing a bit of a pivot in the last 15 minutes, but yeah, let's talk about acquisitions. So this is something that you mentioned before, you've been on like sort of both sides of it. And the reason why I'm very curious is that recently I had a friend whose sister kind of went through this process. They worked at a startup for like a few years and then they got bought up by a bigger company, and it seemed pretty interesting in terms of, right, like how you think about how much equity is worth, right? Like in the interview process, right, like, you know, if the company is
Starting point is 00:54:29 really good, they maybe give you some projections, right, in terms of like, oh yeah, if we exit for this much money, right, this is how much your stocks are worth. Yeah, like just very curious in terms of how to think about acquisition, I guess from an engineer's perspective. And also, how do you integrate with a company and all those sort of things? Yeah. I think I don't have a lot of insight about the financial part of it. You know, stock options are not real money until they are. And all of those projections about, like, if we exit for this versus that, like, this is how much you get,
Starting point is 00:55:04 don't factor in dilution, or if they do, it gets very complicated. And there are calculators online and like you can play with those, but it's monopoly money, you know? But in terms of integration, yeah, so I was kind of on both sides of this. I've been at Twitter when we acquired companies
Starting point is 00:55:22 and we integrated them into our teams, also at Zymergen and at Ginkgo Bioworks. At Ginkgo, I led acquisition on a couple of things or did technical due diligence. Actually, at all three of them I did technical due diligence. And the biotech company that I worked for, Zymergen, where I was the CTO and the PN, got acquired by Ginkgo. And so my job was to make sure that the software team gets properly integrated into a new environment.
Starting point is 00:55:49 And I think there are several different ways that this can work. And it's important during the acquisition process for all parties to agree on what they're doing. If it's an acqui-hire or intended to be a growth of the acquirer's engineering team, it's one thing. If it's the Facebook acquiring Instagram or WhatsApp kind of situation where you're acquiring
Starting point is 00:56:18 a product and a team and it's going to be standalone, it's very different. You definitely don't want... And I think in the second case you want to keep the team intact. You want to have a conversation upfront about what does and doesn't need to change and on what schedule, in terms of transitioning onto the parent company's infrastructure, in terms of adopting their standards versus not adopting their standards and so on. Most of the acquisitions I've been on either side of have been more of the first nature where you integrate the teams. And the danger there is that the integration doesn't go. And like for years afterwards, people are like, well, you know, this is company A engineers or company A practice, you know, and that's why things look the way they do.
Starting point is 00:57:11 And you want to be very clear about sort of how and in what order you're going to integrate things or if there are competing solutions, which one's taking over and when, and kind of work aggressively to merge. One thing I did very intentionally in the Ginkgo-Zymergen acquisition was work with my counterpart on the acquirer side to as much as possible have teams that are, because the software engineering teams were actually roughly similar. I think we added like 30% to their engineering team, maybe even more. So shuffling the teams so that there are acquired engineers on multiple teams within the acquirer's department.
Starting point is 00:57:57 And that does two things. One, it sort of prevents that us and them kind of mentality, right? Because like your team is your team now, and that helps get over that barrier mentally. But two, you also have these kind of tendrils in multiple teams, right? So it becomes that much easier to find out how things work when you're like, what's going on with whatever core services team? Well, I know somebody there, like it's baked in, right?
Starting point is 00:58:24 Because I worked with her in my previous company. So whatever good stuff kind of comes with your culture can sort of be introduced to multiple teams all at once. And the communication flows a little bit better because so much of communication in a larger company flows not through organizational lines, but through who you know. And a team coming in as an acquired team doesn't know anybody. Right? So fundamentally, they're at a disadvantage, and I don't mean disadvantage as in a competition of us versus them. But like,
Starting point is 00:58:57 it's like you're joining a new company, except you didn't have an interview, right? They have no idea what they're doing or why. Having this kind of support network that's spread throughout lets the organic network work. So I think that's pretty important. And yeah, remembering that the acquirer is the acquirer and there's a reason they acquired you. If they were hoping to, I don't know, modernize their architecture and you come in and the teams are like, but our stuff works.
Starting point is 00:59:25 Well, with all due respect, the whole point was that we're going to change some things. So let's talk about what things make sense to change and which things don't make sense to change. And maybe not change all things at once, but we're going to modernize the architecture because that's the whole point of us being acquired. And you can't come in and be like, you all are idiots, you're doing everything wrong, we're gonna do things our way, you know, like how could you possibly have made the classics? Or like, it's a legacy system; of course there are gonna be things that are broken. If you looked at your system with fresh eyes, you would also see a bunch of things that are broken, right? Like chill out a little bit,
Starting point is 01:00:11 cut out the, well, at whatever, at Ginkgo we, at Google we, at whatever, right? Just try to find what to appreciate in the new environment. And remember that there's a reason they brought you in and there's value in what you're bringing too. So in a way, as you're going through the acquisition process, sounds like, and I might do a poor job of describing it, but there are two goals you're trying to fulfill. One is from a technology stack standpoint: a successful merging of teams means you don't have two independent stacks, but you have one stack, where aspects of the new stack are kind of embedded into the new one, slowly evolve, parts that you would still keep, but towards the goal of reaching one end state where it looks the same. And the second aspect is the team, where the teams don't end up being two independent teams but rather one team, and people are bringing some different ideas to the table from the acquisition side, but at the end of the day you have all of
Starting point is 01:01:03 the new folks spread across the entire company to kind of spread that connectivity issue in a way. Yeah. Yeah. I think that's right. I think you captured it better than I did, actually. Yeah. You got it. And also, especially when it's a small startup being acquired by a much larger company, knowing that there's opportunities to do something else. And part of the benefit, something you get at a larger company that you don't get at a small company is an opportunity to sort of get a different job without changing your jobs, right?
Starting point is 01:01:34 So if you've been doing whatever, data infrastructure for a couple of years and now you want to do, I don't know, front end, right? Like you're just really, like there's a way to transition. Like you can just make the connection, you can transition and like they might have the support systems and all of those things.
Starting point is 01:01:52 Or just being like, you know, I've been on, I don't know, support systems and I want to go into ads, right? When companies get acquired by the likes of Google or Salesforce or Meta, right? There's just so many different things that engineers can do, right? It opens up opportunities internally. And a lot of the times, I know from the time we were at Twitter and Twitter was pretty acquisition happy for a while and acquired small teams that were really good. And some of the best acquisitions wound up being, you know, it's four people that a year or two later all work on different teams.
Starting point is 01:02:29 But it's like, oh, yeah, they came from that startup. Like, they're all really good, you know. And their influence is felt kind of throughout the organization. They're not necessarily like a small little team that you kind of deploy to different places to fix things, right? It's just that they wind up having an influence all over the place. And one thing you mentioned from the acquirer side, like doing the technical due diligence, which is something you were involved in once you joined Ginkgo.
Starting point is 01:02:54 So what does technical due diligence look like from the acquirer's standpoint? It really depends on the context, right? Like you might go through the tech stack, understand, start developing a vision of how it would integrate with your tech stack. Depending on the nature of acquisition, you might get to interview people
Starting point is 01:03:13 or you might just get to talk to sort of the CTO or sort of like the engineering leads, but they don't want to tip their hand and they don't want the rumors to start internally, so they don't expose you to the team. So it can get a little bit nuanced. So maybe you'll see some design docs and things like that. There's usually something called a clean room.
Starting point is 01:03:33 But once there's a fairly strong understanding that this might actually happen, the company that's being acquired starts sharing a bunch of information about their financials, about their cap table, about their product, about their customers, like all that. And it goes into a sort of a third party where you can look at the documents there, but then if the deal falls through, your access gets revoked. So as an engineering leader, you might get access to that sort of thing and look through and ask additional engineering questions and so on. As an IC, you might be invited into sort of architecture discussion or presentation if things are a little bit more open. And so they're like, we really want to understand whatever, how their streaming platform works, right? Like let's get our streaming platform person. They're the only person who understands any of this.
Starting point is 01:04:29 We need them in the room. I see. And in this case, many times the actual system in the works is rarely the same as what you have in the design docs, partly because in design docs, you start somewhere, you start implementing and the system evolves just like a living organism of sorts. When you're doing the due diligence, you want to make sure you understand not just the good
Starting point is 01:04:54 parts of the system, but also the limitations. And it's not that the person on the other side is trying to hide something, but they might not see the limitations the same way you do. So they might not be as forthcoming with some information as you might expect them to be. So at the end of the due diligence, like what does the goal look like? At what point do you feel like, okay, I'm satisfied enough that this looks okay? Yeah, I think you go into it assuming that things don't work 100% of the time or 100% perfectly. And you're looking for deal breakers. If you're doing
Starting point is 01:05:26 technical due diligence, unless it's specifically like we're acquiring the magical technology that's going to be magical, and if it's not magical, the deal is not worth it. You're usually acquiring for some other reason. You're probably acquiring for a combination of there's some code and really good talent and it positions us well for whatever strategic... Right. So if you're at a point where you're doing technical due diligence, you're looking for deal killers, not for like, I wouldn't have done it quite that way. Right. Like that's not a deal killer. That just adds to your integration estimate. Makes sense. I think that's, yeah, that kind of puts perspective, I mean, puts things in perspective for me, at least. So that makes sense. Yeah. An important thing there is not so much, yeah, to like look for problems, but look for what will it actually look like to integrate or incorporate this into what we're doing, right? Like, you know, how are they doing authorization and authentication?
Starting point is 01:06:26 Is that going to be an easy lift and shift, or are we going to, like, have to rethink the whole thing? Some subtle gotchas that are kind of deep in the weeds but can really change the timeline for, like, getting the thing to actually work. And maybe, like, adjust the strategy, right? Like, okay, we should let this run separately for a while because, you know, they run on GCP, we're on AWS, that's a massive migration. Maybe we should just never do it, right? Like, it'd be easier to rewrite it, like just have them write a new one on our system or whatever, right? Like those kind of
Starting point is 01:07:00 things. Like you're working out a high level initial plan of like what's likely to be, and hopefully talking to your counterpart there so that you're all on the same page about like what technically needs to happen. You're not getting down into like, you know, how they deal with transactionality. Yeah, you use Memcache, I would use Redis. That's not the conversation you're going to have. I see. Yeah. But like they're using Redis, we're using Memcache. Maybe you talk about that. Like, do we run it for a while?
Starting point is 01:07:27 Like I was a little bit sure. I see. So in a way, technical due diligence is kind of a data point for, I would say, CEO and others who are the decision makers in this process to figure out at what point this thing would be operational as part of an integrated stack and how much it would cost us to make that happen. And if there are any deal breakers in the process. Yeah, that sounds right. Well, Dmitriy, sorry about running late on time. I didn't pay attention, but this has been another awesome conversation with you. And we wanted to talk about what you're doing next, but
Starting point is 01:08:03 maybe we'll talk about that some other time. And thank you so much. Let's totally do that. And thank you so much for sharing all the stories. All right, my pleasure. Thanks so much. Take care. Bye. Hey, thank you so much for listening to the show. You can subscribe wherever you get your podcasts and learn more about us at softwaremisadventures.com. You can also write to us at hello at softwaremisadventures.com. We would love to hear from you. Until next time, take care.
