The Pragmatic Engineer - Developer Experience at Uber with Gautam Korlam

Starting point is 00:00:00 Vibe coding. This is a brand new term. What do you think it stands for? Vibe coding is just you trying to figure out how the system should behave when you're prototyping. The good thing about vibe coding is you're able to basically iterate much faster with these agentic loops. Now you're just focusing on how would I actually achieve my outcome rather than the exact way to do it, because you can always change those details later.

Starting point is 00:00:21 If you have the right abstractions, you could swap layers out. So I think that's changing the way people think about software, because if you're someone who has taste and knows this is the outcome I want to see on the UX, you may not even need to know a framework. You should experiment with it till it feels right. And of course, you need to massage it to make sure it doesn't make it harder to evolve later on. I think a lot of the software today gets you to the initial.

Starting point is 00:00:41 It looks great, but then how do you maintain it is basically sort of left for a later time. And I think a lot of people who are still coding with AI are early in the prototype phase. Gautam Korlam worked at Uber for almost 10 years, joining as Android engineer number 8, and was the founding engineer for the mobile platform team. He grew from the software engineer 2 level all the way to principal engineer and worked on developer experience and scalability challenges throughout his decade at Uber. Gautam is now the co-founder of Gitar, an agentic AI startup that automates code maintenance. In today's conversation we cover how Gautam accidentally deleted Uber's Java monorepo, which had thousands of developers committing to it, Uber's in-house

Starting point is 00:01:17 engineering tools like the monorepo, SubmitQueue, Local Developer Analytics and DevPods, AI's impact on software development, why vibe coding will spread, and why Gautam believes junior devs will thrive with AI tools. Uber's engineering culture is very unique, even across tech companies, and this episode is a perfect inside look at its early days. If you enjoy this show,

Starting point is 00:01:37 please subscribe to the podcast on any podcast platform and on YouTube. Welcome to the podcast. It's great to have you. Oh, thanks so much, Gergely. Glad to be here. Thanks for having me. It's good to reconnect, because we talked a lot while you were at Uber. You know, you were on the platform team and your work touched a lot of the systems that we worked on. Every engineer causes an outage at Uber.

Starting point is 00:02:01 I did it. People on my team did. What was a memorable outage that you might have caused accidentally or deliberately? So this is a funny story. Towards the end of my career there, I accidentally deleted the Java monorepo at Uber. And there were thousands of devs committing there. Yeah, the Java monorepo, that was

Starting point is 00:02:24 one of the, we had two big backend monorepos, right? Go and Java. Go and Java, yeah. So this was after we did the migration, with almost everyone using the Java monorepo. Yeah, it was pretty funny because I

Starting point is 00:02:37 what the story is essentially, I don't think many people know about this because I recovered pretty quickly, but I was trying to test something out on a test repo and I was like, hey, I need to copy the git push command, because we created the repos on push. So I copied the URL from the Java monorepo.

Starting point is 00:02:57 I forgot to change it. And then when I pushed, usually at Uber, they prevent you from force pushing. And it said, hey, I can't force push. And I didn't really read the message. Then I was like, why does this... I should be able to force push. And I did, with some special flags. And there's only a few people, maybe like three people

Starting point is 00:03:17 at Uber who can force push with that flag, because I was part of the platform team and we didn't want anyone else to do it. And I pushed, and then someone pings me on Slack: hey, so the repo says initial commit, what's going on? So you were a principal engineer at this point, right? Like you were one of the most senior people and you were there for a long time, right? This is like not some, you know, you didn't know what's going on. So I mean, that's why I had the permission to push. So the whole repo was gone until you reverted minutes later. It was. It was. It was tricky because we had to get it back from a backup. Yeah, it was not too bad.

Starting point is 00:03:52 It was like a few minutes. And I think we had an incident review. Luckily, at the time we had a submit queue, so the queue would pause automatically and say, hey, no work is lost, right? We had really good backups in place. So we reverted quickly. And I was like, ah, got to be very careful with those copy-paste commands. Wow. Well, but I guess you tested that the backup actually worked.

Starting point is 00:04:16 You should have just said that. It was just a test. I just wanted to see if the... Stress test. ...actually worked. Yeah, chaos engineering. This episode was brought to you by Sentry. Buggy lines of code and long API calls are impossible to debug,

Starting point is 00:04:30 and random app crashes are things no software engineer is a fan of. This is why over 4 million developers use Sentry to fix errors and crashes and solve hidden or tricky performance issues. Sentry cuts debugging time in half: no more soul-crushing log sifting or vague user reports like, it broke, fix it. Get the context you need to know what happened, when it happened, and the impact, down to

Starting point is 00:04:54 the device, browser and even a replay of what the user did before the error. Sentry will alert the right dev on your team with the exact broken line of code so they can push a fix fast, or let Autofix handle the repetitive fixes so your team can focus on the real problems. Sentry helped monday.com reduce their errors by 60% and sped up time to resolution for Nextdoor by 45 minutes per dev per issue. Get your whole team on Sentry in seconds by heading to sentry.io/pragmatic. That is S-E-N-T-R-Y dot I-O slash pragmatic.

Starting point is 00:05:27 Or use the code Pragmatic on sign-up for three months on the team plan and 50,000 errors per month for free. You reverted, thankfully, you know, nothing crazy happening, but this is a pretty big deal. How, you know, like, how can we imagine? Like, what was the response of, like, your manager, the incident review? I mean, yes, sure, you were an experienced engineer, but come on. This was a big deal, right? So the good thing was, since we had the submit queue, and we can talk about it a little bit more,

Starting point is 00:05:53 the only thing it affected was the amount of time it would take for a commit to merge onto main. So usually everything would get serialized. So all that it looked like was there was a little 20-minute delay where we recovered everything, and then everything just flowed naturally. So it was just like a random CI failure you would see, but we had so much recovery automated in place

Starting point is 00:06:14 that it was almost a non-event, and the only SLO we broke was the latency SLO. No reliability SLOs were broken. So that was pretty good. Like all the systems we put into place over the many years actually worked to save me that day. Okay. And this is a good way to segue into the next one,

Starting point is 00:06:30 Uber's unique engineering stack. I worked at Uber. I worked for a lot shorter time than you, but we overlap. You know, you were there when I started and you were still there when I left. Uber had a lot of unique systems. Can you talk about some of them

Starting point is 00:06:45 and how they came about, and why? Why did Uber build so much of its stack? Well, Uber, and also you built a lot of that stack, right? Yeah, absolutely. So when I joined Uber in 2014, I was like Android engineer number eight, I believe, if I remember correctly. And there were no unit tests. So I wrote the first unit test for the Android code base.

Starting point is 00:07:04 Then I set up an Artifactory. Yeah, it was pretty funny. We were shipping so fast that people were just, you know, integration testing using their mobile phones, because it was just very, very fast growth. This was in 2014, 2015? 2014 is when I joined, yeah.

Starting point is 00:07:20 So then I brought in Artifactory so we could share stuff. And over time, what ended up happening was a lot of the stuff you take for granted today, the cloud native SaaS products for observability or hosting source code, were just not built for our scale back in the day. So we had to build a lot of stuff in-house to solve some very crazy problems. So we had commits going in almost one every single minute, I think, and that might not seem like a lot, but when you have thousands of engineers working on code bases where you can easily run into merge conflicts, that was a big deal.

Starting point is 00:07:54 I hear a lot of companies saying it, or at the time it was popular to say, it's not built for our scale. What was that scale? I mean, you mentioned the one commit per minute, which I can actually attest to, that build systems could not handle, at least mobile build systems could not handle that. But what other scale was considered just way bigger than what commercial vendors supported? Yeah, absolutely. So I think the build system part was definitely one, because builds were taking just way too long.

Starting point is 00:08:22 I think we embarked on like a big app rewrite, I think, when you were there. I think you talked about it before. It was funny. I was actually in Amsterdam and that happened in between. And then my boss calls me up: hey, this build time is getting crazy. We need to somehow bring it under control. Oh, that's you. Yeah, I was there.

Starting point is 00:08:41 And it was almost an hour, something plus. So if we actually serialized all the commits one by one to make sure there were no merge conflicts, it would just take forever. And if you did not do that, our main branch would be red. We had this graph where it was like red, green, red, green, red, green, and it was impossible to work because there were so many teams trying to hit their deadlines. And you were depending on so much work from the other teams, like the networking library or the platform experimentation or a feature library from some other team. It was just not possible to do this without a green main, essentially. So it's not just the

Starting point is 00:09:15 build system time, but also the fact that when you push something, if you don't run tests against the other part of the code, then you're just not going to have a good time because you will just be stuck reverting stuff. Someone would be on call all the time. So it was a combination of factors. It made it very hard to work with. Yeah, and also, like, we did do that on that project specifically. We turned off end-to-end test, I think, for two or three days

Starting point is 00:09:39 to speed things up and it sped things up. And then we had so many regressions. It took more time to, I think, you know, for two or three days, we turned it off. And for the next week, everyone was fixing the regressions that kind of sneaked in underneath. Yeah, I think we got a request at some point, can you just turn off CI? And I was like, you know what's going to happen if we actually did that and I had to really talk people down saying you cannot just turn things off. It's going to make things worse. So there was the submit queue.

Starting point is 00:10:09 Can we talk about what SubmitQueue was and why it was unique, at least initially, to Uber? Even to this date, I'm pretty sure only a few companies have anything like that. Yeah, I think SubmitQueue is essentially a way for you to guarantee a green main. So it's a way to kind of serialize your commits coming in and make sure that they play nicely with each other. There are some more open solutions now, but when we did it at the time, we actually had a paper that we published. It was fairly novel because we had very low tolerance for main being red. So we'd always want to guarantee a green main,

Starting point is 00:10:46 which means we had to figure out how do we actually test changes coming in and make sure that the cross-dependencies between different commits were considered. So it's just a way to kind of handle commits at scale and make sure that when everything merges, everything still builds with each other. And that's a trickier problem than it sounds, because we had to test things in combination, discard some paths, and do a lot of processing on the back end. But that was just one of the systems. I think we had to build a ton more,

Starting point is 00:11:12 which I can talk about as well. I was the consumer on this side. I was just using it. It was just a build system for me. But I didn't realize for quite a while that, you know, because my team, the 10 of us who were working on it,

Starting point is 00:11:24 we didn't really have merge conflicts with each other because we were working in different parts of the code. But it took me a while to appreciate that when I was pushing something, like at any given time, there would be maybe 10, 20, 30 people working on the same code base in mobile, and each build would take 20, 30, 40 minutes. And then behind the scenes, I actually understood only when I read the paper that you were

Starting point is 00:11:49 looking, checking, like, is this code path in conflict? If so, which one should we build beforehand? There were like all sorts of probability models. There's a bunch of math in that paper as well, right? There's a lot of ML models to, like, I think we had to estimate what would potentially cause a failure and speculatively try to go on paths that might be green to try to optimize, and then backtrack if you didn't make that goal, and we would make that model better over time.
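The speculation idea described here can be sketched in a few lines. This is a simplified illustration, not Uber's implementation: the function name, the fixed per-commit success probabilities (which the conversation says came from ML models), and the independence assumption between commits are all hypothetical. Each candidate build is "commit i on top of exactly this set of earlier commits having landed", and the scheduler spends its limited build budget on the most likely candidates.

```python
from itertools import combinations
from math import prod


def speculative_builds(success_prob, budget):
    """Rank candidate builds for a serialized queue of pending commits.

    success_prob: estimated chance that each queued commit passes, in queue order.
    budget: how many builds we can afford to run in parallel.
    Returns (probability_needed, commit_index, assumed_landed) tuples, most likely first.
    """
    candidates = []
    for i in range(len(success_prob)):
        earlier = range(i)
        # Every subset of earlier commits is one possible outcome: exactly those landed.
        for size in range(i + 1):
            for landed in combinations(earlier, size):
                p = prod(
                    success_prob[j] if j in landed else 1.0 - success_prob[j]
                    for j in earlier
                )
                candidates.append((p, i, landed))
    # Spend the budget on the builds most likely to be the ones we actually need.
    candidates.sort(key=lambda c: c[0], reverse=True)
    return candidates[:budget]


plan = speculative_builds([0.9, 0.8], budget=2)
for probability, commit, landed in plan:
    print(commit, landed, round(probability, 2))
```

With two queued commits and a budget of two, the sketch builds the first commit on its own and the second commit on top of the first, skipping the unlikely case where the first commit fails. If that unlikely case does happen, the queue has to backtrack and build the skipped path.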

Starting point is 00:12:15 So it's quite interesting. I think if you didn't notice, that's a good thing. It means that it worked as it's supposed to. One thing that Uber did as an engineering practice was monorepos. Uber had multiple monorepos, but it all started with the iOS monorepo. And I recall you were there. Can you tell us how the decision was made to go from having a bunch of separate repos, initially on iOS and later on Android, and say like, all right, let's just do one big repository? And why did this happen and what happened next?

Starting point is 00:12:49 So it's funny. I mentioned, I think early on we didn't have tests. At that point, very, very early on, we had two apps, rider and driver, on both iOS and Android. And we had another repository called library, which was a common component between the rider and driver apps, which was just a submodule you would pull in. And it was really painful to pull in as a submodule. So we said, hey, let's set up some sort of CocoaPods, Artifactory thing. You could download stuff from somewhere.

Starting point is 00:13:15 But as we grew, we realized we couldn't put everything in one place. We had a networking team, an experimentation team, an analytics-focused team. They all wanted their own sort of nice little playground so they could experiment. So we went off and had hundreds of repos for a while. But everyone was just, you know, unaffected by everyone else and could focus on their priorities and deliver. But then when you wanted to pull in these changes, let's imagine the networking library changed.

Starting point is 00:13:44 You had to update networking, then the analytics library that relied on it, then maybe the experimentation, maybe some other feature library. It was really painful to upgrade and bump. And we were sort of on the hook, because anything that the product teams don't do, the platform team ends up being the kind of catch-all.

Starting point is 00:14:01 And you're like, hey, this is not going to scale. I think someone on the iOS team was like, hey, we should just go to a monorepo. We'll just take a weekend. Famous last words. I know, it took a while. It didn't take a weekend. But the iOS team did it and it was very effective for them.

Starting point is 00:14:20 It was a lot of pain initially. And I think it happened at the same time they were moving to a new language, Swift, as well. So that was like a double whammy. They had to do both. They pulled it off, and the gains in productivity were massive because you don't have to go bump libraries. Because we were not an open source, you know, make sure the interfaces are nice for external consumption.

Starting point is 00:14:40 It's one company, one code base. Why can't I update my library's public API and just update all consumers in one shot? It should be possible. So mobile did it. And then Android quickly followed suit. We actually froze the code for a weekend and we moved everyone over. And it was painful at the start because the build systems, as I mentioned, were not

Starting point is 00:15:02 meant for hundreds of modules in the repository. We sent some changes to Gradle to make it better. And then after a while, we migrated to a new build system, Buck, from Facebook at the time. And we wrote some auto conversion from the existing system to the new system. So we had both at the same time. That was really successful because once things were fast, the NPS improved again, monorepo is good. So the gist is, monorepo or non-monorepo, if you don't have tooling, it's going to suck. At a bigger scale you have to invest in tooling

Starting point is 00:15:34 as your company grows. Typically this need becomes more and more obvious as things start slowing down. And then later we had the Java monorepo and we migrated over to Bazel, and we had the Go monorepo. And I think what we realized with the monorepo was that the main thing it helped us do

Starting point is 00:15:51 was standardize, which meant that to make a big change, one team could go do it. Like a centralized team could just take the burden of updating this one networking library for everyone or updating Google Play services for everyone. You just cannot do that in like hundreds of repositories. It's just a very wasteful amount of work. And what was the biggest pushback that the teams initially had? Because I remember the mobile ones, that they actually happened and it was

Starting point is 00:16:16 you know, like very practical. But then when the Java and the Go ones were happening, I remember a lot of teams were dragging their feet. Like, I'm not sure we want to do this. It's going to, you know, slow things down. Do you remember what the biggest, you know, like, skepticism was and then how it turned out? Yeah, the biggest skepticism is usually: right now, before the monorepo, I could break an API and not worry about it. The tax is paid by someone eventually. It's just deferred.

Starting point is 00:16:44 It still needs to get paid. But in the monorepo, if I do it, I have to think about my interface design more. And my builds are going to be slower, was the argument. With the new build system, we kind of said, hey, the builds are not going to be slower. But people would always come up with reasons saying, hey, if I did this upgrade then I had to do it for everyone. So we said, okay, you know what,

Starting point is 00:17:04 we will make it possible for you to just upgrade your part of the code in some scenarios, but otherwise we want to have everyone be standard. It does make things a little slower, but then as a business it makes a ton of sense, because you might not see the macro picture on your team: you might be slowing down a little bit, but everyone else gets like a huge productivity boost

Starting point is 00:17:25 because they didn't have to spend the time you did on each and every one of those teams to like get up to speed. Yeah. Is it safe to say that monorepos often, not always, they, they bring kind of a almost necessary friction where like, you know,

Starting point is 00:17:42 like making certain changes will be painful because they should be painful. Like because when they were not painful, they were masking something down the road. That's a, you know, I update a dependency to something else. And then it has to trickle down. Otherwise, you'll have these problems,

Starting point is 00:17:58 but they'll only come out weeks or months later. Yeah, to be fair, like, you can also do stuff without the monorepo. Like, I know companies like Amazon, Netflix, do multi-repo setups. But they build tooling, especially to handle those cases where you have golden version sets that work with each other. It's a little bit like a monorepo without being a monorepo. So you can make both work, but a monorepo is just generally easier, in my opinion.

Starting point is 00:18:25 Just having considered both. But if you want to go the other route, you absolutely can. You still have to invest either way as your code base grows. And what were some other kind of novel slash innovative things that, I mean, it sounds like you had to invent some of these things at Uber because there was just not a solution that you could buy, or it was just really expensive or not built for this use case? Absolutely.

Starting point is 00:18:48 So the other thing I think which is interesting for us is, as I mentioned, build times were a problem. We only knew because we first tried to measure it. I think back in the day, a lot of the measurement products for developer observability were just not built for the entire SDLC. So what I mean by that is, like, you might have like a CI Jenkins back in the day that would tell you how long your builds would take,

Starting point is 00:19:08 but you had no idea how much your indexing time would be in your IDE, and we had very complex projects. Or you had no idea what the time was between pushing and code review. So we built a system at Uber called Local Developer Analytics. We call it LDA for short, which was like a little daemon that used to run on your machine and collect information about your system, like what's your CPU usage, memory usage, but also integrate deeply into a lot of the

Starting point is 00:19:34 CLI tools, the IDE. So we would, for example, know when you opened your project, which file you went to, which part of the file you were actually coding in the most, which files would have the most bugs. And then you could also see like a funnel, like in a traditional product funnel, because we would care about developers as like an end user. You would see analytics saying, hey, there was a drop in the funnel. People could not create PRs because this

Starting point is 00:19:57 part of the build process would error out more often than the others. So you would go and know what to target. And it's fairly novel, because I gave a few talks at a few different conferences and not even big companies had something like this back in the day. And I think we still use that today to power a lot of the dashboards and the deep analytics that we get from the developers' workflows. Yeah. Like what do most, like you've been in developer productivity, you know, you talk with a lot

Starting point is 00:20:24 of people and know a lot of people, but what do most companies do? Do they just ask developers, like, how does it feel? I think that's a very important thing to do for sure, because even if you have all the measurement, if the development doesn't feel right to you, you still want to ask that. Like, you want to have a survey, which we did. In fact, when I remember when we started doing the surveys,

Starting point is 00:20:43 I think we had like an NPS score of negative 50 something. And when I left, it was like positive 8 or something like that. So I count it as a win; we put in a lot of effort. So you do your survey as the very first step, which we did. But then you measure the easier things like your commit time, time to review code or time to build code. Stuff that's out of the developer's control

Starting point is 00:21:03 you want to like minimize and also time spent in meetings like those are things that you could probably get easily but the deeper stuff is only worth it if you see that those bottlenecks are adding up like if you're making changes but you're not seeing like people are happier because you'll always see developers complain

Starting point is 00:21:20 build times are slow or IDE indexing is slow. That's when you know, okay, you need to measure things. But now I think products like VS Code and JetBrains have better analytics. I think we've worked with JetBrains, for example, in trying to make them understand, hey, these are some of our problems. It'll be great to have solutions out of the box for these.

Starting point is 00:21:36 And some of those did make it upstream into products that everyone uses today. So we call that as like we kind of had to forge our own paths and eventually all the vendors would catch up based on what the industry was doing. Yeah, but it's interesting how back then there was nothing. and you kind of like pinpointed. I guess one advantage that Uber always had is, A, we had a lot of developers, and there was actually a dedicated, you know,

Starting point is 00:22:02 what was your team's name? Was it developer platform? We were called developer experience. They had multiple names. I think we were mobile infrastructure first, then mobile developer platform, then developer platform. That was the final name we had.

Starting point is 00:22:14 But funnily, actually, the team was not that big. We had my team was about 10 people, and we supported about 1,000 engineers, and that's roughly the breakdown. So you would see that speaking to peers in the industry, the teams are about the same. You have a very small, quote, centralized team, and then you have a much bigger product organization that they support. So it's very common.

Starting point is 00:22:36 And that means that you have to find ways to scale yourself and be effective. And then there was another project that I remember that was just taking off in my last year. It was called DevPods. What was that? So I think a lot of people might have heard the term cloud developer environments, or developer environments in the cloud. So we call these DevPods, and what it is essentially is a container of your code, your build system artifacts, your IDE indices, in the cloud. And what we did at Uber was fairly unique in the sense that we figured out some ways to make it boot really, really fast and have pretty much everything pre-indexed.

Starting point is 00:23:19 And it is one of the most loved products there, because you could essentially multiplex yourself now. Rather than trying to switch branches, re-index your code or context switch, you could have 10 different DevPods working on different features. And they could have everything warm. You could context switch quickly. You could kick off a build and go do something else. It was actually great. I think we put in a lot of engineering effort to make that scale really well. And it was a fairly small team as well.

Starting point is 00:23:46 Actually, a bunch of those folks are still in Amsterdam now working on the project. And can you just explain? Because I think most companies just do not have anything like that. Like as a developer, what can I imagine? Like, okay, I joined Uber. DevPods is there. What does that mean for me? So when you onboard, previously, we used to have this like bootstrapping script you would run.

Starting point is 00:24:05 It's funny because I remember this now. We have a bootstrap shell script that everyone just keeps updating. There was a shell script. And then which usually worked well, but sometimes it crashed. And then we would have to, you know, ping the on-call. of your team, but it was a pretty involved script. Like it would take sometimes minutes to run. Yeah, a lot of magic under the hood.

Starting point is 00:24:29 And the problem with that is it's hard to keep it forward and backward compatible with your system updates on Mac. So it was definitely not worth maintaining over time. But DevPods is interesting because I think just when we started seeing all these solutions like VS Code Remote or JetBrains Projector become a little bit more mainstream, we said, hey, why don't we move this development stuff

Starting point is 00:24:55 These containers can be huge, multiple gigabytes, right? Because some of these artifacts and indices are huge. We had to make sure we also use compute efficiently. So we would have to have multi-tenancy, because developer workloads are very spiky. They don't always have a stable pattern like a server workload, which means that you have to understand when to scale up and down. We have developers in different time zones. So we also have to provision machines

Starting point is 00:25:19 closer to them. So the build caches would be warm closer to where they're working. Otherwise, you'd download everything from a data center in the US, but you might be working in India and the latency is really bad. It's a lot more nuanced, I think, than just spinning up a container. A lot of the solutions out there now are just focusing on containerizing stuff, but not thinking through a lot of these other factors that are important. So we went through and made a lot of improvements here. Some of that stuff actually, one of the funny things I remember is I think JetBrains had this one shared index thing a while back where you could download an index of the project. But I think the way they were doing it previously,

Starting point is 00:25:59 you would have the full path of the system in the index. So you would download a denormalized path and then you would normalize it to your system. Just populating all the absolute paths would take a really long time. We measured it: it would take like an hour to actually index with that shared index for JetBrains. But then on DevPods, what we did, we just said, hey, everything's in our control.

Starting point is 00:26:21 Let's just put everyone at /home/user. So it'll be like closing the laptop and opening it again. So we could just throw the cache there. Everyone was at /home/user. So now no more time spent de-normalizing the cache, which meant we had like a six second boot up,

Starting point is 00:26:37 which is really insane for a dev environment. So you would come in, you would say devpod start something, and you're done. You don't need to do any bootstrapping nor learn something. And you would have one flavor per monorepo for you to work on. So a data engineer or a backend engineer or a mobile engineer would have their own

Starting point is 00:26:51 flavor. And that's very optimized for that. So basically you go and say, like, devpod start. And then in a few seconds, you have your web, sorry, your local editor. You're using your local editor, but it's kind of connected magically from your perspective. So you feel like you're working locally, but it's just all set up. It works. You can build immediately.
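The shared-index point made a moment earlier can be illustrated with a small sketch. This is not the JetBrains index format or Uber's code; the function name and the idea of an index as a plain list of absolute paths are assumptions for illustration only. When the index was built under one root and the developer's checkout lives under another, every entry has to be rewritten. When both roots are the same fixed location (for example /home/user), there is nothing to rewrite.

```python
from pathlib import PurePosixPath


def rebase_index(entries, built_root, local_root):
    """Rewrite absolute paths in a prebuilt index so they point at the local checkout."""
    if built_root == local_root:
        # Same root everywhere: the downloaded index is usable as-is.
        return entries
    built = PurePosixPath(built_root)
    local = PurePosixPath(local_root)
    # Otherwise every single entry has to be touched, which is the slow part at scale.
    return [str(local / PurePosixPath(e).relative_to(built)) for e in entries]


index = ["/home/user/repo/App.java", "/home/user/repo/net/Client.java"]
print(rebase_index(index, "/home/user", "/Users/alice"))
print(rebase_index(index, "/home/user", "/home/user") is index)
```

The second call returns the original list untouched, which corresponds to the "just throw the cache there" shortcut described in the conversation.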

Starting point is 00:27:13 You can run your tests, et cetera. Yeah, you'd have lots of cores as well. You could run a lot faster than running it locally. That's pretty awesome. I previously did a deep dive on cloud-based development environments, I think, two years ago. One thing I'm noticing is there's not much talk about these things, even though they're just so darn efficient. Do you see kind of excitement in this space, or is it just AI right now, you know, blurring everything out? Because to me, this was a really big win.

Starting point is 00:27:43 Yeah, I think the environments are still good. I think a lot of companies do want to use this for efficiency's sake. The challenge usually comes, I think, from the understanding that I have talking to peers in the industry, if you start with the solution rather than the developer workflow: hey, we have a container environment, make it work for your developer workflow, versus you have a particular way to work, let's make it work in a container or VM. So we took the other approach.

Starting point is 00:28:08 And that is why we were so effective. Hey, these caches had to be like super fast to load. Other stuff can come in async. And not everyone would have that insight. If left to their own devices, teams might devise a Dockerfile that has stuff, but then how are you getting it up to date as your team churns out more code,

Starting point is 00:28:25 are your indices actually getting up to date? Otherwise, you're spending a bunch of time after bootstrapping to re-index stuff or redownload stuff. So you'd push these updates every few hours completely transparently in the background. So it was very much like a golden path that we had to like kind of happen upon.

Starting point is 00:28:41 And that's why I think if you have that knowledge of the entire developer workflow, you can adapt it to each company's individual working style. If you give a general purpose thing, it'll work, but it may not actually work so well out of the box and you may not have the investment you may want to make to make it work. Yeah, it sounds like there's no free lunch. You need to put in the work. And Uber had like so many years of like understanding, investing, like being there. Speaking of years, you spent nine years at Uber. And when you joined, you were an Eng 2, right?

Starting point is 00:29:12 Or an Eng 2, I joined. Or one, I guess. Yeah. What is the lowest level? Whatever the lowest level was. Yeah. And then when you left,

Starting point is 00:29:21 you were a principal engineer, which was, which is, there's not many principal engineers. Like several, like a few dozen or something like that. So you were promoted... Actually, yeah, maybe a couple dozen.

Starting point is 00:29:31 Yeah, not that many, out of the closer to two, three thousand people working in tech. And so you were promoted, I think I counted, four or five times in nine years. Can we talk about your journey?

Starting point is 00:29:44 Like, how did you do it? And what did you learn on the way? Yeah, absolutely. I think one of the cheat codes is obviously joining early helps a lot, especially for getting on a rocket ship. But it's hard, right? Because when I joined, we didn't have as many people. The platform teams were tiny. It was so small. It was funny because I jumped straight from entry level to senior.

Starting point is 00:30:06 So I skipped a level because the first year was just put through the pieces of we need to get a lot of stuff done, and I was shipping really fast, so that helped, obviously, one jump up. But it gets harder over time. I think your roles also change a lot over time. I think one of my good friends early on in the platform team said, hey, if you can get into a niche and go deep, it can really help over the long term, especially stuff that people may not want to do, like developer platform, tooling. Not everyone likes to do it. But if I said, hey, I really like this because I'm passionate about it, that helps because then you become the go-to person for that

Starting point is 00:30:42 aspect. So as the company grew, it was very obvious that, hey, I had to scale myself to larger and larger efforts. So starting on mobile, then I started working on backend stuff, and then on more holistic entire organization stuff. And that takes like kind of, you know, breaking your own comfort zone a lot, like challenging yourself every year or two. So every two years, I kind of do some introspection and say, hey, am I doing what I want? Do I enjoy what I'm doing? What can challenge me more? And then I try to go for that. And usually the path opens up if you go that way rather than trying to chase like a particular level.

Starting point is 00:31:19 Because then when you're pushing the boundaries, the next level kind of becomes obvious. And then past a particular point, it just also helps to have a lot of connections, like social capital, mentorship. It's very important at like a big company. So I definitely went and got some mentorship with some more senior folks. That helped a lot to understand how these principal engineers operate and what's required is more than just the engineering. It's also the business side of things. How do we actually solve problems and focus on the business metrics and bring that to engineering rather than start from the bottom up.

Starting point is 00:31:55 And when you say, you know, like you need to get some social capital and mentoring, like it can sound a little bit hand-wavy to people. But how did you get that? I'm not going to ask how you went about it. But it sounds like you didn't say like, oh, I want to get social capital. But what were things that you did that you think actually helped build up and people saying, oh, you know, like, like Gautam, I really trust this person. I know him. He's a go-to person, et cetera. Yeah.

Starting point is 00:32:20 So one thing that helps is I have this habit of just helping people a lot. So when people come to me with like a problem with their dev environment, I would just drop everything and say, hey, let's start to figure it out. I'll spend a little bit of time, time box it. And if it fixes the problem, then I have some social capital. It kind of accumulates over time because they would send someone else to you. And over time, you can automate some of this stuff. And I think I was one of the early people on the platform team to have office hours. So I would say, hey, anyone can come in and just talk about your problems.

Starting point is 00:32:50 And we would, you know, just empathize a lot with everyone. Say, hey, I know this sucks. We're working on it. It helps a lot to humanize the problem, right? Because developers usually ping you on Slack. And over time, what happens is like as you solve more and more problems, it's always good to like attempt. And even if you can't meet the expectation, just say, hey, I can't solve this right now.

Starting point is 00:33:09 then try to say, okay, that's not in my area, right? Let me just try to help you out. And that really helps build trust with the other side. And then when you go for mentorship, it's like, hey, I clearly know you have the right sort of like mindset, you have the right goals in mind or like the right intentions. So I would love to help you kind of go to the next level. So that's usually what helps get very strong mentorships. Yeah.

Starting point is 00:33:34 And I can just plus one a little bit on the helping people because, you know, I was in Amsterdam, which is a, we say it's a distributed site, but we were just not in HQ. We were a relatively small office, you know, like compared to SF or even Palo Alto in headcount. And we felt a little bit isolated. And sometimes, like a lot of times, we would feel blocked by SF in reviews and in anything. And after a while, I knew that if I wanted to, you know, you were one of the many people on the platform team, but I think either someone told me, like, there was this thing, like, if you cannot reach anyone on the platform team, try Gautam, like he usually responds. And just thinking back, it was just wild to me that, like, I did get responses from, you know,

Starting point is 00:34:19 sometimes, and, uh, like unexpected. Like, I didn't think I would get anything, I was just trying, but you did like spend time on this. And I just imagine that you had a lot of these things. And I'm not saying you necessarily did it all the time, and obviously people like me were doing it sparingly, but it builds so much goodwill and also trust of like, wow, like, I just felt that whenever you did go on a problem, you actually took it seriously. Like, you kind of did it. And as you said, often you would tell like, I cannot do this right now or it's not an on issue, etc. So I didn't see everyone. In fact, I saw pretty few people doing it. So like, it's interesting how it feels like something that would not scale. But I guess to some extent it did. Is this

Starting point is 00:35:05 like the startup mentality of like, you know, do things that don't scale? Yeah, I think it definitely doesn't scale in all cases. But you can make it work. I think it's funny because some people won't even try because it won't scale. My suggestion is like, hey, let's meet the person and see, it may not even be an issue, right? Sometimes what you might discover is people are just finding it hard to find the right documentation because they just looked at the wrong guide. And that's why they're coming to you.

Starting point is 00:35:31 Then you can just go make that fix and then you have much larger impact. But if you don't talk to people, you have these broken windows. Like, you have the big picture, but then you lose touch of what's actually happening on the ground. It's hard. So I think my policy was always like, hey, I can always talk to someone, spend a few minutes. And if it's like a very esoteric thing, I'll say, hey, I can get back to you. But if it is not, I would rather just point you the right way. People joked when I left because they're like, hey, we should go build Gautam LLM because he would know the answers to like, you know, pointed things at least so that we could find the right information.

Starting point is 00:36:04 So I was like an encyclopedia of where to find stuff. And you were a principal engineer towards the end of your tenure at Uber. What is a principal engineer at Uber? And also previously we had an episode on Meta. Meta has archetypes above staff engineer. Do you think you fit into an archetype or did you see kind of archetypes at the principal level at Uber? Yeah, I did listen to that episode. So that was very, it's a very good episode, by the way.

Starting point is 00:36:35 I think there is some resemblance to an archetype, although we don't actually correct that. So either you have depth in a particular area or you have a lot of breadth. And sometimes it's a lot of internal influence or you might have a lot of external influence. So it really depends on where you want to take your career. I focus a lot on depth. Like, as you can see, my specialization is like productivity and have some amount of breadth, because of all the social capital and talking to a lot of people usually makes you understand

Starting point is 00:37:06 what are some of the business problems that Uber is going through. So that was actually a good bonus for me because if I never talk to people, I won't even know, oh, these three teams are trying to solve the same problem. Maybe we can centralize it or do it more efficiently and I could come up with solutions and people would be like, hey, how did you know that would work? Because you just talk to people. And there were definitely fewer archetypes probably than Meta, I would say, but usually it's a bunch of breadth and depth, and having people to sponsor you helps a lot. So the mentorship angle is also part of that. If you're working with more senior engineers and they see you actually grow, they can give you

Starting point is 00:37:42 feedback where you might be lacking and you might be able to kind of work towards that goal. So I think that's one of the things I would recommend anyone who wants to go in that path. With principal engineering, it's a lot more you have to enjoy like understanding how engineering meets business than just pure coding or just pure, what do you call it, not soft skills stuff. A lot more soft skills need to be honed at that point. It's like going and giving talks helps. And it's funny because when I was growing up, I was not very good at like, you know, public speaking, but I made myself over time more comfortable.

Starting point is 00:38:18 That helps also like build external influence. And a lot of the early folks that are using our product right now are people I met in the community and then they kind of have that trust that they built over time. So this is a lot of people who are using your product right now at Gitar, right? Yeah. Yeah. So I guess that's a good reminder of like how, you know, I guess it's when you're doing something, it might, hopefully it'll help you outside of when you're at your current

Starting point is 00:38:48 company and also help you while you're at the company. Yeah. My tip is generally, if you really focus on the people side of things, you'll do really well in these higher levels because it's a lot of relationship management, both managing up and down and shielding your team from some of the noise. And if you don't enjoy talking to people, it's harder. I mean, you can still do it. There are people who do it, but it's a very unique path and not everyone can do it. But the most common path is usually like understanding and spending time, like talking to people and figuring out solutions that are more creative over time. And that also

Starting point is 00:39:23 kind of makes you a problem solver, not just like someone who can just, you know, code and finish things up. But that's, of course, like, you can have different archetypes, as you mentioned. So it really depends on where you want to take things. Yeah, but I guess at the principal engineer level, it's kind of given that you're just, you can code efficiently whenever and however, it's just like you might not do it all the time by choice, right? It's funny because there was a graph someone sent me of like number of years at the company and number of commits per year and number of reviews. And there was like a little graph,

Starting point is 00:39:57 like an asymptotic graph. And there were two or three dots at the very extremes. And I said, hey, one of them is me. The other one is you. Then I was like, damn, I should ask my boss for a raise. So you were still writing a lot of code and doing a lot of reviews? I do write a lot

Starting point is 00:40:14 of code. Even here at Gitar, I write a ton of code. I love to review stuff because it really opens up your perspective of what else is happening. I like to comment on documents and understand where there are inefficiencies potentially. But it's just over time, I think you have a muscle to just do it. But it does not mean that you write tons of code.

Starting point is 00:40:33 It just means that you have consistent higher impact code. So a really interesting topic is, given that you were a principal engineer, which was either the highest level or maybe there was one level above it, but there weren't many levels above that. How was it like for your manager to manage you? And I know you also talk with a lot of principal engineers and you have some thoughts on what are some tips to manage these very, very senior engineers? Like, you know, can you help explain your relationship with your manager? Was it really a kind of a boss or more of a partnership?

Starting point is 00:41:06 And, you know, what worked at this level? I think that's a really good question because I think as you grow in levels, you're more and more sort of like a peer to your manager. You help your manager get stuff done too because you have broader influence. Sometimes it can be hard as an EM to, you know, like push for a particular priority because there's not enough senior people backing it, right? It might be just your director that's sponsoring that. But having someone on the team that's senior enough can completely change the odds, for example, to get something prioritized. One tip I will give to managers is like, if someone's asking you, hey, I'm at senior staff or I want to

Starting point is 00:41:48 become principal after like the senior levels, if they come ask you, it's hard. You should probably not ask that question to a manager, because at some point you will have to figure things out. There's very few managers who probably can help you after a particular point, because they may have done it before or have a lot of experience at the company. But most managers tend to work at like senior or senior plus level. And past that, it's more like they're almost like, the principal engineer on the team is load balancing a lot of stuff technically. So the manager can focus on uplifting everyone else. So the relationship is very much like a peer, like more than like, hey, just a boss.

Starting point is 00:42:28 But it does help to get feedback. I think managers are a very good asset when you want to do like non-engineering stuff. Like for example, you want to get stuff unblocked and the engineering angle is not working because you try to reason with some team. They won't listen. Sometimes you just go to take the, hey, this is the priority when you get this done, to unblock yourself.

Starting point is 00:42:52 So that's what I would say for EMs who might be looking at higher level engineers. Basically give them agency, but of course, check in often and make sure that they're unblocked, because they can do wonderful things

Starting point is 00:43:07 as long as your org is not getting in the way of them trying to actually have impact. Yeah, this is always interesting because I don't think we really talk about this too much just because there's not many of these engineers, but even when I was a manager, I actually promoted someone,

Starting point is 00:43:25 actually two engineers, a level up where I was, right? I mean, it was a bit easier for me because I saw examples, but it's always interesting. And I feel it kind of goes hand in hand. Like for these things, those promotions really help usually the manager as well.

Starting point is 00:43:38 And as you said, for a lot of managers, it's a first. And there's no, there's no playbook after senior or staff. It's all unique. It's all based on the business needs for the company.

Starting point is 00:43:49 in that specific area. Yeah. But if you're able to keep up with the engineer, then they can pull you up as well with them, because then you can become higher level and you can focus on bigger picture stuff. Because you hear a lot of stuff from an org or a company perspective from the engineers

Starting point is 00:44:05 because they just talk to so many people, then you might have opportunities that you may not have if you're solely focused on your team. Before we jump back into the episode, I want to let you know that the audiobook version of my best-selling book, The Software Engineer's Guidebook, is out now. You can get it on all major

Starting point is 00:44:19 audiobook platforms like Spotify, Audible, Libro.fm, Apple Books, Google Play, and also as DRM-free MP3 files. I started writing this book after a decade of working as a software engineer, when I had already been working as an engineering manager for a few years at Uber. Here's a review of the book from senior principal engineer Tanya Reilly, who is the author of The Staff Engineer's Path. From performance reviews to P95 latency, from team dynamics to testing, Gergely demystifies all aspects of a software career. This book is well named. It really does feel like the missing guidebook for the whole industry. You can get the book at EngGuidebook.com

Starting point is 00:44:55 or search for The Software Engineer's Guidebook on your favorite audiobook platform. I hope you'll enjoy it. One thing we just kind of mentioned as natural is you were working on a platform team. But when you started, I'm pretty sure there was not yet a platform team. And then there was a split called the platform/program split. It was a very famous email that I think later we saw. I only read it. Can you talk about what was before there were platform and program teams and how did this

Starting point is 00:45:25 split actually play out from your perspective? Yeah. So I remember when I was interviewing to join Uber, they asked me, hey, what do you want to do? You want to do some platform stuff. I said, I want to build tools for developers. And they're like, okay, you're hired because no one wants to build tools for developers. The longer version is the platform team sort of existed. We just used to call it.

Starting point is 00:45:49 I think mobile had it first, I think, but then the other ones. We just had unofficial platform teams. They were working on core infrastructure pieces. The platform teams were essentially made so that the other teams could move faster and not duplicate stuff. There's a lot of stuff that goes into building features. And mobile was just very new back in the day. So things were just not as well figured out as they are today. So you had to build a ton of stuff internally.

Starting point is 00:46:14 So if you're doing this on every team, then it's a lot of waste. So I think when we decided on the platform/program split, we said program teams don't need to worry about a lot of the underlying layers, similar to a cloud native environment. You worry about your business logic, your product metrics, and we'll give you all the building blocks, which could be a mix of open source wrapped in our own layer and our own architecture that makes it so that you don't mess with the other features. Because everyone has their own way of building banners or their own way of, like, styling things, not at all consistent for like an app like Uber. You need to have a common design language, so you'd use common components. Stuff you would do on a website or take for granted today just didn't exist on mobile. So it had to be done that way to kind of make things move faster.

Starting point is 00:47:03 So you don't have to rebuild like another widget or rebuild like another analytics processing thing and do something in the back end. So one pain point, or I wonder how big a pain point it was, but one thing that looked challenging to me, I was not on the platform team. I was your customer, but we always pinged you with questions. And as you mentioned, like a year ago or so, there were 10 engineers for about a 40,000 engineers. What did support look like and what practices did you figure out? Because I saw a lot of practices like office hours, on-call rotations, where to ping,

Starting point is 00:47:39 et cetera, that I have not seen anywhere else. Like the platform teams were advertising these things. It was almost like, I don't know, like having a vendor almost. Like there were like contracts. There were actually SLAs. It was like published how fast you can expect things, where to escalate, how to tag something with high priority, etc. I've never seen anything like this. And how did you figure out like support?

Starting point is 00:48:05 And how important was it? Sure. So the angle that I think we thought about was like, hey, for everyone else, the end user is a customer. For us, the developer is a customer. let's not treat it as like another internal thing. Let's really be customer obsessed. That was the value we had at Uber, that we extremely focused on the customer experience.

Starting point is 00:48:23 Developer experience versus the customer experience, which means imagine if your Uber didn't show up, you'd be very upset. So if your build doesn't finish, exact same feeling: okay, I cannot ship my code. So we need to make sure that there is reliability and latency guarantees for that exact same reason. So all the product-minded folks, I think, were, like, very

Starting point is 00:48:44 much honed in on, like, product analytics. We said, hey, let's think about what is important for the developers. We asked them things like build time. They said, hey, I want to make sure my CI is reliable and has no flaky tests. Can you make sure the flaky tests are not blocking me? Because it's expensive with, like, the build cut. You have to chase the release train. And that's not fun.

Starting point is 00:49:02 I'm sure you've seen your team complain about like, hey, I had this flaky test. I had to resubmit this diff. And that's not going to merge in time with the release train. Yeah. And just like staying later, I'm not sure. I think the release train left on Wednesday night. So Wednesday night was the cutoff, or morning. I'm not sure. But I know people stayed longer on those days to catch the release train, meaning getting their PR, or as we called it, the diff, merged

Starting point is 00:49:26 on there. Yeah. It was, as you say, it was a big deal. I didn't actually realize that you were like obsessing about it, but it kind of makes sense because after a while it wasn't an issue. Yeah, because that's when people complain a lot that, hey, I tried to catch the train, but then something was out of my control. The infrastructure was flaky. So we missed this deadline. Now, I'm on the hook as a platform team to ensure a hotfix happens, which means a bunch of other teams are going to get disturbed because they have to retest all their

Starting point is 00:49:55 workflows. It has a lot of ripple effects, right? And these are not obvious unless you think about, oh, we cared about testing because the end user will see the impact if you don't test all these features again. Yeah. So for us, actually, the metrics are very important to have. Like, we would have SLOs, we would publish them, we would try to keep them, we would review if we, like, missed them and then do incident management and, like, retrospectives so we could, like, avoid those.
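
The "publish an SLO, then review when it is missed" loop can be sketched roughly as follows in Python. The SLO name, the 99% target and the counts are made-up values for illustration; nothing here is taken from Uber's actual objectives or tooling.

```python
from dataclasses import dataclass


@dataclass
class Slo:
    name: str
    target: float  # required fraction of good events, e.g. 0.99


def compliance(good, total):
    """Fraction of events that met the objective; 1.0 when there was no traffic."""
    return good / total if total else 1.0


def breached(slo, good, total):
    """True when measured compliance falls below the published target."""
    return compliance(good, total) < slo.target


# Hypothetical objective: 99% of CI runs finish without an infrastructure failure.
ci_reliability = Slo(name="ci_runs_without_infra_failure", target=0.99)

# 980 good runs out of 1000 is 98%, which is below the 99% target,
# so this window would be flagged for an incident review.
print(breached(ci_reliability, good=980, total=1000))
```

A breach is only the trigger; the retrospective that follows is where the tooling improvements mentioned above would come from.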

Starting point is 00:50:19 A lot of the good tooling came out of that was to ensure high reliability, high, like low latency, which means you could ship product really fast as a company and the infrastructure would not get in that way. And that is why some of those metrics you see matter is because once we actually talk to the developers, we knew what actually mattered to them. That's what we focused on. So is it just, I'm just thinking about this, but is it safe to say that, I mean, I think we can agree that the platform team or the developer experience team at Uber did a great job.

Starting point is 00:50:50 But they did a great job because you ran it like a product team. Like you, as you said, you talk with the customers. You figure out how to map how they feel or what they care about into metrics. You then monitor those metrics. And then you kept iterating on these things. Like if I replace this with, you know, like a paying customer, it's the same thing, right? Like that's what a great product team does. Absolutely. You have to run it like a product

Starting point is 00:51:15 because then the incentives are good on both sides because if the developer doesn't like the product then you should you should kind of be like okay what do I do to make it better so if the company is shipping high quality product then we should also ship high quality product for our developers and that's how we took pride in making sure that our developers could deliver to the end user so yeah you're absolutely right

Starting point is 00:51:38 the metrics I think a lot of people might focus on at a high level may not be enough. You're right, you have to talk to people, you have to look at the funnel, where are people dropping off, like things like the onboarding, for example, for new engineers. Where would they have difficulty? Like the devpods, for example, was like, hey, people would go look at these outdated docs and they would try to set up an environment and then just spend a lot of time. So having a devpod just eliminates that. And then later you might have a situation where a build failed and there's a very easy quick fix, you should automate that. You should not ask the user to like go do it. If there's a linter failure,

Starting point is 00:52:12 in my opinion, we failed, because a linter should either completely fix it or not. Just telling them, hey, this is a problem, is not going to do anything. You're under so much pressure to ship product that if it's not blocking me, I will merge anyway. So if it's blocking me, if it can be auto-fixed, why can't you have auto-fix? That's kind of how we took things. I like that. What is your take on measuring developer productivity? Your team was called developer experience.
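
A small Python sketch of the "fix it, don't just report it" idea for lint checks. The rule chosen here (trailing whitespace) and the function name are illustrative assumptions, not a description of any linter Uber ran; the point is only that the check returns the corrected source along with the count of problems it found.

```python
def fix_trailing_whitespace(source):
    """Return (fixed_source, number_of_lines_changed).

    Instead of only reporting offending lines, the check hands back a
    corrected version that tooling can apply automatically.
    """
    fixed_lines = []
    changed = 0
    for line in source.split("\n"):
        stripped = line.rstrip(" \t")
        if stripped != line:
            changed += 1
        fixed_lines.append(stripped)
    return "\n".join(fixed_lines), changed


original = "a = 1  \nb = 2\n"
fixed, changed = fix_trailing_whitespace(original)
if changed:
    # A CI bot could push this fix to the change rather than block the author.
    print(f"auto-fixed {changed} line(s)")
```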

Starting point is 00:52:42 I'm sure developer productivity came in. This is a very heated topic these days. It seems like for the past few years, and there's a lot of back and forth of what can you measure, what can you not measure? Should you measure diffs per engineer or per team? You probably thought a lot about this or have been in the middle of this. What worked and what didn't?

Starting point is 00:53:01 Yeah. So I think the very first thing we measured was just sentiment. So talking to developers, as I mentioned, the NPS survey was very easy because it was a very quick signal to see if things are in the right spot. If everyone's super happy, then there's no problem, but that's rarely the case. There's always problems that would pop up. I think as you mature your measurement, I think a lot of the frameworks we see out in the open are sort of inspired by what Uber did early on and took some of the good bits out.

Starting point is 00:53:29 I think when people talk about like diffs per engineer, they're not really saying, hey, you should commit more diffs per engineer. It's: is something preventing you from moving as fast as you could potentially? Because I can tell you that engineers, if basically nothing blocked them, would love to write code. If you're an engineer, you want to ship code, you want to feel proud about what you shipped. There's no one who will come in and say, hey, I don't want to write this many diffs. I want to write code because it makes me happier. And what we were looking at those signals for was, hey, do you have too many meetings?

Starting point is 00:54:00 Do you not have focus time? Because diffs were usually a matter of focus time. So when we improved that, we saw focus time improved. I think when we had COVID and everyone was working from home, the diffs per engineer shot up, and everyone thought that, hey, people are working so hard, it is amazing. It's just because people just didn't have too many meetings initially. And over time when everything went to Zoom,

Starting point is 00:54:22 it kind of came back down again a little bit. It's usually a way to make sure that your infrastructure is letting people move as fast as they can. And as you remove roadblocks, you're focusing on the right things. And in fact, actually, one of the things we measured is time for code review. More than anything else, the time to review was the biggest bottleneck. Because as you mentioned, you would put up a PR in Amsterdam, might wait for review from SF. That's already like a 10-hour delay. Yeah.
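
Time to review is easy to compute once you have the timestamps. The Python sketch below is an illustration only: the sample waits are invented, the function names are hypothetical, and the nearest-rank percentile is one common definition among several. It shows why a high percentile tells you more than the average when a few reviews wait for days.

```python
from datetime import datetime


def hours_to_first_review(opened_at, first_review_at):
    """Elapsed hours between a change being opened and its first review."""
    return (first_review_at - opened_at).total_seconds() / 3600


def percentile(values, pct):
    """Nearest-rank percentile for an integer pct between 1 and 100."""
    if not values:
        raise ValueError("no data")
    ordered = sorted(values)
    # Integer ceiling of pct * n / 100, so there is no float rounding surprise.
    rank = max(1, (pct * len(ordered) + 99) // 100)
    return ordered[rank - 1]


# Invented sample: most reviews arrive within hours, a few wait for days.
waits = [1, 2, 2, 3, 4, 5, 10, 26, 50, 72]
print("median hours:", percentile(waits, 50))
print("p90 hours:", percentile(waits, 90))

# A change opened at 09:00 and first reviewed at 19:00 the same day.
print(hours_to_first_review(datetime(2024, 1, 1, 9, 0), datetime(2024, 1, 1, 19, 0)))
```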

Starting point is 00:54:50 And I think the P90s were pretty crazy. Like, it would take days to get code reviewed. And that would just be the biggest blocker more than anything to ship product. And honestly, like, CI time is like a blip compared to the amount of time it takes to review, right? So in the big picture, I think we worked to improve turnaround time by nudging people, by auto-approving PRs that had minor changes that semantically would not change the meaning or the behavior of the PR. That helped a lot because you didn't have to get another approval for making like a comment fix or you're doing a small stylistic change, for example. So that helped

Starting point is 00:55:25 kind of unblock a lot of those scenarios. We had auto-approve or auto-land, which helped because if your review was approved, then it would merge immediately. So you don't have to rebase and worry about conflicts. A lot of these kind of small things can actually have a really big impact. But if you don't measure, you wouldn't know. So I think in terms of developer productivity, when you talk to your engineers, they might say, hey, I feel less productive because I can't do something. Then your metric should flow from that rather than just blindly taking a framework

Starting point is 00:55:52 and saying, hey, this is a framework in the industry, this should be it, right? It may not actually work. I mean, it's also not the same for all kinds of developers. Is it safe to say that at Uber, you know, developer productivity was measured in the sense that there were these, you know, charts, metrics? They showed the diffs per engineer on your team. You could see it and how it changed over time, the time to code review, focus time, et cetera. But is it safe to say that these were measured with the intention of finding bottlenecks and

Starting point is 00:56:24 what to improve, and they were not measured with the intention of using them for performance reviews, promotion, or figuring out, you know, who's the low performer? Like, is that a fair assessment? Yeah, that's the usual controversy. People tend to misuse it. The guidance is always, like, you know, look at them at a high level. And that is where a lot of these were aggregated, right?

Starting point is 00:56:47 And you can find ways to, like, disincentivize people from bringing this up, right? Because you can easily catch bias. We had, for example, folks on the performance committee that would be there just to catch bias: hey, are you just looking at diffs and not looking at the quality? And sometimes, like, people may not have as much output because they have other contributions they're making. They might be writing docs or they might be influencing strategy. They might be working on unblocking teams or figuring out product and business goals and distilling them into requirements for engineering. So I think initially people thought of it that way, but that was never the intention and that was never the case. In my experience, at least at all the companies I've been,

Starting point is 00:57:25 we never looked at it as anything more than, hey, it's just good to know that, hey, if you haven't done any PRs, that's a red flag potentially. You had zero PRs in the entire year? Explain yourself. Like, we'll clarify with the manager, hey, what's going on? What are they doing? But we never say that as like a blanket, okay, that means nothing.

Starting point is 00:57:44 So we always want to use this as a way to say, if there are red flags, we catch them, that there might be something else going on, but it was mostly used for unblocking teams from shipping fast. Yeah. So it sounds like it's just like if you have data and you can gather it and that data can actually help you build the right things.

Starting point is 00:58:02 I mean, it would be silly not to do it. And as you said, start with what the pain points are, what people complain about. I mean, we always complained about, well,

Starting point is 00:58:10 when I was there, it was always about code review and how long code review took. And I don't think we ever measured it. I think I left by the time this was there. But if we had measured it, it would have really shown that our median was probably more than a day because we were waiting on a team on the other side of the Atlantic

Starting point is 00:58:29 and then we figured out how to cut those ties. Code is rarely the bottleneck, especially now with, like, AI tools. You could just generate a bunch of code and throw up a PR. That's rarely the bottleneck. Building alignment or figuring out product requirements or making sure you're not breaking the experience is the hardest part. So just measuring output in terms of PRs is probably going to be harder to justify in this new age of AI tools because people are just shipping so much more. Like we use a lot of AI tools at work, including our own, and we ship a ton more than we used to back at Uber.

Starting point is 00:59:02 So you're now in the business of AI tools to help engineering productivity and help engineers work better. How do you see these tools changing how we do software development? I'm talking, obviously, about the usual suspects, the coding autocompletes, the copilots, also the agents and also everything else that's coming. Yeah, absolutely. So we're very thick in the space. Obviously, autocomplete is a huge boost because it kind of, in many cases,

Starting point is 00:59:32 helps you sort of, like, type less but think more. So I think you probably use a bunch of these tools. I do too. It removes a lot of the grunt work, but it won't remove your thought process because the taste in what you want to do is still there, just not the how sometimes. It's like, okay, this makes sense, right? But I feel like a lot of the tools are either focused too much on code completion or they're only on the code review. I think back to what we learned at Uber: we had a lot of inefficiencies in things we wished we could solve with some sort of intelligent layer.

Starting point is 01:00:06 Like I was, for example, going around unblocking things. Like, hey, I knew exactly what would need to happen to unblock a build. So having some sort of agentic understanding of things that are happening in the IDE, why did you type this character, that affects what's happening in code review, and that affects what happens on deploy, and that affects what happens in your incident, for example. There is nothing that's cross-cutting. So that's kind of what we're trying to solve at Gitar: have the whole SDLC under the purview and not just focus on the point solution of, I'm just an IDE code complete, or I'm just going to be on your CI reviewing stuff. A lot of those solutions currently are, I believe, good starting points. And in my experience, from how I've seen technology shifts happen, the nicer or more important unlocks happen a little bit later in the cycle. Since it's very early in the cycle with AI, at this point people are just experimenting with stuff. But I believe that there's a huge value add for agents, just because, as you mentioned, there are some things that are so obvious when you think about it that, hey, we wish we wrote

Starting point is 01:01:08 like a script to automate this, right, for support or for documentation. Those can potentially be driven by agents, but still have the human in the loop when things need to be approved. That's kind of how we think about it. And as you're using these tools, how does it change your development? Like, obviously we're writing less code because now there are just more suggestions, or there are also agents. But you know, what does it allow you to do? Are you, you know, just moving faster or are you actually having deeper thoughts? I'm just interested in what it means for you, the current version of these agents that

Starting point is 01:01:47 And we know, you know, it's the start of a cycle. We never know how long it'll go, but clearly it's only been like no more than two years. So. Yeah. I think it allows you to explore more paths that you previously didn't have time to explore because you can experiment much faster and discard ideas much quicker. Because when you're prototyping, you don't need the full rigor. You can put them behind a flag and then say, hey, this looks good. I think people are calling it vibe coding these days.

Starting point is 01:02:12 Like, hey, this has the right vibe. We can experiment with this. Of course, it's harder when you have a lot of other constraints. Like, you know, you have to make sure the software is, you know, secure, compliant. A lot of people are not thinking about that when prototyping. So it lets you prototype faster, which is great. And I think what's also happening is, I feel now there's a lot of talk about, hey, junior engineers are going to get replaced. I think it's the opposite. My take is that junior engineers are going to thrive because they are coming with new knowledge, new ways of

Starting point is 01:02:44 working with these tools, and are going to be much more effective because they don't have the bias of working a particular way. They're able to achieve things that, for example, would otherwise take a long time to even ramp up on. It doesn't mean that you shouldn't know the fundamentals. In fact, if you think about 20 years ago, they had like a DBA and an HTML and like a Java application engineer. There was no data engineer.

Starting point is 01:03:11 And a webmaster sometimes. And there was no, like, SRE. No. There's no front-end person, there's no mobile person, there's no data engineer, ML engineer. There are so many more archetypes now or, like, you know, sub-categories. I think with AI, the general engineer is going to see a rise again, which means I'm able to jump into, for example, front-end

Starting point is 01:03:34 without having done much front-end. Because I can understand that, okay, this is what I expect to happen, and I would like to have this outcome with my software, which means I can then focus on the other parts of the problem. You mentioned vibe coding, and this is a brand new term. What do you think it stands for? Is it just prototyping? Is it just like throwing out ideas, or is it a bit more than that?

Starting point is 01:04:01 I think vibe coding is just you trying to figure out how the system should behave when you're prototyping. The good thing about vibe coding is you're able to basically iterate much faster with these agentic loops, so that now you're just focusing on how would I actually achieve my outcome rather than the exact way to do it, because you can always change those details later.

Starting point is 01:04:24 If you have the right abstractions, you could swap layers out. So I think that's changing the way people think about software, because if you're someone who has taste and knows, like, this is the outcome I want to see on the UX, you may not even need to know a framework. You should experiment with it till it feels right. And of course, you need to massage it to make sure it doesn't make it harder to evolve later on. I think a lot of the software today gets you to the initial "it looks great," but then

Starting point is 01:04:51 how do you maintain it is basically sort of left for a later time. And I think a lot of people who are coding with AI are still early in the prototype phase. I don't think at enterprise scale that would work because I can imagine it will probably mess with the entire abstraction layer you already put in. So you have to really constrain it to work well. So vibe coding is great for prototyping at this point. But that's also why I think you can move a lot faster with fewer constraints, because AI has a lot more freedom to explore. One of the problems that we're seeing,

Starting point is 01:05:22 and if you're using it, you'll also see, is like, there's this, Addy Osmani called it the 70% problem, that, you know, vibe coding and these things get you started. But then it can get just really confused and stuck. And people see this. Like people who are not really technical, they kind of get stuck. And that's where experienced engineers really shine.

Starting point is 01:05:41 You said previously that you think junior engineers will do great there. How do you think less experienced engineers can deal with this? That's one. And then the other is like, is this where we might see, you know, senior engineers actually, you know, thrive and spend more of their time? Yeah, I think the main differentiator there is like when things go wrong, understanding why they went wrong, which means strong CS fundamentals, strong sort of system knowledge of,

Starting point is 01:06:07 hey, how does the entire thing work end to end? So senior engineers usually tend to have a better, bigger picture. So if they adopt these new tools, actually, they can get a lot more productive. I can tell from my own experience that adopting these tools, I'm able to get a lot more stuff done because I know, for example, this is how I expect my system to work,

Starting point is 01:06:26 which means I can delegate. I used to give it to a more junior person to, like, you know, understand the system. Now I can just give it to AI. I can check the work and I can commit it. But if you're very stuck in your ways and don't want to use these tools, you can still do it. It's going to be a lot slower and that's totally fine too. It's just a matter of how quickly you want to ship.

Starting point is 01:06:47 And for things that are not critical, you may still want to experiment to see what's out there. Because the way things are going, the tools are getting better every day, which means over time a lot of the mistakes that are happening today will start to disappear. There'll be, like, other agents that will check the work of the first agent, so there aren't going to be as many mistakes. You still want to have a human in the loop before you check in. Otherwise, you will not have a full guarantee that whatever you're shipping to the user is going to be good. But you definitely want senior engineers to focus on the high-level, system-level knowledge,

Starting point is 01:07:20 and the agentic loops can be more nuanced and more focused on smaller parts of the puzzle. After nine years at Uber working on developer experience, you've now co-founded a company called Gitar. How are you thinking about continuing this developer experience? And also, what's your approach to AI? What are your bets on where the space is going to go? Yeah, that's a good question. So when we co-founded Gitar with my co-founders, Ali and Raj,

Starting point is 01:07:49 we basically saw that a lot of the inefficiencies we had, where tools were not available back in the day at Uber, we solved because we had a golden path for developers. Now with agents, I think, we talked about DevOps previously, right? Like, hey, there's a golden path for a developer to get stuff done. So what if an agent had a golden path

Starting point is 01:08:10 to get things done for a particular enterprise? So we're thinking of Gitar, like the agents we're building at Gitar. We are going to launch a new product pretty soon. And you can check us out at gitar.ai. The agentic AI we build is actually inside the IDE, on the code review system, inside your deployment. It's across the board.

Starting point is 01:08:32 So there's no reason why you can't auto-complete like a deployment shop and not just be stuck inside IDE. And having an understanding of things like, hey, how are things going in production? Which part of the code is slow? These dots are usually what engineers connect, right? The context is what flows through your brain. And if there's a standardized path for an agent

Starting point is 01:08:52 to evolve software and maintain it, it becomes much easier to focus on the business value. And one of the things that people dread is, hey, maintaining stuff, like updating libraries or fixing tech debt or adding coverage. We feel like a lot of that grunt work can be done very efficiently with agents. So that's where our focus is: we don't want to be just like a code complete or within your point solution. We want to have it across the entire stack of software development.

Starting point is 01:09:18 And we have that experience working on this for many years, talking to peers in the industry. There's a lot of potential for disrupting the space. and the stuff we wish we had, hey, I wish someone was on call during this time to fix a small thing, we can make that happen now because we can give the agent powers to unblock people when you might be asleep,

Starting point is 01:09:38 which means you'll be on call less often. So we're thinking of it from a holistic developer experience standpoint and producing more code is just one part of it. It's about doing it reliably, maintaining it and making it understandable so newer developers can understand what's going on. Because if we just let AI run amok on your code base, and not have an idea what's going on, then that's a recipe for disaster.

Starting point is 01:09:59 So we feel like there's a lot of opportunity for making the entire experience end-to-end much more standardized so that the golden path for developers and agents is really well laid out. Now, like, you know, when I'm listening to this, one part of me is like, you know, if I'm a business owner, I'm like, oh, that sounds great because over time I might just have fewer developers because I'll have a really capable agent. And as a developer, obviously one thing that's popping into your head is like, wow, you know, we might have a smaller team. Hopefully I'll still be there, but I might have fewer colleagues or, you know, I might

Starting point is 01:10:34 be unfortunate and I might get the cut. Like, let's just run with this, that this future is coming and we will have a lot more capable agents. In this case, what do you think developers will do? And what are skill sets that are worth investing in so that you will still be an efficient, great developer in the future, you know, seen as a highly efficient developer 10 years from now, where, you know, your managers will be like, oh, my God.

Starting point is 01:11:00 Like, we cannot do work without this person. We need this person here. And we should probably give them a raise because they're underpaid. Yeah, I think, as you said, there might be fewer people in a business, but there'll be more businesses, right? That's typically how a lot of these technologies go. If building software is easy, then the value unlock will be, like, actually the taste and the experience for the end user.

Starting point is 01:11:23 Like, what do we actually provide? If every website looks the same because you prototyped it in, like, a website builder, then that's not going to be your differentiator. Like even, I think, during the mobile era, a lot of apps came up to do things like to-do lists or, you know, reminders or, you know, just scrolling social media. But then eventually some apps had great UX. They won out because they're just a joy to use. So knowing what matters to your end customer is going to be the differentiator for engineers to, like,

Starting point is 01:11:52 hone in on. So that means understanding the business really well, right? And then the other thing is, how do you build something for scale? Essentially, as you grow the business, you don't need as many engineers, probably, and that's okay. That might be fine, because if you're at a smaller company with a smaller number of engineers, and you have equity, for example, you might have more of the pie, which means that's a good way to kind of incentivize people to experiment with more ideas. And usually competition is good in that way, right? So you may not be bound by those restrictions, that you always have to be like a big company. I think we'll start

Starting point is 01:12:27 seeing smaller, more efficient teams who are going to be more productive and will deeply understand the product. And that's going to be, I think, a good way to compete with other products in that space. Like the taste, the art that goes into crafting great UX or great end-user experiences is going to matter more than how you wrote the code. Now, if I'm thinking, because I think back, for example, even 10, 15 years ago, we already had this, right? Like WhatsApp, they were barely at 50 people when Facebook acquired them for $19 billion, which is crazy. This was with no AI, nothing, but clearly they were a very efficient team. Instagram, I think 12 or 13 people. Also, they built an app that already had 30 million users

Starting point is 01:13:11 with a team that small. Even today, that would be a big deal. So it's like, to get that skill set of being able to, as an engineer, do a lot with these tools with a small team. You mentioned taste. You mentioned art. What other things are worth focusing on that would probably help, you know, do a lot more with less? Yeah, I think having an understanding of the system layer is good. So I think the scalable part we didn't cover. So how does your app or software scale to more users? What parts of the system are less efficient or more efficient? Because all code might do the same things, but less or more efficiently. And how do you maintain it?

Starting point is 01:13:51 So that's the challenge a lot of people run into. As software systems grow, is it easy to upgrade because you have fewer external dependencies? Is it harder because you have so many versions? The software might work the same today, but okay, three years down the line, if you can't upgrade and move faster, your competition is going to basically move faster than you because it started on a clean slate. We see that all the time. I think there's a lot of legacy technology or database systems.

Starting point is 01:14:15 People are trying to move to modern systems. It's a big cost, right? And that's very expensive, a lot of engineering time wasted. So building for maintainability is going to be important as well. Building for either AI agent maintainability or human understanding, so that you don't have to, like, you know, look at outdated docs or do an expensive migration. I mean, for example, if I could wave a magic wand and move all my microrepos to a monorepo and everything just worked, that would be awesome.

Starting point is 01:14:43 I don't think any system can do that today yet. Maybe it will in the future. But if that's a reality because you've designed your software to move that way, that's great. So having that knowledge is important because AI agents are fundamentally based on expert knowledge. So your knowledge is, like, your understanding of your entire business, because they can't really tell you how you should write your code for a user. What matters to one business may not matter to the other. Some parts of the software might work well for one segment but not work so well for the other segment. So really understanding those business values will differentiate just AI-driven code versus how do you actually use that code to make your business actually thrive?

Starting point is 01:15:25 So again, it might not happen, right? Or it might happen slower than we expect, et cetera. But what I'm hearing is, even if it does happen, having expert knowledge helps, having taste helps, and you kind of get taste from seeing things, failing at things. So is it safe to say that it's probably a pretty good strategy to try your best to go and work at companies that are either building stuff at scale or that just have taste? You know, there are startups that are known to do these things, so that you can build up one of the muscles of, like, scale, taste, if you can switch between them. Because in the future, I assume that some of the, like, repeatedly successful people where you see a pattern will

Starting point is 01:16:08 we'll probably have the pattern of like, oh, they worked at companies that kind of did pretty groundbreaking stuff at their time. And now that we have these tools, well, yeah, they're doing even more groundbreaking stuff. And they probably build a bunch of their experience. Yeah, the experience matters a lot. Like, knowing a lot of the stack helps. Like, when I was at Uber, I didn't do much work on front end, but now I'm at guitar. I'm doing like front end.

Starting point is 01:16:29 I'm doing infra. I'm doing a bunch of stuff. And I'm using agents to help me. Like, we use our own agent to rewrite a bunch of our stuff. And I was like, okay, now I understand why this works that way. I'm a general engineer now. I have strong fundamentals. Let's say I can see the system. I know this is how it's supposed to work. Now I uplevel myself without spending. It's kind of like that Matrix movie where Neo gets that, you know, upgrade to, like, know kung fu. But it doesn't mean

Starting point is 01:16:55 that he can beat the master yet, you know, like he still has to train up and understand it. Yeah, I like that. So with that, let's go with some rapid questions. I'll just shoot out some things and then you go. What is your favorite programming language and why? Rust. Rust. Since when? Since Gitar, actually. I was initially like, oh my God, Rust has a bad reputation for being hard,

Starting point is 01:17:19 but it makes things so safe. It's hard to have, I think, outages if you design it right. I mean, you can still have them, but the type safety is there to help you out, and once you start treating it like your friend, then it unlocks a lot of stuff. So I'm a big fan of... So someone who's...

Starting point is 01:17:37 You've spent a lot of time in Java and Go, right? Like, you've probably spent like half your career in those... Yeah, it's funny because when I graduated college, I told my friends I would never work at a company that does Java, and then I did Java for nine years. So, you know, never say never. What's an AI-powered dev tool that you use and like? We use our own agentic thing quite a bit.

Starting point is 01:18:02 Oh, nice. Jimmy, which is a new agentic. Jimmy, yeah, because Gitar, you know. We use Cursor as well for, like, you know, the autocomplete, but we feel like Jimmy gives a lot more end-to-end stuff for us. So we've been using it pretty heavily, and you can also try it out hopefully soon. And then I use a bunch of other smaller tools, like Claude directly, obviously, for question-and-answer stuff.

Starting point is 01:18:26 Have you played with some of the newer DeepSeek research models and stuff like that? Because I've been hearing good things about them for deeper work, but I haven't had a chance yet. And what's a book that you'd recommend? My favorite book, I don't read too much, but I read this book early in my career called Head First Design Patterns, which is a little old at this point, but the patterns are really good because it really makes you understand how you can layer things. A lot of software today is just abstraction over abstraction over abstraction if you think about it. So it really helps you understand what pattern makes sense when, when to use it, and how to transition between them. I would highly recommend that. If you're trying to take a more systems-level approach and want to design software for maintainability, that's very crucial. Making it pluggable,

Starting point is 01:19:15 composable and replaceable under the hood. A lot of the migrations we did at Uber were zero downtime with, like, no code freeze. When we did the Java migration, there was no code freeze. So that was a crowning achievement. Yeah. Yeah. And banks these days still, you know, they will shut down their thing for some migration for like a whole day on the weekend. In 2025. It's a big difference, as you say. Well, Gautam, I really enjoyed this conversation.

Starting point is 01:19:42 It was good to reconnect and just see how, with developer productivity, empowering developers, you're still just running with it. Yeah, I think this is the area. I love working in this area. I can't see myself doing anything else. I've been doing it for 10 years. I hope to do it for 10 more.

Starting point is 01:20:00 We'll see. But I hope I can make more developers productive. I hope you also found this episode about Uber's internal tools, developer productivity and AI's impact on software engineering as interesting as I did. You can find Gautam on social media as linked in the show notes below and check out his company, Gitar, at gitar.ai. For more deep dives on Uber's engineering culture, check out The Pragmatic Engineer articles linked in the show notes below.

Starting point is 01:20:23 If you've enjoyed this podcast, please do subscribe on your favorite podcast platform and on YouTube. This helps more people discover the podcast, and a special thank you if you leave a rating. Thanks and see you in the next one.
