Software at Scale - Software at Scale 4 - Akshay Shekhar: Software Engineer, AWS

Starting point is 00:00:00 Welcome to Software at Scale, a podcast where we discuss the technical stories behind large software applications. I'm your host, Utsav, and thank you for listening. Hey Akshay, thanks for being a guest on the Software at Scale podcast. Could you give me a quick introduction to yourself? What's your story? What got you into tech? So I'm from a very small town in India. I got into computers mostly because I had this thing of figuring out how stuff works, which probably came from my dad, because he was very handy at home. Whenever anything was broken, he would whip out the screwdriver, open it, see what was broken, and fix it.

Starting point is 00:00:44 But as a kid, I wasn't allowed to do that, right? And I think he discovered computers because he's a lawyer and one of his clients gave him a floppy disk, and he was like, what is a floppy disk? So he learned about computers, and he's into technology, so he bought a computer. When I was six, we invested almost all our money into buying an iMac, like the old ones with the big backs. I just remember seeing it and being amazed at how much complexity and depth there was. I couldn't comprehend how this thing was working. But I have a specific story where my parents were trying to figure out how to do color prints, and they were trying out

Starting point is 00:01:25 different printers, and I could see that, oh, this printer looks like our printer and it has a colored page. I was like, can you click on this? And they did, and it started working. That was the first positive feedback that these computer things are interesting and you know how to do it. And then, a couple of years later, we got a computer and I was like, how does this thing work? Because it's a thing that you can't really break unless you do something bad. So from there, I got into computers, and a family friend gifted me a book on C++, which I read like four times in one year. But the problem was it was for Visual Studio, Visual C++ 6 or something. In my small town,

Starting point is 00:02:07 you couldn't find a place to buy Visual Studio and the internet wasn't that good to get it. So for a year, I just read the book. I didn't know how to actually write any code for it. From there, we got internet and from the internet, I started storing web pages and from web pages,

Starting point is 00:02:23 learning how to write code without any guidance. Like I would read the JavaScript and try to match it to C++. And like, OK, this is how it works. Yeah, from there, I found Linux and from Linux open source development. And then by the time I reached college, I was working on elementary OS, shipping the OS to like millions of downloads at least. You can't map it to people because we don't record that data. And from college, just like giving interviews and landed in the company I work for and from there different countries and now Canada. So you had a C++ book for a year, but you couldn't compile any code. Maybe that's what kept you

Starting point is 00:03:04 interested in programming because if you had to wait for C++ build times, you might have gotten bored of it. And I'm guessing for webpages, you would right-click inspect element and try to do something? No, no. So internet was too expensive for us to have it available all the time. So I would have a fixed time to look at the web page,

Starting point is 00:03:27 to look at the internet. So I was like, why waste this? I would open a page, save it, and then read what it is. Click on a link, save it, and then read what it is. And at least back then on Firefox, it would create an HTML page and a folder with all the assets. And then after I didn't have the connection, I would go through each and every asset and go like, okay, what does this mean?

Starting point is 00:03:48 Minified JavaScript, jQuery, was the bane of my existence. So I learned how to unminify by just doing find and replace, figuring out which tokens need to be followed by a new line. Notepad++ was really helpful there. And internet speeds must not have been good, right? Which year was this? Early 2000s? Yeah, early 2000s. I think my parents bought a computer that I could use in 2007, so I was around 12 or 13. Oh no, I was 11, so 2006. Yeah, and I'm aware that internet speeds in India progressively got better and better then, but it still seems like a big change going from editing these files to being an open source developer on elementary OS. And elementary is not a distro? It's basically a skin on Ubuntu. What exactly is it?

Starting point is 00:04:48 So everyone says that. What happened was that I got a hand-me-down iPhone from my mom, an iPhone 3G. By that time I had discovered Linux, because I went down the path of what is hacking, and from hacking, you see people using Linux, so Linux is the next thing. So I was using Ubuntu, and I had this iPhone, and I tried connecting it and couldn't really use it there. So I start Googling, and I find this application called BeatBox, I think it was, and I install it and start using it. And it breaks when I use the iPhone,

Starting point is 00:05:25 but I can still copy over the music. So I'm like, why is this not working? So I go to the website, see a link to IRC, join the IRC channel, and say, this is broken. And I remember a friend of mine, we worked together for four years after that, Cody Garver, he was like, you know what? It is broken. Can you fix it? And I thought to myself, I mean, maybe. Let's see. So from there, I installed elementary OS, got the sources. Linux just made it so simple. You do apt

Starting point is 00:05:55 source something, and you get the source of the package. And I made a fix, and then learned what version control was and how code reviews work. Got some feedback. And I was like, oh, this is easy. Oh, there's another bug I need to fix.

Starting point is 00:06:14 And from there, I went on to maintain the terminal for a couple of years. Apart from other applications that we had. So I used elementary OS maybe 10 or 15 years ago. So thank you for your work. And I like using Linux as well and I didn't like the look of Ubuntu at that time with that really harsh purple color. Yeah I don't know why they selected that brown initially. So Ubuntu like actually mailed me a CD of Ubuntu OS from the UK because downloading it was so painful. This was back in the day when Canonical did that.

Starting point is 00:06:48 I think that really launched me into Linux, because imagine being a 12-year-old in the middle of nowhere in India and you get this CD for free from the UK, which you install and you have working Linux. Because if I had to download it, it would either be too expensive or it would have taken a month or two months. So they just sent me, I think it was 7.10 that they sent me. I have the CD somewhere lying around. Do you remember what the code name of that Ubuntu was?

Starting point is 00:07:20 Yeah. I think this was before Lucid Lynx, but there are so many of those. Natty Narwhal, I think? It's too many years ago, I guess. Yeah, and you mentioned that elementary OS is a skin over Ubuntu. Actually, we started with Ubuntu and started replacing the applications, and slowly went down to the installer, replaced underlying services,

Starting point is 00:07:42 came up with new things. So it is a complete operating system, right? It just has better applications. We use components. It's like you have Linux kernel and GNU tool chain on top of it, right? So elementary is the UI layer plus some system services. And a replacement for the default calendar and some other apps, right? Yep, yep, yep.

Starting point is 00:08:04 Do you know what the status of the project is right now? I haven't followed up in a while. Yeah, I drop in from time to time, and Daniel Foré and Cassidy James are still there, who have been taking this project further. And so I remember in college, I didn't have a lot of money. So I would do bugs for elementary OS and we would have bug bounties. And now it's completely sustainable. So we have two or three permanent employees who work for elementary OS and have sustainable development over there.

Starting point is 00:08:36 But I'm tangentially aware of it because I follow them on Twitter and drop into Slack from time to time just to see what's up. I didn't understand the part about sustainability. How does elementary manage to stay sustainable? Are there a lot of downsides? I remember this very controversial thing happened a while back where there was a blog post that elementary published.

Starting point is 00:09:02 And to be honest, all of us, the core community, like 20 people, reviewed the blog post and no one noticed. But what we basically said was that there are millions of people downloading it, and we need less than one percent of those people to donate one dollar. If you do that, we will have enough money to fund full-time engineering on elementary OS. Because at the time we were like 16, 17 year olds, a group of us finding free time to do it. And if we had just some funding, it would be helpful.

Starting point is 00:09:34 So there was this big controversy where people complained that, why are you asking for money for open source when other people also contribute? r/linux was a lot of fun, that conversation, that whole thread was amazing. But yeah, so what happens is when you download elementary OS, you see a box asking, do you want to donate money? It can be from zero to any number.

Starting point is 00:09:58 And now elementary OS also has an app center where you have applications, and it's like the Humble Bundle. You can pay from zero to N number of dollars as you please. So adding that slider was controversial. There was some wording in the blog post that a lot of people did not agree with. I don't remember the exact wording. This was again, like 2011 or 12. So two things stick out to me there.

Starting point is 00:10:26 Firstly, it seems like there was a core contributor who was super receptive and helpful when you tried to send patches and that made a bit of a difference. I think it was more than that. I think my whole career in software development

Starting point is 00:10:43 has been like trying to find difficult problems. And when those problems align with something I want to do anyway, that is when I really get excited about it, right? So elementary OS was what I had on my laptop, and I would find bugs and patch them for myself. And we had a good community, and I'm still in touch with like four or five people.

Starting point is 00:11:07 I met them after years when I finally moved to the US before I moved to Canada. I finally like flew over to California and met a few people. It was amazing. Like we worked together for four years, but yes, them like core contributors

Starting point is 00:11:21 being nice and welcoming on IRC, and them tolerating my terrible code, was useful. Open source was all about remote work before remote work became a buzzword. It's also interesting that there might be a blog post that was released, what, eight or nine years ago, and somebody might just leave a comment and forget about it, but for the people who wrote that post, things could stick on for a really long time. Yeah, I think a lot of the vocal voices in the Linux community at that time, at least, I'm not very involved now, so I don't know. But at that time, we were very cautious and very suspicious of everyone. So I remember we had this one library that provided some core features. So if you are an application developer and you want

Starting point is 00:12:17 to write an application, you don't want to reinvent the wheel, right? So we built a library on top of GTK called Granite, or "Granit" as Americans call it. And in that library, we had new widgets that you could use in your application. And I remember when we announced a new version, again, r/linux went bonkers, and they were telling us that this is vendor lock-in. Now anyone who develops an application for elementary OS

Starting point is 00:12:42 won't be able to ship it for other platforms. And we were like, yes, you can. It's open source code. You can build it over there and ship it. It's not like we are trying to restrict development. But people were very suspicious and cautious. And that was like 10 years ago. Nothing changed.

Starting point is 00:13:01 Nothing was impacted. But whenever I go to a thread discussing Linux, I still see the same conversation patterns. It just leaves a bad taste in my mouth. I also remember some of the

Starting point is 00:13:12 systemd stuff. I don't have a great opinion on which side is right or not, but the amount of vitriol at that time

Starting point is 00:13:19 was pretty high. Yeah, I think I learned one thing from that whole debacle, and all the other debacles, which was that being a purist is of no use. You do whatever works for you.

Starting point is 00:13:32 So the systemd debacle was about going the Unix way and having one application with one purpose. But if you look at Linux and other Unix utilities, a lot of them do have a lot of things built in. They don't all follow that

Starting point is 00:13:47 philosophy. So I think you have to see the bigger picture and say, does this component actually add value? Or am I not adopting it because it doesn't fit my philosophy? All things being said, it doesn't seem like it's super easy to be an

Starting point is 00:14:03 open source developer, because all of your work is in the open and anybody can criticize it with pretty much no consequences. But at the same time, the amount of impact you can have if you work on a high-visibility, high-traction project can be pretty massive. Yeah, I remember getting this one email from a primary school in China, where the teacher sent us an email with photos of the students

Starting point is 00:14:33 saying, thank you for providing elementary OS. Because we were writing these applications in Vala, which is a language that compiles to C, and they were very efficient. And he was like, we had these old computers and we tried installing Fedora and Ubuntu on them, but they use a lot of memory and we couldn't get it to work. So we installed elementary OS

Starting point is 00:14:51 and now these kids can use computers. Just having that impact, I remember I was glowing for weeks. I'm so happy with what we had achieved. I've met a few people like you who have actually used elementary OS and they talk about it and I'm just happy

Starting point is 00:15:08 even when I meet people who have used my software from my employer, for example, it's good, it's fine, but it's my job to build those things. For elementary OS, I still feel very satisfied and still want to give back to the community.

Starting point is 00:15:24 I also have a lot of respect for people who do contribute to open source with a full-time job and like other responsibilities that they might have. It's because I've been unable to do it in the past five years since I like started working full-time and people who do that, like add immense value. So that provides an excellent segue to the next question I had. So you were doing this open source development, I think, pre-college.

Starting point is 00:15:50 So what was your college experience like? The Indian college system is complicated, but I got into this one college where, in the first year, looking at the normal path, I realized that I wouldn't be able to get a good job if I did what everyone else was doing. So I really sat down and thought, what should my path be? And what I realized was that I was missing out on a lot of things, because I had wanted to give the SAT exam and come here to the US and do my bachelor's, but that didn't pan out. It was too resource intensive for me to go through with it, and I didn't put in the time and effort. So in college, what I started doing was, I started

Starting point is 00:16:30 going to conferences. Also, there used to be, at least, a market where students from US universities would give out their homework as contract projects. I started off doing homework for the fourth-year students in my college, but that market was pretty small. So I just went online, and through elementary OS, people I knew there, I got references, and people started sending their projects and assignments over to me.

Starting point is 00:16:59 So I would like study the course material and build their homework. And that was another path of learning. And in one of these conferences, I attended this workshop for Raspberry Pis by this company called Inventrom in India. It's a startup. And the founder was there. The four engineers who comprised the company were there. And I just went up to them and said,

Starting point is 00:17:21 do you guys have internships? Because I would love to work on hardware. Because moving from software to hardware plus software was something I wanted to try out. And they're like, okay, fine. Send us your resume. And I don't know what clicked, but they sent me an email back saying, oh, yeah, sure. Come on over. So one summer, as a first-year, I convinced my college to let me attend an internship rather than study courses.

Starting point is 00:17:48 That was another complete side quest that I had to conquer. But I went there, learned a bunch from those folks. I'm still in touch. The startup is still around. It's funded. They're in India. Whenever they came to the US while I was in the US, we would meet up and attend Maker Faire and stuff. But yeah, that gave me a lot of clarity about how the ecosystem works, what a job feels like, what a startup feels like. Because the first time I went over, there were no boundaries. We worked for two months straight, right? I landed there, we would

Starting point is 00:18:25 all reach the office in the morning and work till 8 p.m., 9 p.m., go out, have dinner together, go home, rinse and repeat. Didn't care if it was a weekday or a weekend. This was, again, Goa, which is a beautiful, beautiful part of the country, so I saw the whole city with them as well, because we would leave early one day and just explore something. Everyone in the startup was brilliant. There was this one guy who, for his engineering project, built this program in MATLAB that recognized sound and did stuff with it. Think of it as like our Echo or Google Home, right? But he did it in MATLAB, and that was all good, but the response time was too much, right? Because MATLAB was slow. So

Starting point is 00:19:12 he spent one year rewriting the whole program in C. And my mind was blown away. So just talking to these folks, right, because they thought very differently, that just expanded my brain, I think. So that was a fun initial internship, and it looks like you had a bunch of other internships as well, right? Yeah, so after that I was like, let me try and work for a large company. So with the help of a few connections and a few very helpful people, I got into this company called Continental Corporation, if you've heard of it. It's a German company that does tires. That's what they're famous for, like car tires, but they also do engines and car accessories and car

Starting point is 00:19:56 systems. So I got an internship over there in the mechatronics team. And my project was to build an application that takes data from embedded devices over USB, plots graphs, and stores that data for analysis. So that was a completely different experience. And the team was very nice. The project was for two months and I was done in two weeks. They didn't expect me to finish it, but at that point I'd been doing software for long enough that just writing that in C# was

Starting point is 00:20:30 a no-brainer. But then I got to explore motorcycle engine diagnostic software as well. So they let me go to the factory and hook up my new software to a motorcycle, run the diagnostic, see the report, and say, okay, this is working now. But at the end of the thing, one of the engineers, so many brilliant people over there, one of them asked me, do you want to come back and work with us? I was like, sure, why not? He's like, but yeah, you won't be in this team.

Starting point is 00:21:02 You will be in that team because that team does software. We do hardware. So I was like, huh. So I went over and saw that they were just doing standard scripts with a convoluted version control system. And I was like, I still need that feel of the startup freedom, where you're allowed to do whatever just to get things working

Starting point is 00:21:21 and then improve it. It seems great, but it seems like you pivoted from these smaller companies to Amazon, which is one of the largest tech companies in the world. So what was the story? Amazon was never on my radar. Whenever I thought of large tech companies, I thought of Google, or, I don't think Facebook was a large company at that point, but Microsoft, for example. I never thought of Amazon as a tech company, mostly because I wasn't exposed to it, until I read this article about Obidos, which was the C++ binary

Starting point is 00:21:55 that Amazon initially in the early years used for the website and just read about the challenges about linking this multi-gigabyte binary and that they ran out of memory. And I was like, oh, Amazon does software engineering too. And I think a lot of professors in my university were very helpful. Like, throughout my life, I've had people who were like,

Starting point is 00:22:16 no, you're wrong. And I was, like, lucky enough to listen to them. They're like, okay, Amazon is a very great company. You should go and work for it. And I had this competing job offer from a company in Japan, which also did cloud hardware software. And I didn't know which one to choose, right? Because I didn't know which one was better.

Starting point is 00:22:36 So again, I reached out to folks who I knew and I was like, where should I go? So I ended up doing an internship at Amazon. And after the internship, even though I got more offers, even from Inventrom, I was like, there is something magical here and I want to stay longer to see why and how. So one blog post on how large their C++ binary is was inspiring enough to want to work there. I think the advantage of tech blogs

Starting point is 00:23:06 is vastly underrated. I have a co-worker who told me that he was looking for companies where they write services in Go, and I think Dropbox had a blog post put out by someone, who knows, maybe five years ago, where we mentioned our stack. So when this co-worker was researching,

Starting point is 00:23:27 they were like, oh, Dropbox uses Go, it seems like a good place to work. So the impact of these blog posts is pretty high. Yeah, you brought up Go, and Go is a language that I completely adore, right? So I got the offer from the Japanese company basically because I went to GopherCon India and they had an engineer there who referred me. But yeah, I think blog posts and open source software add tremendous value. I think most of what I learned

Starting point is 00:23:57 was mostly by reading other people's code. I still do it at this time. And blog posts and open source software just add tremendous value there. So it also seems like you were interested in problems at scale, right? Your C++ code base being so large that you can't compile it in memory is not the kind of problem you deal with in most companies. So what was your first job at Amazon?

Starting point is 00:24:24 What team were you on? So when I started off at Amazon, I was in this team in Amazon Marketplace, which did data extraction. So when you join in as a seller, after a certain threshold, Amazon needs to verify that you are who you say you are. And at that point, it was done through manual verification.

Starting point is 00:24:48 So you would upload your documents and a human being would look at those documents. And it was a long process because it's verifying that the document has the correct spelling. And then they would say, okay, your spelling and your name is incorrect. That would go back. Then the seller would see that email, update it, and come back, and it would just be a long process. And we wanted to improve that experience. So my team, what it did was it did data extraction automatically

Starting point is 00:25:14 and comparison. So my internship project was actually going to be the comparison part, comparing addresses. Comparing addresses, I learned, was a tremendous problem, because addresses are not a uniform field, right? We had the structured document and this unstructured document. So my team did data extraction for images

Starting point is 00:25:38 and comparison logic. And that is where I got started. The problems were amazing, and the people were so brilliant that I was completely blown away. I had this one teammate who would just sit next to me, and I would be stuck on something, and I would talk to her and tell her my problem

Starting point is 00:25:58 while I'm completely gone, and she would debug my thinking, and I think that helped so much. Address matching was one of the problems I encountered. That was a difficult problem. Also, since I was in open source a lot,

Starting point is 00:26:16 I knew about C++ and where it was going, and knew about the standards. And the team was using an older standard. So apart from my internship project, I asked, how would I upgrade the language completely, like go from C++98 to C++14 and just use the latest GCC and stuff, right? So my manager, again, brilliant people, everyone was, I can't say enough, my manager was like, okay, go for it, give it a try. Just don't spend too much time on it.

Starting point is 00:26:46 So in that process, I ended up reading and patching code for the entire company. I found we depended on libraries that were written like seven years ago in C++, which needed to be updated because the default standard had changed. And I think at that time, in one header file, there was a hash_map, which was a GNU extension and not the new maps. So just going through Amazon's entire history and code base,

Starting point is 00:27:13 it was a completely fun experience altogether. I remember those large builds that took hours to finish. Well, I was underplaying it, but yeah. So when modifying other people's code, did you ever break something, and did your manager ever ping you asking, wait, what the hell? Oh man.

Starting point is 00:27:31 So again, in open source, you don't write unit tests, right? I started learning to write unit tests, and testing in general, when I joined Amazon. And in that library that I wrote for address matching, I had to compare dates. And I was like, why not? Let's do time.now, right? I mean, the Java equivalent, for example. And

Starting point is 00:27:52 wrote the test and everything was fine. And I was on vacation one day and I get an email notification on my phone. It was my day off, middle of the week. I look at my phone and I see someone is emailing me about a broken package version. Like, my package was breaking one isolated unit of the build graph, right? I'm like, why is this breaking? And I go there and I learn that it was February the 29th and my assumptions in the unit test were wrong. So yeah, it's fun. That was a principal engineer pinging me,

Starting point is 00:28:34 a brand new SDE 1, freshly converted to a full-time engineering role, and I was like, man, they're going to fire me. But yeah, that happened multiple times. Oh, I can totally relate to that kind of fear. For those tests in particular, we have a special name at Dropbox: we call them time bombs. It happens a lot with promotion-related logic. So we're offering some kind of discount to users until a certain date, and there's a unit test for that. And when that date expires, the unit test starts failing. As far as I know, it's not super easy to mock out system time. I don't know if that's changed now, but that was always a challenge

Starting point is 00:29:24 and still is. Yeah. I mean, there are multiple hacks you can use to get around it. Like gVisor does something. You know what, the Go playground, actually, they had a blog post talking about how they've mocked time. I think you can internally change the time now, or I think an LD_PRELOAD hack might work there. But yeah, Go has a clock thing, right? Instead of saying time.Now, you can create a

Starting point is 00:29:49 clock and just use that. Yeah, it seems like mocking out time directly in your tests without messing with the runtime is probably the best way to go about this. It seems like a code smell to use system time in a unit test.
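To make the point above concrete, here is a minimal Python sketch of passing the current date into date logic instead of reading the system clock. It is an illustration only, not the code discussed in the episode: the function names one_year_later_naive, one_year_later, and discount_is_active, and the today parameter, are all hypothetical.

```python
from datetime import date
from typing import Callable


def one_year_later_naive(d: date) -> date:
    # Hypothetical buggy helper: on February 29 the target date does not
    # exist in the following year, so replace() raises ValueError.
    return d.replace(year=d.year + 1)


def one_year_later(d: date) -> date:
    # Hypothetical fixed helper: fall back to February 28 when the
    # following year has no February 29.
    try:
        return d.replace(year=d.year + 1)
    except ValueError:
        return d.replace(year=d.year + 1, day=28)


def discount_is_active(expires: date,
                       today: Callable[[], date] = date.today) -> bool:
    # The clock is a parameter, so production code uses the real date
    # while tests pass a fixed one and never "expire".
    return today() <= expires
```

A test can then pin the date, for example discount_is_active(date(2020, 12, 31), today=lambda: date(2020, 6, 1)), and can exercise the leap-day case on any day of the year rather than only when the calendar happens to reach it.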

Starting point is 00:30:06 I wanted to ask you about address matching. On the surface, it doesn't seem like an impossibly hard problem, since we have a database, I'm guessing, of addresses. But I'm sure there must have been a lot of complications. Can you talk a little bit

Starting point is 00:30:22 more about that? For example, the word street can be represented in different ways in different countries. It can be street, it can be ST, but ST can also be station. In French, street is spelled differently, right? But the rest of the address is still English, like Roman script, basically. So there are so many edge cases that I had to come up with a rule engine, and went into too much depth, but still had to keep it, like... I think the most interesting thing was that there was this business requirement document that a product manager wrote, which was very clear. I think he had done all the thinking, and that is probably why my manager

Starting point is 00:31:05 and the team were so confident about giving the software to me, right? Because the ambiguity there was resolved already. I just had to write the code that matched that specification. Sounds like you can start off with a rules engine, and in today's day and age, you can inject some machine learning or something into that. Oh yeah, of course, of course. And this was still my intern project, this was not even my full-time job. So it sounds like a fun intern project, and the right flavor for an intern project, right? In case it works, it greatly speeds up some processes, and in case it fails, there's human

Starting point is 00:31:43 verification and it's not super critical. So when you went back to Amazon, did you go to the same team or did you go somewhere else? I think I got on that team because I mentioned C++ in our onboarding interview. One of the senior managers there, an amazing person, he moved on from Amazon to Apple, I think, heard me say C++. He's like, okay, this is my team and it does C++. Do you want to go there? And I liked the team so much that I came back to the team again. And then I worked on the image processing part of the system and optimizing it so that we could process more documents faster. So yeah, I came back to the same team. And I spent two years there.
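The address-matching problem described earlier, where "ST" can mean either street or station, can be sketched with a very small rule table. This is only a hedged illustration in Python, not the rule engine built at Amazon: the EXPANSIONS table and the functions tokens, candidates, and addresses_match are hypothetical, and the sketch ignores word order, other languages, and typos.

```python
import re

# Hypothetical rule table: abbreviation -> every word it may stand for.
EXPANSIONS = {
    "st": {"street", "station"},
    "rd": {"road"},
    "ave": {"avenue"},
}


def tokens(address: str) -> list[str]:
    # Lowercase the address and split it into alphanumeric tokens,
    # dropping punctuation such as the dot in "St.".
    return re.findall(r"[a-z0-9]+", address.lower())


def candidates(token: str) -> set[str]:
    # A token can match itself or any of its known expansions.
    return {token} | EXPANSIONS.get(token, set())


def addresses_match(a: str, b: str) -> bool:
    # Compare token by token; two tokens agree when their candidate
    # sets overlap, so "st" agrees with both "street" and "station".
    ta, tb = tokens(a), tokens(b)
    if len(ta) != len(tb):
        return False
    return all(candidates(x) & candidates(y) for x, y in zip(ta, tb))
```

With these assumptions, addresses_match("12 Main St.", "12 main street") and addresses_match("Central St", "Central Station") both hold, while "12 Main St" and "14 Main St" do not match. The ambiguity is kept rather than resolved, which is one reason a human verification fallback still matters.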

Starting point is 00:32:25 I think that's the longest time I've spent in one team. This sounds like the kind of technology that can easily be spun into its own startup. I've seen startups that are already looking into reading legal documents or financial documents. So the funny thing is that me and my colleague, who's my very good friend, who also is in Vancouver now, and I met him like 15 minutes before we started talking, we would sit and chat and every couple of months, he would come to me or I would go to him. And I'm like, man, remember that one thing that I built

Starting point is 00:33:01 or he built a few months ago? It's like, yeah, yeah. So that's the startup, and this company funded them for, like, these many thousands or millions of dollars. Like, man. Yeah. So that happened continuously. Like, even till now, we build things that after a couple of months someone else is also doing, and they get a lot of funding for it. Maybe that's something you can think about in a few years, like, oh, that seems like a good idea and I know how to solve it at scale. So see, I've thought about this a lot. I'm trying to figure out

Starting point is 00:33:32 that my real passion is, like, building things and learning how to build things. And I don't think the startup life is for me. Maybe I would enjoy working for one, but running one has, like, different challenges. And today I don't find them attractive. Yeah, there's the set of things you need to do in order to be successful. And that is so different from the regular software engineering job, recruiting people, selling your vision, and all of that. And I've been thinking about this myself as well. And at some level,

Starting point is 00:34:13 you want to do technically interesting work. And I don't know how much you can get that, especially in a non-technical startup, something that doesn't require too much innovation to build, if that makes sense. Exactly. You just have to get it to work like 60% so that then you can incrementally improve it. But I think another thing that happens is that if you do this and you do a startup and you have

Starting point is 00:34:39 projects that you start and 60% in your demo. There are two things that can happen. Either you succeed or you fail. Fail is then rinse and repeat. This cycle continues or you come back under an employer. So that is one outcome. And that outcome isn't really attractive. But again, I enjoy failing. So I think I would enjoy doing that. But if you are successful,

Starting point is 00:35:04 then you need to scale up, for example. And the more you are further away from the development, the more mushy the idea of the thing becomes, if that makes sense. So a CEO of the company

Starting point is 00:35:19 doesn't know from day to day what an engineer is doing, what problems they're solving. They have a very broad, high-level information. And right now at this point, I enjoy the details. Abstract level is fun, but details are like more fun. That sounds like a true engineer. And some of the technical challenges that you also face in a larger company like Amazon, you don't get that at smaller companies. At least it doesn't seem like it.

Starting point is 00:35:52 And that's what keeps me at mid to large companies as well. Because it seems like I wouldn't do any of this work in a really tiny one. I think Amazon does this really well. We have small two pizza teams as it's famously known, right? And a team has a lot of autonomy. So I'm very familiar with the details of my team and like a few more teams.

Starting point is 00:36:17 And I have full autonomy to decide what to do. And there's no top-down orders coming in. There is direction but there is no like order that you have to do this so that I really enjoy here especially right so I want to dig deeper into that autonomy part that you mentioned I've always worked in companies which follow kind of the philosophy of Google, which is as much consistency as possible, limit the number of languages and frameworks allowed. And you get a bunch of benefits like monorepo and all of that.

Starting point is 00:36:57 But Amazon seems like it's like a free-for-all, pick whatever is most suited for your framework. I want to ask your opinion on why do you think one is better than the other or what's your general opinion on this okay i have an example that might not be very accurate but you'll get the idea um you have communism and you have capitalism for example right so in communism the whole idea is that you have, well, not the whole idea, but one part of the idea is that you have a central authority. That's how USSR did communism, right? They had a central planning system, right?

Starting point is 00:37:34 So you had one group of people who were deciding and making decisions for the entire country. Whereas capitalism, it's more of a market-based economy, right? Because what we realize is that the closer to the market you get, the market knows more than any central authority that controls these markets. And the job of the central authority becomes to create incentives rather than dictate direction. So when you have a very uniform, consistent system, it works really well. But there is this step function change when you have to scale beyond a point.

Starting point is 00:38:15 So let me explain why. So if you are on a team with, like, six people and everyone has their own language and their own service that they're working on, it's not tenable. But when you have one team that is consistent, it works amazingly well. When you go from that to 50 people, let's say from 10 people to 50 people, it still works. But when you start reaching, like, 200 people, right, then your modality changes. You need different kinds of tools. Scaling up is a very painful process. Let's take an example of Google, for example. They had to build their own tooling,

Starting point is 00:38:54 their own build farms, their own upgrades to their version control system just to support that. Because whenever you are in the top 10% or top 1% of a consumer of something, you need a lot of work there. So that's one way to do it. The other way to do it is to give groups of organizations autonomy. So my team, we have a consistent set of language choices, framework choices that we go with. In our org, we have some common things, but everyone is not told to do something.

Starting point is 00:39:29 But when you go to the AWS level, we just have incentives and things that are not allowed because they don't have infrastructure behind it. And when you go to the whole company level, there are very few things that are there. What that helps us do is that helps us tailor our environment to our context.

Starting point is 00:39:49 Let's say if your approved languages are Java, Python, and Go, and some team needs to write embedded software for server racks so that they can ingest data faster or network switches. None of these languages make sense for them. They would be the first team doing something which is out of the scope. And for them, doing that will be a hassle. But if you had support for different tooling and more diversity, you would be able to evolve faster, I think. But that becomes a problem when you hit a scale. Before that scale, consistency is completely fine.

Starting point is 00:40:25 So you're saying that there's always going to be teams that have some different use case that doesn't fit the set of constraints that your organization is applying, and you end up slowing down anybody who has to do something slightly different, which might be fairly innovative. Exactly right.

Starting point is 00:40:47 So talking through like an example to clarify my understanding, let's say that your organization has a prescribed set of JavaScript frameworks that are allowed. Let's say it only allows React.js. The problem with that is if a newer, better framework comes along, then React might be the suboptimal long-term choice, and it'll slow down your organization. But there's still benefits of sticking with one framework. For example, you could have internal teams just working on improvements to React.js, a common runtime or something like that, which is what you'd lose

Starting point is 00:41:30 If you had to think about a lot of different frameworks that your company might be using. I completely agree. And I think the difference is that scope. So let's say, I like economics and government examples. So if you look at your district government, they know exactly about your context. They will say, okay, we have beaches in this town, so we have these rules for beaches.

Starting point is 00:42:01 And that rule scales up to, let's say, the state level. California has, you can scale up that ruling till there. But if I did that at the federal level, for a state in the Midlands, that rule doesn't make sense, right? Again, going back to the React.js example, for your team, it completely makes sense. Maybe for your group of teams, I don't know how Dropbox does organization structure, but let's say your four teams that build similar software or are in one domain, React.js makes complete sense. But let's say there is another team that does enterprise software or builds websites for banks, which

Starting point is 00:42:37 are still on Internet Explorer. For them, using React would be such a hassle. Maybe it's not true anymore. I haven't done front-end development in a while, but you get the point, I guess. So in essence, there's too much decision-making power in too central of an authority. You just have to figure out what is the

Starting point is 00:42:57 scale at which you need to break the decisions up. Again, if you do it too granularly, then there's too much overhead. But if you do it too broadly, then your evolution, like, slows down. I think there's this brilliant talk by one of the engineers from Wunderlist, which was acquired by Microsoft, about how they let any team pick any language, and they were very focused on having, like, a diverse set of languages and frameworks being used in the company, just so that they didn't have, like, one monoculture where if there was one

Starting point is 00:43:30 bug, everything broke. So their way was to build resilience within teams. So you can have different targets where you want to have diversity of tooling and systems. Overall, it seems like with all of these decisions and in software engineering in general, it's all just a bunch of trade-offs

Starting point is 00:43:50 and one has to decide which trade-off to make at that particular time. But in general, it seems like Amazon wants to enable teams to ship software as fast as possible? So first of all, I can't say anything for Amazon mostly because I'm giving my own opinions, but

Starting point is 00:44:14 at the same time, it's such a big company that it's not possible for anyone, one person to know how the whole company operates. I'm 110% sure we have parts of the organization which are slow and which are very considerate and which use this tooling for everything.

Starting point is 00:44:32 And there are parts of the organization which have a very fast-moving pace and they want to improve things faster. So just enabling that is the reason why we have different contexts, I guess. So how long has it been with Amazon for you? Maybe three or four years? Almost five years. I'll finish five years in July, I think.

Starting point is 00:45:02 Okay. I've read that you switched teams after a while. So what did you do next? So from there, I went to like start optimizing our image processing system and working with our research engineers, which was like a brilliant experience.

Starting point is 00:45:19 What happened was that I remember this one incident where I was trying to optimize this one algorithm, where division took a long time. And I wrote code for SIMD to optimize this division. And after I published the code review, the research engineer comes in and looks at it and he's like, yeah, this works perfect. Thank you for optimizing this. It's merged, goes into production and reduces latency by a lot because we were doing a lot

Starting point is 00:45:46 of those operations. And a month later, another software engineer in my team comes in and says, Akshay, by the way, why did you push that SIMD change? Because we were doing this division multiple times. He's like, okay, fine. He goes back and comes back to me and says, you know what we were doing? We were doing the inverse square root, like to calculate distance. And the RHS was a constant that we were squaring. So that wasn't required at all, right? So just

Starting point is 00:46:14 looking at the bigger picture just came in and he saw the thing, he removed my SIMD code, removed that one square operation, and the code was just faster again because we didn't need to do that operation. So that taught me how to look at the bigger picture. Yeah, it's super simple to just look at code profiles and look at, oh, see, this seems like the bottleneck in our code because most time is being spent here. We're not looking at the bigger picture. But to clarify my understanding,

Starting point is 00:46:42 I didn't really understand how you sped it up. It seems like you could just cancel two parts or something. So we were doing RHS equals... RHS is less than LHS, right? And I optimized one side of the equation. I didn't look into

Starting point is 00:47:00 the other side of the equation. Okay, so you sped up divisions on a number, but basically you could just hard-code the result because the input number was a constant. Interesting. Okay, so in general, what did you learn from working as a software engineer in this particular, like, mathematical area? Like, what was different from being a regular software engineer working on purely non-mathematical things? The domain. So I think I learned to learn the domain, because building software from a purely technical perspective is easy,

Starting point is 00:47:48 but understanding the domain is very important. I think that was the biggest learning. I saw how the research engineers understood the domain and then wrote papers explaining the solution, because you can add a lot of complexity in your software if you don't understand the domain clearly. And finding the right abstraction requires knowing the domain. I think that is one thing I really got out of it.
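
The distance story above (speeding up an expensive operation that turned out not to be needed) can be sketched in Python. The transcript does not give the real expression, so this is only a simplified analogue under stated assumptions: a distance is compared against a constant limit, the limit is non-negative, and the function names are invented for illustration.

```python
import math


def within_naive(dx: float, dy: float, limit: float) -> bool:
    # Pays for a square root on every call just to compare with a constant.
    return math.sqrt(dx * dx + dy * dy) < limit


def within_squared(dx: float, dy: float, limit_sq: float) -> bool:
    # Same answer for a non-negative limit: compare squared quantities,
    # with the squared limit computed once by the caller.
    return dx * dx + dy * dy < limit_sq


LIMIT = 5.0
LIMIT_SQ = LIMIT * LIMIT  # hoisted out of the hot loop

points = [(1.0, 1.0), (3.0, 4.0), (6.0, 0.0)]
naive = [within_naive(dx, dy, LIMIT) for dx, dy in points]
fast = [within_squared(dx, dy, LIMIT_SQ) for dx, dy in points]
```

Both lists agree, but the second version never touches the expensive operation, which matches the lesson in the story: look at both sides of the comparison before tuning one of them.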

Starting point is 00:48:14 I'm again talking at a very abstract level. There were incidents where I engineered something one way and didn't understand the domain correctly and it ended up being an overcomplication that we went back and simplified. Thinking super deeply about the problem at hand and then writing code that's the simplest possible code

Starting point is 00:48:36 for a particular problem. That seems like a great skill for an engineer to develop. And where did this journey go next? So what did you do after this team? So from there I moved to Seattle to work on AWS networking, which runs, like, services for the whole infrastructure. And I think one advice my manager gave me... so Amazon has had this culture of, like, switching teams. It's very normal. And after two years, everyone starts looking at you and asking if you're switching or not, because it just

Starting point is 00:49:10 moves the knowledge around in the company. So I moved to this team, and I had this conversation with my old manager, and he said, Akshay, whatever you do when you join the team, for the first couple of months, listen. Don't speak, listen. And I was like, okay, that sounds like a good idea, but I didn't have context to apply that knowledge, right? So I moved from here to Seattle, which was, like, from India to Seattle, which was another tangent if you want to go down it. But once I landed there in AWS, I started looking at the software and working on things and features and bugs and tickets. I was like, why would you do this? And I would, like, look at the code and get annoyed. And I was like, okay, now I understand: don't speak, just listen. And I would ask people questions and I had to understand. I didn't follow

Starting point is 00:50:04 the advice completely. I failed a lot of times and, like, ranted, and we had team groups where we would just sit and rant about things. And that would give us a deeper understanding of what we were working on. So networking has to be, like, that org needed to build very stable software. You can imagine.

Starting point is 00:50:20 So we have this thing called tiers of availability. And tier one is anything that a customer hits. So if you are a tier one service, you have to go through special ops trainings and you have to build high quality software. It's not like other people build low quality software, but tier one is supposed to be

Starting point is 00:50:37 the customer impact. This team had tier zero software, as we called it, because if we go down, tier one software is impacted and hence customers. So from there, I started learning more and applying this context there. And I started learning that, oh, things are like this

Starting point is 00:50:59 because the context over here is different, right? So just understanding that context and building the domain knowledge over there. I remember when I joined, I was like, okay, we have this service, that service. This client calls us. And after a week, I was like, okay, which other service was calling us?

Starting point is 00:51:14 So I started drawing our domain basically on the whiteboard. Every time someone came and said, have you seen this or have you fixed this? It's like, okay, this is a new piece that I'm unaware of. And I would draw it on the whiteboard, make the connections. And after like eight months, I had the complete domain mapped out. And we had a new manager join in and he came in and took a photo of this and put it on one of the documentation pages.

Starting point is 00:51:40 If you could tell me a little more concretely, what were you annoyed about? What were some restrictions? And if you give me a chance to guess, let me think. Since you're working at a really low level of the stack, your release process would have to be really slow and verify that things are working correctly. Was that the sort of problems you were dealing with? So I think people spoke about this in the recent Kinesis analysis when Kinesis went down, is that we have tiers of services. So every service can't use AWS. Of course, right? Just imagine if Lambda uses Kinesis and Kinesis uses Lambda. So who comes first? Chicken and egg problem, right?

Starting point is 00:52:26 So we couldn't use a lot of things. That was one restriction. Verification needed to be solid. And you see those regions, the dropdown box, when you launch into AWS console. We had to deploy to all of them and things that were being built. So that added some lag in the deployment pipeline. So that was a restriction. You couldn't use a lot of pre-existing package service

Starting point is 00:52:51 that would have made my life simpler just because we were at this lower layer. So it was like sitting here and looking at the code and saying, okay, I can simplify this abstraction if I use this hosted service, but I can't. And that was one thing that was annoying, for example. So you can't use a tier two service

Starting point is 00:53:14 if you're a tier zero service. That makes total sense. But that also gives you the opportunity to geek out a little bit. Yeah. And rebuild stuff that generally you wouldn't get to use since it's all prepackaged for you.
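
The layering restriction discussed above (a tier zero service cannot lean on a tier two service, and two services must not depend on each other) can be expressed as a small dependency check. This is a sketch under assumptions: the service names are made up, a lower tier number is taken to mean "closer to the foundation", and a service is assumed to be allowed to depend only on services with the same or a lower tier number.

```python
from typing import Dict, List, Tuple


def find_tier_violations(
    tiers: Dict[str, int], deps: Dict[str, List[str]]
) -> List[Tuple[str, str]]:
    """Return (service, dependency) pairs where the dependency sits in a
    higher-numbered tier than the service that uses it."""
    violations = []
    for service, needed in deps.items():
        for dep in needed:
            if tiers[dep] > tiers[service]:
                violations.append((service, dep))
    return violations


def has_cycle(deps: Dict[str, List[str]]) -> bool:
    """Depth-first search for a dependency cycle (the chicken-and-egg case)."""
    WHITE, GREY, BLACK = 0, 1, 2  # unvisited, on the current path, finished
    state: Dict[str, int] = {}

    def visit(node: str) -> bool:
        state[node] = GREY
        for nxt in deps.get(node, []):
            seen = state.get(nxt, WHITE)
            if seen == GREY:
                return True  # back edge: nxt is already on the current path
            if seen == WHITE and visit(nxt):
                return True
        state[node] = BLACK
        return False

    return any(state.get(n, WHITE) == WHITE and visit(n) for n in deps)


# Hypothetical services and tiers, for illustration only.
tiers = {"device_inventory": 0, "public_api": 1, "hosted_queue": 2}
deps = {
    "device_inventory": ["hosted_queue"],  # tier 0 leaning on tier 2
    "public_api": ["device_inventory"],
}
violations = find_tier_violations(tiers, deps)
```

Here `violations` contains the single pair `("device_inventory", "hosted_queue")`, and `has_cycle({"a": ["b"], "b": ["a"]})` reports the mutual dependency.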

Starting point is 00:53:30 Also, the history, how you got here. One of the principal engineering tenets in Amazon is respect what came before. That team really taught me what that means because today, if I tell you the problem space and I tell you to

Starting point is 00:53:46 imagine the solution, today if I tell you to redesign Dropbox, you would design it probably differently from how it exists. But people built it that way because that is how they understood the problem and that was their context.

Starting point is 00:54:02 So I can sit here and rant and say why would you even use rsync to do diffs between files? Why don't you just use this one paper? But the thing is that the folks writing it at that time didn't feel like using that for some reason, right? That was their context. So you have to either go back and understand.

Starting point is 00:54:25 I think there's this principle called Chesterton's fence, which is that before you remove a fence, you have to understand why it was put in there before, right? So it's easy to rant about how the software is today and these are the problems that we face and I don't have time to fix it because other things are high priority. But the way it is, is because people who built it had a different view of the world, and maybe the world was different. No one can predict how the world will be, how your team will be, how the org

Starting point is 00:54:54 would be. If you don't mind sharing, what was the exact component you were working on within AWS networking? So the org was AWS networking. So all the racks, their existence was tracked in our service, each device everywhere

Starting point is 00:55:15 where it exists in the data center in which region. And then there are restrictions of what, so you'll see a lot of our services are regional, right? So one region can't communicate to the other, how to store data and what data is being stored. So, and the more data you store in one service and make the authoritative source, right? The less things that this service can do

Starting point is 00:55:42 because it has a complex domain to map. So our domain became... like, I had to learn about a lot of networking hardware. We had this one engineer who joined this team as an intern and was there for, like, four years, and he knew exactly everything about the service. So I would go there and ask him, okay, why is this ToR, top-of-the-rack device connected to this thing? And he would be like, okay, I can't answer this now. Let me go to the whiteboard with you. And he was like, okay, so this device started off like this. Then we had this one, then we had this one, and he would give you the complete context.

Starting point is 00:56:18 And I was like, after five minutes, I was like, okay, please stop. Let me take notes. Let me absorb this and come back with more questions. So this seems like a machine database of sorts. Yeah, yeah. And it seems really useful since you need to keep track of things like, we don't want to deploy this service without any rack diversity, because if this rack goes down, this entire service would go down. Things like that. Yeah, I mean, one rack failure is funny because, like, hardware fails all the time. You don't notice it.
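
The rack diversity check mentioned here is the kind of query a machine database makes possible. A minimal sketch, assuming a hypothetical host-to-rack mapping and hypothetical service names (none of this reflects the real AWS service or its schema):

```python
from typing import Dict, List


def lacks_rack_diversity(
    host_rack: Dict[str, str],
    placements: Dict[str, List[str]],
    min_racks: int = 2,
) -> List[str]:
    """Return the services whose hosts span fewer than min_racks racks."""
    at_risk = []
    for service, hosts in placements.items():
        racks = {host_rack[h] for h in hosts}  # distinct racks in use
        if len(racks) < min_racks:
            at_risk.append(service)
    return at_risk


# Hypothetical inventory: which rack each host lives in.
host_rack = {"h1": "rack-a", "h2": "rack-a", "h3": "rack-b"}
# Hypothetical deployments: which hosts each service runs on.
placements = {
    "billing": ["h1", "h2"],  # both hosts share rack-a
    "search": ["h1", "h3"],   # spread over two racks
}
at_risk = lacks_rack_diversity(host_rack, placements)
```

In this made-up inventory only `billing` is flagged, because losing `rack-a` would take down every host it runs on.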

Starting point is 00:56:50 The beauty of, like, having a hosted system is that you don't even see it. But that is one thing, yes. The other thing is, like, just storing this much amount of data. New hardware that has to be added has to be stored in the system. You, of course, can't use DynamoDB.

Starting point is 00:57:05 Well, we did use Dynamo, but the system was not the best utilization of Dynamo because when the service was built, how to use Dynamo wasn't very clear at that point. So today you have so many features like transactions and secondary indexes, but it's easy to forget that these things weren't there. People worked around this. We needed to be strongly consistent,

Starting point is 00:57:32 but the underlying systems were eventually consistent. And then that just added a lot of complexity. I think I met so many good engineers there, and I learned about TLA+ in that team, and I started trying to model out systems to see how it goes, what Paxos was, for example, because we had to maintain consistency. So if you could tell me a little bit about the organization structure of AWS and I'm mostly looking for your opinion here.

Starting point is 00:58:04 There's so many different teams at so many different layers of the stack that AWS basically provides as a service, and especially the lower down in the stack you go, the larger issues can be, or the more widespread an issue can be. But still, in general, AWS has this reputation of being fairly reliable. What kind of organizational structure can facilitate that? What kind of incentives are provided by the organization? I'm trying to understand how can such a large ship keep sailing reliably, or what makes it happen? So engineers are definitely good. I consider myself one of the dumbest people in the team at least, right? Because I make so many mistakes. But I think our systems

Starting point is 00:58:53 have an assumption, like most companies, that it will fail. How do you handle failure? And there is this tremendous amount of work done by our infrastructure teams. There's a lot of institutional knowledge about building systems that have redundancies. And these layers of systems help the overall organization survive. Just having teams be responsible for their own context and them knowing how to prevent failures in their own area and just the top level leadership setting the right goals and priorities, I think has been tremendous.

Starting point is 00:59:37 I just imagine AWS, but Amazon in general, as like a big market-based economy because every team has their own charter. Engineers are free to transfer between teams. So if you don't like your manager, you can just put in a request and go to the other team. We have this internal job portal where you can just press apply

Starting point is 00:59:58 and the other team will interview you and get you in. That's how I moved teams, right? So there's a big market happening internally, which has these incentives set in. And these incentives are what helps us like implicitly build software that is reliable and that is customer driven, right? Biggest thing I've learned here is like,

Starting point is 01:00:19 you have to be customer obsessed. You have to find what is best for the customer, not for you. So I'm thinking of a failure mode and maybe you can help me walk through this scenario. Let's say that there's a particular team that for attrition or for some other reason, it can't meet its availability goals,

Starting point is 01:00:43 which are basically maybe set by the organization. So what happens or what do you think happens in a case like that? So it's actually the other way around. It's very similar to how I think Alexei on the podcast said on the first episode is that if you have a lot of attrition, your team gets merged into one team.

Starting point is 01:01:04 We build services, not microservices, but in most of the teams I've been in, you provide what your own SLA is. So today, if you go to the console, you can't go to Lambda and say, Lambda, can you give me TP99? I don't care about five nines of availability, right? Lambda says, I'll give you this availability.
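
To make "five nines" concrete, an availability target can be turned into a yearly downtime budget. This is plain arithmetic rather than anything specific to how Amazon measures its services; the function name is an assumption, and leap years are ignored.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_budget_minutes(availability: float) -> float:
    """Minutes per year a service may be down and still meet its target."""
    return (1.0 - availability) * MINUTES_PER_YEAR


three_nines = downtime_budget_minutes(0.999)    # about 525.6 minutes
five_nines = downtime_budget_minutes(0.99999)   # about 5.3 minutes
```

So a team promising five nines has roughly five minutes of downtime a year to spend, against almost nine hours for three nines.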

Starting point is 01:01:25 So every team says, this is my availability. This is how you measure it. And it just rolls up. And when you don't meet your defined, so you have your own defined goals. And when you don't meet them, there are COEs, for example, correction of errors. So what you saw in the Kinesis down was, like, I'm not

Starting point is 01:01:46 very knowledgeable about that specific incident, but we write these reports talking about what went wrong, what is the root cause of this problem, and how can we fix it in the future? And it's not to blame anyone, it's a post-mortem

Starting point is 01:02:01 which we use to drive more reliability, I guess. If in your concrete example, if a team's reliability is lower than what they promised, the team would write a correction of error document and they would say, okay, this incident happened, this is why it happened. And these are the steps we are going to take to avoid such incidents from happening in the future. And presumably

Starting point is 01:02:29 action items from that COE get tracked and have to get done within a certain amount of time. Yes. And leaders will hold the team accountable if they can't get it done. Yep. And it's your team that has set up that goal

Starting point is 01:02:47 of five nines or whatever. So what happens if your team says, okay, we'll try, we'll achieve this goal, but then you fail in that particular timeline and then you fail again. So what happens next? I mean, so that happened in one of my teams.

Starting point is 01:03:09 I again wouldn't name the team, but we had a goal that we couldn't reach. We failed again. Actually, it was my subgroup and we failed again. So the leadership asked basically, why are we failing? Do we understand why we are failing? It wasn't blame. It was basically asking, do you know why we are failing? And we explained, this is exactly what we think is the reason behind our failure. And then the next question is, what can we do to fix it? And then we came up with things, like the COE process, that we can do to fix it. But that was about it, right? And then we were helped in doing it, because you have to judge a thing based on the... it's a trade-off, right? Let's say if I had to solve the traveling salesman

Starting point is 01:04:01 problem, for example, I tell my leadership, okay, traveling salesman problem is a simple thing. I'll solve it in like 15 minutes. I'll get back to you next week with the solution. They'll say, if you can go for it, maybe it's required for something. And I try to solve it and I fail.

Starting point is 01:04:19 And I come back and say, okay, one week wasn't enough. Let me take a month. They'll be like, are you sure? Do you need two months? No, one month is fine.

Starting point is 01:04:28 I'll go back and try it again. I fail again because it's a traveling salesman problem. I'll go back and say, okay, I failed again. And at that point, they'll be like, okay, this is the time to step back and understand why we failed. Do we know why we are failing? And we'll come up with the documents explaining. And I'll say, okay, this problem turns out to be harder. Do we want to invest a year's worth of effort solving it? Or do we want to do something else?

Starting point is 01:04:54 And at that point, we will have a communication and figure out what the next step should be. Yeah, that makes sense. And at that point, there can be like an N number of solutions. Something like we need to change headcount or increase headcount, or... it's very rarely seen that the problem is technical only. Yeah, and thanks for walking me through that scenario. Let's dive into, I think, what's your last role at Amazon so far, which is working on Honeycode, which is a low code or no code platform. I'm part of the indie hackers community. And it seems like low code and no code tools are all the buzz right now. Everybody's very

Starting point is 01:05:42 excited about them. Can you maybe help me walk through some of this stuff behind the scenes as an insider? What are some technical challenges that you need to think about while working on this system? I started writing code using these old, low-code platforms

Starting point is 01:05:59 way back. They used to be something that was drag-and-drop. So when I learned about the existence of this team and I applied to join here, I was just excited, because this is the thing that I wanted to use. And I do use it. I have a bunch of my own apps on Honeycode that I use for doing my own stuff. When I moved from Seattle to Vancouver, I put in all my details. I planned the whole move in an application. We weren't public then. So I was using this internal version just to track everything. The space makes me excited because we all need applications. The beauty of computers

Starting point is 01:06:41 and technology is that you can take the mundane and automate it, right? So that you don't have to do this mundane stuff again and again and again. So I just moved apartments, right? And moving, finding an apartment is a challenge. So I built an application, put in what the things are, which I value and things that I don't value, wrote a filter function to select based on those values in that order. And at the end of the day, I just said, okay, let me put in all the details and this is my application. This is my apartment that I want to go. And I just quickly scanned through it and I was like, yeah. Now the next time I, I did this last move, but this time I reused it. So the next time I have to move again, I'll be just putting that again. So that burden has been taken off.

Starting point is 01:07:26 I have a domain expert which I've built over the course of years. So that's why I get excited about this space because it is a product that I personally use all the time, even for vacation planning between friends. And building it is like a challenge by itself, which has been like a fascinating challenge to work on.

Starting point is 01:07:49 It's like, so each domain in the world is like a fractal. You explore it, you see more things, and you keep on exploring it till you reach like the end of your limit. So in this case, the domain just expands, right? Think about if you had to build an application to plan a trip. If you want to travel to another city with five friends, let's also say you have to figure out Airbnb and activities, for example.

Starting point is 01:08:22 It seems like a foreign idea now. Now you would have to build like a website for it, a database for it. You're building it from scratch. Now you also want your friends to add data to it. You don't want to be the sole person doing the research. Maybe you'll say, okay, this person is responsible for booking the Airbnb. This person is responsible for booking the wine tours, for example. And then they all want to do it.

Starting point is 01:08:46 And if you were the person building that application, you would have to like figure, think of the database schema, think of all the data patterns of like taking it, rendering on the website. What's beautiful about this space is that I have to think about this and come up with features

Starting point is 01:09:02 that enable you to build that application without actually looking at the database schema. So I have to think about how I would structure my database so that when you build an application, getting that data out and putting it in is optimized and fast and simple. Right. So that second level of thinking is like, it broke my brain for the first couple of months at least.

Starting point is 01:09:28 Yeah, that seems like a fascinating technical and API design problem. How do you show all of this to a user who doesn't write code, or writes very little code, and enable them to build an application, maybe through some kind of graphical user interface? Yeah, and that's not the only complexity, right? So you have to have a good interface for people who don't understand the system, who don't understand application development, who don't think in terms of schemas, right? That's one set of problems. The other set of problems is an API design that lets you take that data efficiently. Then the other set of problems is like, how do I build my internal

Starting point is 01:10:11 architecture of the service to be able to support all these use cases right I can't write a new service for every use case so I think I understand some of the problem, but how do you solve this? So one way I'm thinking of is you can have a NoSQL data store that automatically sets up the right indices for data that's accessed frequently on the fly. How do you even go about approaching this problem? That is one way to do it, right? So the constraints that we are thinking of are the constraints based on the details that we have discussed today. But as we discussed earlier,

Starting point is 01:10:52 there was like tons of context involved in building the system now. And I don't think I can go into the current architecture, but there are tons of other things that come into play, which just make this system very complex, right? Because you have to scale with the size of data, with the number of users, along with these access patterns,

Starting point is 01:11:12 and then figure out how you would do it. And something works, something doesn't work. So it's fun. It's a lot of fun. You probably can't see it. I have like a stack of papers, like scientific journals that I read. I was sorting these papers. This is my this year's paper collection and for the listeners it's like

Starting point is 01:11:32 10 inches high. Yeah, I think there's going to be a lot of show notes for this episode. I've already learned about the various kinds of issues you can have, and the more I think about it, the more complex this problem seems to be, because it seems like you'd have really no control over the structure of the application that your user can create, or when they're going to have a traffic spike, or in general, what kind of data patterns they have, what kind of data access you would have to go through. It seems like it's really hard to design a solution when the constraints are hard to think of or hard to know beforehand.

Starting point is 01:12:22 So I worked on this one feature recently, which went into production. When you search for a flight on Expedia or something, you have these checkboxes where you can press a button and say, only show me Alaska Airlines.

Starting point is 01:12:39 So to enable that use case, we added a feature where the builder of an application can say, okay, let the users of the application customize the list view in the application based on these table columns. I worked on the back-end implementation of this feature. Now, that sounds like a simple problem, right? But there's so much complexity if you just start adding the size, right? So how would you find unique values in a list of items? Unique values, it depends on the scale, but I'd probably just put them in a set. Okay, so perfect.

Starting point is 01:13:23 so let's say you have a list of items and you have to find the uniques and you put them in a set. Now, that means you have to look at each value, put them in a set. And if it's a hash set, then putting them is O of 1. But overall, you're still O of n. Now, let's say it takes like 100 nanoseconds or something, and you can extrapolate. There'll be a limit after which you can't scale, basically, because you're looking at everything. So you have to figure out what that point is.
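The set-based approach described here can be sketched in a few lines of Python. This is only an illustration of the idea from the conversation, not the Honeycode implementation; the function names and the sample data are made up for the example, and the 100 nanoseconds per item is the rough figure quoted above, not a measurement.

```python
def unique_values(items):
    """Return the distinct values in items, in first-seen order."""
    seen = set()
    ordered = []
    for value in items:        # every item is visited once: O(n) overall
        if value not in seen:  # hash-set lookup: O(1) on average
            seen.add(value)
            ordered.append(value)
    return ordered


def estimated_scan_seconds(n, per_item_ns=100):
    """Extrapolate the full-scan cost, assuming a fixed cost per item."""
    return n * per_item_ns / 1e9


# Hypothetical column values, for illustration only.
airlines = ["Alaska", "Delta", "Alaska", "United", "Delta", "Alaska"]
print(unique_values(airlines))
print(estimated_scan_seconds(10_000_000))  # seconds for ten million rows
```

Under that assumed per-item cost, ten million rows already take about a second, which is the kind of limit the speaker says you have to find before optimizing.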

Starting point is 01:13:52 Then you have to optimize it. How would you optimize it? And it needs to be faster than O of N, right? It seems challenging, and I didn't realize I'm in a software interview, but this seems like a fun problem, so let's think about it. If I want faster than O of N, the only thing I can really think of is that we can't be using a list, because the worst case of the list is you have to check until the last item that could be

Starting point is 01:14:22 a duplicate, right? So you need to store your items in something else. If your items are comparable, basically, if you can find whether an item is less than another item, you basically put them in a search tree. And from that, you can really quickly find all nodes that are the same because they're right next to each other? So what you're talking about

Starting point is 01:14:48 is a secondary data structure called an index, right? So you're basically creating an index on top of the list. Now, the next question is, what if your list has tons of duplicates? So there's something called a loose index scan, which I think Postgres implemented. So, the beauty of this field is that you have to go down into so much depth to give the best customer experience. It's a lot of fun. So, how exactly does that work?
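A minimal sketch of the loose index scan idea, assuming the index is just a sorted Python list (a real database would keep it in a B-tree). The name loose_index_scan is hypothetical, and this is neither the Honeycode nor the Postgres implementation. Instead of reading every row, the scan reads one key and then binary-searches for the first position past all copies of it.

```python
from bisect import bisect_right


def loose_index_scan(sorted_keys):
    """Return (key, count) for each distinct key in a sorted list.

    Each distinct key costs one binary search, so k distinct keys in
    n rows take roughly O(k log n) steps instead of O(n).
    """
    result = []
    pos = 0
    n = len(sorted_keys)
    while pos < n:
        key = sorted_keys[pos]
        # Jump to the first position after the last copy of this key.
        next_pos = bisect_right(sorted_keys, key, pos, n)
        result.append((key, next_pos - pos))
        pos = next_pos
    return result


# The sorted list stands in for an index built on top of the original list.
index = sorted(["Delta", "Alaska", "Alaska", "United", "Alaska", "Delta"])
print(loose_index_scan(index))
```

The gap between two jumps also gives the number of duplicates of each key. The win is largest when the list has tons of duplicates; if every value is unique, this does more work than a plain scan.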

Starting point is 01:15:19 Is there like a bloom filter eventually involved somewhere? I mean, I'm not talking about the implementation, but loose index scans let you look at the key and find the next key after it. So you only look at the first element of that key. And if you store it in a B-tree, which is an order statistics tree, then you know how many rows are clones of this row. So there are tons of ways to do it. It's just a fun, pure computer science problem. And maybe a question to close out. Do you foresee that there'll be programming

Starting point is 01:15:57 in its current sense at all in 10 years, or will everybody just be using low-code apps in order to build applications? And do you think the kind of work that you and I do today is going to need more engineers like that, fewer engineers like that? What's your general opinion? So I think there will be fewer people doing the work that we do. And that's not a new opinion. That's been true for the entire human history. We find something to do.

Starting point is 01:16:33 And then we figure out a way to not do it, basically. When I moved yesterday, the boxes, if it was a couple of decades ago, I would have probably had to lift the boxes up. But now I just rented a hand truck and put the boxes there and had wheels to move it. So I saved some time. So I think over time, we will start automating and optimizing a lot of these things.

Starting point is 01:16:57 And the engineers of tomorrow will have a different set of challenges. They'll be building different things. If you look at the people who built Unix, for example, in Bell Labs, they were writing their own grep and sed. They invented grep and sed, for example. And they invented the C programming language just to be able to compile this code in multiple places. Today, it's a solved problem. I write Go and I can just pass

Starting point is 01:17:23 environment flags and it is built for different environments. So even if you go back more recently, you had to write vanilla JavaScript, which is terrible. And slowly and steadily, you got new versions of ECMAScript and now you have things. So our work will definitely evolve.

Starting point is 01:17:40 I think it'll definitely be different. Now, the number of people doing that work, I don't know. Maybe, I think, it will probably be more, because we'll just go to a higher layer of abstraction and the lower layer of abstraction will be more solid, I guess. So there'll be even more programmers, but they'll be working on stuff that's higher level than what we work on today? Yeah, I think programmer is a bad word. I mean, we do write programs, but I think we develop more of systems rather than programs, because a program is just one aspect of it, right? What you do in an organization is basically solve problems by building systems, and tech companies solve those problems by writing software systems.

Starting point is 01:18:27 I think software is just an implementation detail of today. So the company of tomorrow, let's say the lawyer of tomorrow, they'll be building a system that handles GDPR requests on their own without needing to pull in a programmer to build something like that? I think in 99% of the cases, they wouldn't need to build a tool. I think they would be using

Starting point is 01:18:53 tools, but ability to use like low-code or no-code platforms will be the cutting edge for them to be agile. Let's say a lawyer today is working on GDPR, let's say compliance, and they know this. And I think the EU passed another regulation about data ownership. The ability for an organization

Starting point is 01:19:16 to build a system to manage that complexity gives them an edge, right? And if it took an organization like six months or a year to build a system to do that and get their ducks in a row versus an organization that did it in like a week, right? The organization that did it in a week

Starting point is 01:19:35 will have more resources to work on something else. So that will just be the edge. And I feel like that might be the evolution of how we work today. But again, humans are terrible at predicting the future. Yeah, I think that makes total sense. The faster

Starting point is 01:19:53 the organization can go, the better and the more efficient the organization is, the more likely it is to survive. Well, Akshay, thanks a lot for being a guest on this podcast and I hope we can do another round at some point.

Starting point is 01:20:11 I always enjoy these fun conversations where we sit and think about things.
