TLB Hit 💥 - Episode 2: https://tlbh.it^M

Episode Date: February 21, 2021

What happens when you type https://tlbh.it in your browser's address bar, and press enter?...

Transcript
Starting point is 00:00:00 Hey, Chris, it's been a while. So our friends are all starting podcasts, which sounds pretty good. But I guess we're going for more the artisanal handcrafted thing for TLB Hit. Is that right? Yeah. Hey, JF. I think our only choice at this point is to claim that we're going for quality over quantity. You can see the little bubbles in our handcrafted episode designs so that you know that they're made with love.
Starting point is 00:00:23 All right. So standard disclaimer, we're here to share our love of random systems and systems programming topics. We only know so much and we'll try not to say things that are wrong. We'll follow up with errata or corrections and clarifications as we find out about them and put them on the website. We do the lifelong learners thing and we're always trying to push the boundaries of things that we know about and have conversations about what's interesting to develop and hone our understanding. Yeah, that sounds really good. Okay, so what should we talk about today? We've got a bunch of ideas, but here's one that I really liked
Starting point is 00:00:54 recently. We should do kind of a silly interview question, and we've heard that people ask this question, so let's try to solve it. So what is the question? The question is, you know, you walk into the interview, all bluster and whatever, and you say, tell me what happens when we put an address into the browser's URL bar and hit the return character. Okay. And we'll do that in as much detail as humanly possible. Now, I don't think we're really gonna cover a lot of things, right? It's just like an hour or something, and everyone listening to us will know a bunch more than we do about these things.
Starting point is 00:01:28 So, you know, tweet at us or whatever else with more juicy details that we can try to note in a future episode. So the idea here is we'll try to solve in like an hour or so what we know about what happens when you type an address into a URL bar in the browser. Yeah. And we're not going to start from like Maxwell's equations. We're not going to start with like spice models of transistors. We're going to start a little bit higher than that, I'd say. Yeah. Yeah. Yeah. All right. Sounds good. So I actually have a related story, which is that when I was working on chipsets, I had a
Starting point is 00:01:58 buddy who would walk into the interview room, put a motherboard down on the table, kind of slam it down on the table, point at it, and ask the candidate to explain in as much detail as possible how some aspect of it worked. I don't personally recommend that, but I do think it's a fun concept in general to talk about because I think it's easy to lose yourself in all the wonderful details of a modern machine. There's really like a lot of complexity that comes together to make these things happen. And one of the most amazing things about modern computers is how much of it is actually possible to understand and potentially even do yourself. So there's lots of cool projects out there making Pico SOCs and
Starting point is 00:02:36 things like this. There's open source projects like QEMU, where you can create a software emulation layer for a whole machine from scratch. You can recreate a familiar machine and operating system experience in a browser with Asm.js or WebAssembly and so on. All the visibility of modern computing systems and the fact that you can understand everything down to nearly the transistors is in a way what's beautiful about working in computers. But just as importantly, you don't need to know it all to use them well. We have these very nice abstraction layers, and you can keep diving down into the layer that interests you
Starting point is 00:03:10 and kind of snowball up more knowledge and capabilities as you go, which is really the power of abstraction. Nice, nice. So it's like a really well-trodden interview question in different fields, right? It's just kind of adapted to different areas. Yeah, because everyone has to deal in abstraction. It's like one of the few fundamental human faculties. I think it was Hume or some philosopher who had a couple of different ways that humans come up with stuff, imagination, abstraction, and things like this.
Starting point is 00:03:39 So yeah, relatedly, there's also this Powers of 10 video from 1977, which was mind boggling the first time I got to see it, where they show the expanding scale from it was like a cell on someone's hand up to the highest abstractions in the universe and down to the smallest ones all in one dialogue, like kind of one visualization. And it was super fun and enlightening. So both abstraction and decomposing things into the smallest pieces you can are both great ways to learn. Yeah, cool. Let's go back to our fun question for today, which is what happens when you go to your browser's address bar and type something like https colon slash slash tlbh dot it and you press enter. All right, so we have to start with the key press. You press enter. So out in the physical world before physical IO becomes digital IO, right? IO input output. Yeah.
Starting point is 00:04:30 Yeah. I don't really know much about that. I've seen like Tron, but that's about the extent of what I know about this. And Tron is a fairly accurate representation of how computers work. But I think, you know, we could probably try to cover it in a little bit more detail. Yeah. Let's do that. Yeah.
Starting point is 00:04:43 So, okay. Keyboards. These are like the fundamental unit of modern computer IO design. And they're also all based on switches, just like the internals of a computer are based on switches. So we kind of took these ideas of switches and they're inside the computer, but we also put them underneath your fingertips. So we're going to go with mechanical keyboards because they're enjoyable. They make the beautiful clacky noise. So a very famous keyboard was the old IBM Model M keyboard, and it had this buckling spring inside. So eventually you'd push down on the spring and
Starting point is 00:05:14 it would buckle and that buckling action would cause the switch to close. And modern keyboards like mine instead have this like little leaf on it, which is a little plastic thing that blocks the connection of the switch. And when that moves down, electrical connectivity happens. So not buckling switch, but still a switch action on and off that's controlled by your finger. So then your keyboard actually has a little microcontroller inside of it, and it's looking for things that get pressed. And usually it's doing that through something called a general purpose input output pin. So microcontrollers can kind of look at voltages and whether things are connected
Starting point is 00:05:48 and sample the voltages to say, is this a zero or is this a one right now? And some of them are configured to read. And so what the microcontroller does is it reads these lines and it figures out what the key is that's pressed based on the ones and zero values from those input output lines going to the microcontroller. It figures out based on which ones are ones and which ones are zeros, exactly which key in the grid of keys on your keyboard is pressed.
Starting point is 00:06:13 And then it sees, oh, okay, that row and that column is which key is pressed or what set of keys are pressed. And then it says, oh, I see that the enter key is mapped to that row and column. And then it needs to send that key code, that scan code for the keyboard over USB to the computer. Because I have a USB mechanical keyboard. Got it. So that's the grid that they talked about on Tron, right?
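The row-and-column lookup described here can be sketched in a few lines of Python. This is only an illustration, not firmware from any real keyboard: the 2x3 `KEYMAP`, the `scan_matrix` helper, and the `fake_gpio` function that stands in for the GPIO reads are all made up for the example.

```python
# Hypothetical 2x3 key matrix; a real keyboard has many more rows and columns.
KEYMAP = [
    ["ESC", "A", "B"],
    ["TAB", "C", "ENTER"],
]

def scan_matrix(read_columns, n_rows):
    """Drive one row at a time and sample every column line.

    read_columns(row) stands in for the GPIO reads: it returns a list of
    0/1 values, one per column, with 1 meaning the switch is closed.
    """
    pressed = []
    for row in range(n_rows):
        for col, level in enumerate(read_columns(row)):
            if level:
                pressed.append(KEYMAP[row][col])
    return pressed

def fake_gpio(row):
    # Simulated hardware: only the switch at row 1, column 2 is closed.
    return [1 if (row, col) == (1, 2) else 0 for col in range(3)]

print(scan_matrix(fake_gpio, n_rows=2))  # ['ENTER']
```

Real firmware would also debounce the switches before trusting a reading, which this sketch leaves out.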
Starting point is 00:06:35 Exactly. And then the light cycles will come in soon after. Right. Okay, cool. Tell me about the light cycles. All right. So the light cycles, you got to send stuff over to the computer. So in the modern era, the USB devices talk this standard human interface device protocol
Starting point is 00:06:49 that's layered on top of USB transactions. So what you do is you send packets to a standard driver inside your computer that knows how to talk to these human interface devices. So it's good that we've standardized on at least how keyboards and mice tend to work at this point in computer history. So USB is a neat, modern, minimal number of wires protocol for peripherals talking to computers. And what you do is you wiggle these differential transmission lines. So in order to get like good signal integrity and stuff, they have common ways that they do these protocols for things that talk over wires to computers.
Starting point is 00:07:24 And you're able to wiggle these transmission lines at pretty high speeds these days. They've done a lot of work on figuring out how to get these minimal number of wires to go really quickly when you're toggling between zeros and ones. And so I only personally know about low speed USB from a random project that I worked on in the past. So maybe USB 3 is different because it's even faster than stuff that I had learned about. But there's a serial engine that basically turns the signals you do into packets, and then the packets turn into transactions. So this is kind of like the OSI model of different layers of the different communication protocol pieces, right? And there's a few different types of transfers in USB. The keyboard firmware is going to shove the scan code from the key that you pressed
Starting point is 00:08:05 into an interrupt transfer that goes to the host. And that goes out on the wire over the light cycle. Got it. Got it. I was wondering when you press enter or something, is control M really the key code you see in the browser to go or like on a Unix machine or something? Is it more like control V or something? How does that work? Yeah, I wasn't sure about this. So I remapped the caps lock key on my keyboard to be the enter key. And that seemed to do the same thing in the browser. So I'm not sure. I'm not sure if there's a meaningful difference. Gotcha.
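As a rough sketch of what the keyboard puts into that interrupt transfer: in the simple HID boot protocol, a keyboard report is 8 bytes (one modifier byte, one reserved byte, and up to six key usage IDs), and the usage ID for Enter is 0x28. The helper name `boot_keyboard_report` is made up for this example, and many keyboards use richer report layouts than this one.

```python
import struct

HID_USAGE_ENTER = 0x28  # "Keyboard Return (ENTER)" in the HID usage tables

def boot_keyboard_report(modifiers, keys):
    """Pack an 8-byte boot-protocol keyboard report.

    Layout: 1 modifier byte, 1 reserved byte, then up to 6 key usage IDs.
    """
    if len(keys) > 6:
        raise ValueError("boot protocol reports at most 6 keys at once")
    padded = list(keys) + [0] * (6 - len(keys))
    return struct.pack("8B", modifiers, 0, *padded)

report = boot_keyboard_report(modifiers=0, keys=[HID_USAGE_ENTER])
print(report.hex())  # 0000280000000000
```

Note that this usage ID is a USB-level code; the Ctrl-M / carriage return question only comes up much later, once the host has turned the key into a character.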
Starting point is 00:08:35 That's funny because I remap caps lock key to escape. And I guess that's the traditional VI versus Emacs thing. Yeah, I guess I couldn't really figure out how to undo the remapping of the caps lock key, so I just had to throw that whole computer in the garbage, unfortunately. Oh, I know that interview question. It's, uh, colon q is the answer. That's a good one. Yeah, it's gotten a lot of people unstuck. So now we're getting to the motherboard chipset, and it's called a chipset because it's literally a set of chips, these little rectangles with the black epoxy and logos on top of them that you've probably seen before. And there's a thing inside of your motherboard,
Starting point is 00:09:09 which is that big green printed circuit board with all the chips on it inside of your computer, called the Southbridge. And the Southbridge talks to these computer peripherals, the things you're gonna plug into your computer to interact with the CPU. And you can see the term bridge here because it's bridging or turning other
Starting point is 00:09:25 ways of communicating into something that the CPU core complex understands how to do more natively, something like PCIe reads and writes. And one of the things that usually lives in the Southbridge is called the USB host controller. So the USB host controller is going to help talk to all the USB devices that you might plug into your computer. The USB host controller talks to those USB devices over the wires that are plugged in from the USB on one side, and then it talks PCIe up to the CPU on the other side over memory reads and writes through addresses that get set up at boot time. This is the part that interfaces with the CPU and core complex. So the writes or reads that happen with the USB host controller towards the core complex may have to go through a device memory management unit or IOMMU, which can prevent
Starting point is 00:10:12 wild writes from devices to arbitrary physical memory locations. This IOMMU can have a TLB inside for fast caching of address translation. And for places that we write to frequently, we're going to take TLB hits. Things like DMA, direct memory access, can happen through the PCIe side connections where the peripherals like the USB host controller can dump things directly into the same address space and main memory that's observed by the CPU. And the USB host controller is going to need to inform the CPU about the interrupt with the key press payload inside that came in
Starting point is 00:10:44 from the keyboard peripheral so it can handle it appropriately. So PCIe interrupts are just little write packets that happen over an address space to a memory location. And those get serialized over the PCIe wires and those get serviced in a coalesced interrupt handler inside the CPU. So I'll say, hey CPU, I have something that I need to do on my peripheral. And then the kernel will actually come pick up that note, which has been written to a memory location inside of the bottom half interrupt handler. So the kernel kind of breaks interrupt handling into two pieces, a bottom half and a top half. The bottom half, it just kind of quickly tries to handle the, oh, there's something here
Starting point is 00:11:18 for me to do. Let me make a note of that. And then the top half comes along and tries to actually do the work when there's time. So instead of keeping interrupts masked for a really long time, then it'll kind of punt things to happen in the top half of the kernel process. So that's just the terminology thing that they use. Ultimately, the events are pushed through file descriptors in our particular case as like, hey, there's an event that happened on this input device of this particular type with this particular code, and it had some value associated with it. Usually it's like triples in these input device drivers. Gotcha. Sounds pretty
Starting point is 00:11:51 tricky, but why are events pushed as file descriptors? It's a bit odd that you touch the keyboard and then there's files involved in a keyboard, right? Is it because in Unix land, everything is a file descriptor? So I guess it makes sense, it makes it easy to handle these types of events through functions like select, poll, and epoll or something like that to have files. But why is that? Right. So the kernel is trying to expose everything in a uniform way. So you could have just kind of arbitrary memory locations where things are changing that are
Starting point is 00:12:21 accessible to programs in user space. Or you could use this already existing abstraction of a file to talk about things that have happened, and then people can look at files. So in the Linux kernel, things like sysfs are ways to ask the kernel, hey, what's going on with the connectivity of my devices, this kind of thing. So a lot of things are kind of riding on this file system abstraction concept. Yeah, that makes sense. So in our particular case, we'll notice that an interrupt came from the keyboard device
Starting point is 00:12:51 and we'll serve it up to the driver, which will push that event over that file descriptor. So the kernel produces a key code. And if you're running an X11 or Xorg rather based window manager, then the X server is going to translate the key code into a symbol, which is basically just a fancy key code, but at the X11 abstraction layer, the X server will do a file descriptor event loop to basically look for these key codes that are happening and push them to the various windows that might be looking for what's happening with the keyboard. And there's a thread that spins and waits for these notifications and then decides what to do when events come in.
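Here is a small sketch of that file descriptor event loop, assuming a Unix-like system. The (type, code, value) triple follows the layout of Linux's `struct input_event` on a typical 64-bit build, and `EV_KEY = 1` and `KEY_ENTER = 28` are the values from the kernel's input event codes header. A pipe stands in for the real `/dev/input/eventN` device so the example runs without a keyboard, and `wait_for_key_events` is a name made up for the illustration, not anything from the X server.

```python
import os
import selectors
import struct

# Layout of Linux's struct input_event on a typical 64-bit build:
# two native longs (the timestamp), then type, code, value.
EVENT_FORMAT = "llHHi"
EVENT_SIZE = struct.calcsize(EVENT_FORMAT)
EV_KEY = 1       # event type for key presses
KEY_ENTER = 28   # key code for Enter

def wait_for_key_events(fd, timeout=1.0):
    """Block until fd is readable, then decode one (type, code, value) triple."""
    sel = selectors.DefaultSelector()  # picks epoll on Linux when available
    sel.register(fd, selectors.EVENT_READ)
    triples = []
    for key, _mask in sel.select(timeout):
        data = os.read(key.fd, EVENT_SIZE)
        _sec, _usec, ev_type, code, value = struct.unpack(EVENT_FORMAT, data)
        triples.append((ev_type, code, value))
    sel.close()
    return triples

# A pipe stands in for /dev/input/eventN; value 1 means "key pressed".
read_fd, write_fd = os.pipe()
os.write(write_fd, struct.pack(EVENT_FORMAT, 0, 0, EV_KEY, KEY_ENTER, 1))
events = wait_for_key_events(read_fd)
os.close(read_fd)
os.close(write_fd)
print(events)  # [(1, 28, 1)]
```

The point of the file abstraction shows up in `sel.register`: the same loop could watch a socket, a pipe, or an input device without caring which one it is.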
Starting point is 00:13:28 Again, like JF said, something like a select or an epoll would happen where you get woken up when a file descriptor it's watching gets updated. You see the input, you notify the X client, which is a window of some kind, in this case, the browser window, and it'll be informed that the enter has been pressed. So now that event that got pushed to the browser window has to be handled by the browser process which is
Starting point is 00:13:50 the window manager application. So the browser has a thing that connects to the window manager, gets the event. The browser notices that the URL bar widget inside of the chrome, which is the part of the browser that's, you know, not the content that's being displayed by a web page, but the part that's kind of wrapped around the content displayed by a web page, is the thing that has focus. And actually, funny enough, in Firefox, the widgets were even implemented with a web-like technology called XUL that uses XML-based widget descriptions and JavaScript event handlers. So the onkeypress event that you'd use in a web page is also generated for the URL bar widget. There's kind of this reuse across the content and the chrome inside of the Firefox browser architecture. In native widget toolkits there are similar things that
Starting point is 00:14:37 happen in any case, but Firefox has this interesting reuse between those two sides. The chrome is part of the root window for the process, but the modern browsers have processes that manage groups of tabs for isolation, so that things like crashes, like if there's a bug in the browser code, then it's not affecting all web pages that are running inside of the browser. And that's done through projects like Electrolysis in Firefox. Yeah, yeah. And one thing that's interesting is trying to figure out what parts of a browser are sharing processes. And recently, I think like a few weeks ago, Chris Palmer of the Chrome team had a really neat presentation at Enigma, and we're going to put a link in the show notes. But he talks about how sandboxing works for Chrome, right? So how they separate things between different processes
Starting point is 00:15:26 within Chrome. And basically, the idea is that you can have one process for the browser root and one for, say, the GPU process, and then a bunch of separate renderer, networking, and storage processes. And that's evolved over time, which part of the browser has its own process and not. And sometimes, different things might end up even in the same process to save resources, right?
Starting point is 00:15:48 Because if you have thousands of tabs open, you don't want to have thousands and thousands of processes. And so sometimes they start, you know, putting things in the same place. But otherwise, having different processes allows you to increase isolation, which can potentially increase privacy and security of your browsers. It's kind of an interesting trade-off there that they do. And then there's the classic question of how do you, when you split things into processes, not incur all the penalty of things living in different address spaces? How do you retain fast communication between these pieces, even though we've separated them into process isolated bits? Right, right. So there's a lot of work that goes into that for all browsers and they kind of take different approaches,
Starting point is 00:16:28 but to try to reduce the, when you can have asynchrony, you want to have it, but when you need low latency, you want to have kind of fast communication between processes that aren't, you know, they're not, they need context switch to talk to each other or something like that. And so that's kind of an interesting part
Starting point is 00:16:44 of the design of a browser. And the other thing that's interesting is you mentioned the concept of chrome, which is not the web page, but everything that's around the browser window. And it's funny because when Chrome, the browser, started, it was named after the part that you saw less of, right? It used to be the browsers had all these extra address bars and other stuff that you saw. And Chrome's idea was let's try to minimize it and have as little chrome as possible. So it's funny that Chrome is named after a thing you're not supposed to see. So I thought that was kind of a cute thing. And so as you were talking about how to send messages between processes and windows,
Starting point is 00:17:22 well, depending on how it's split up, how the key press notifications are handled, you'll want maybe one process to send a message to another process, to the one that controls the currently displayed tab, and ask it, hey, keys are being entered, and then the Enter key was pressed. Maybe you want to navigate to that URL because Enter was pressed. And so you want to do a cross process messaging and that
Starting point is 00:17:45 can be done in a bunch of ways, right? So in some operating systems, you might want to use pipes or something like that. But in a lot of cases, you want to use something like shared memory. And shared memory in itself, like it's something that's really neat, right? Because first, the neatest thing is it involves TLB hits, of course, but also because shared memory is an interesting way that two processes can share physical pages at potentially different virtual addresses, right? So they ask the operating system, like, hey, there's a physical page here. Can you let us talk through it? Just share the same physical thing. What's interesting is shared memory is not really a thing that C and C++ acknowledged
Starting point is 00:18:21 they exist. In their memory model, they're not really a thing. The same way before C11 and C++11, threads didn't exist in the model of those languages. In reality, they exist, but in the model offered by languages, it just doesn't. And so the way it's kind of modeled right now is they're basically just treated as external modifications. So it feels odd to say that, but you ought to use
Starting point is 00:18:45 volatile to do shared memory accesses. But anyways, I'm really digressing here. This could be a whole other episode, but you were talking about tabs here. So tell me more about that. So you probably need a separate synchronization primitive hypothetically to do cross-process locking, say, because it needs different guarantees from the ones that you get from threads, which live inside of the abstract machine model. Yeah, exactly. Interesting.
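To make the shared memory idea concrete, here is a minimal Python sketch using the standard `multiprocessing.shared_memory` module. To keep it self-contained, both handles live in one process; in a browser the second attachment would happen in a different process, and a real design would add a synchronization primitive on top, as discussed. The names `owner` and `peer` are just for illustration.

```python
from multiprocessing import shared_memory

# One side creates a named block of shared memory.
owner = shared_memory.SharedMemory(create=True, size=64)
# The other side attaches by name; in a browser this would be a second process.
peer = shared_memory.SharedMemory(name=owner.name)

owner.buf[:5] = b"Enter"      # write through the first mapping
seen = bytes(peer.buf[:5])    # read the same bytes through the second mapping
print(seen)  # b'Enter'

peer.close()
owner.close()
owner.unlink()  # release the underlying block
```

The two handles are separate mappings of the same underlying block, which is the "same physical thing, potentially different virtual addresses" idea from the conversation.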
Starting point is 00:19:12 Yeah, so one thing I do remember about tabs is they have a navigation cache for going backwards and forwards. So when we're updating the location, so we start kind of on a blank tab, and blank tabs, I think, are kind of their own entity in the browser universe. You know, maybe it takes you to a special display page or something like
Starting point is 00:19:30 this. But then when you navigate away to a particular website, like TLB Hit, like we said in the episode title, then it will say, oh, if you hit the back button, maybe I take you back to the blank tab page or wherever you previously were before you navigated to this new place. Yeah, yeah. So what's funny is apparently one of the most complicated pages in the whole browser is about blank. If you type about colon blank in your browser, apparently that's one of the weirdest, most complicated pages there are in the whole browser. But let's not talk about that. Let's go back to TLB Hit instead. So we're out of the whole keyboard thing. We got the keys that were pressed and whatever.
Starting point is 00:20:08 And at this point, we've got a URL. This tab's, like, content frame thingy is trying to navigate to that URL. That's what the user asked. They wrote a URL, pressed Enter, and they want the content frame to navigate there. So the first thing we need to do is we need to parse the URL that the user wrote.
Starting point is 00:20:27 And what's really surprising is URL parsing really ain't easy. And how it actually works is the first thing you do is you look at the protocol if one's specified. So if you wrote HTTPS colon slash slash, that's the protocol. If you didn't, the browser usually just infers a protocol. And it looks, depending on the protocol, HTTP or HTTPS, it has a different way to parse things because there's a bunch of other protocols.
Starting point is 00:20:51 There's like file and gopher and whatever else. Like, you know, each protocol is parsed differently. We'll just focus on the HTTP and HTTPS ones because they kind of work the same way. But the way those work, it's a bit surprising, but you kind of, you know, you look at the string and you kind of parse it inside out. And there's a bunch of weird things in the URL. You got to figure out where's the TLD, top level domain, and then what's the domain itself and then any subdomain. So that's going in leftwards into the URL. If we assume your left to
Starting point is 00:21:22 right scripts are being used. So you have TLD, then domain, subdomains. And then on the left of that, you might have a username and a password inside the URL. And then on the right of the TLD, so after the TLD, you might have a port, and then you might have like a slash, some path, then query parameters, and then query fragments. So URLs are kind of really tricky to parse here. Yeah, I was also thinking the file colon slash slash that you can put into a browser, that's probably like a pseudo protocol instead of a real protocol. Yeah. And that probably has different parsing for backslashes and things like this versus if you typed it into a web URL. Exactly. Oh yeah, because you don't want to leave any
Starting point is 00:22:01 like you don't want to just take the path the file path and just hand it to the OS because there's some paths in the OS itself that are special. Like I remember in the early days of Chrome, there were like special, like undocumented things within Windows that Chrome didn't know to filter out. And then that would be the source of exploits because Chrome would just be like, oh, this is a fine file thing that let through. And that would be like some special device thing or whatever. So yeah, file is totally special here as well. Yeah, interesting. I guess on that note, let me take a brief moment for a kind of human being related point. I do kind of
Starting point is 00:22:34 find it funny how I worked for a browser company, Mozilla in this case, but mostly because I was working on the JavaScript engine, learned things as they pertained to the JS engine, just the piece that I had worked on. So even though I worked for a browser company, I have no idea how most of the browser actually works. And I know about a few interfacing bits, like, oh, how DOM nodes are reflected into JavaScript and maybe what the browser level cycle collector is doing.
Starting point is 00:23:02 But it was kind of also similar working for a GPU company where primarily you're working on CPU oriented things. I still probably can't really describe adequately the rendering pipeline for a GPU. So everybody has these interesting blind spots, which sometimes are counterintuitive, even given their history, where everyone is going through the XKCD Mentos and Coke, "you're one of the lucky 10,000 people to be learning this fact today" experiences over and over again constantly. Yeah, that's interesting. It's kind of like the idea that Isaac Newton was probably the last person
Starting point is 00:23:36 to know all the human knowledge at the same point in time, or most of it. At this point in time, who knows everything about any topic, even in our small field of computer science? Like, who was the last person to know everything about CS? I don't know. And, you know, same thing for me. When we both worked at NVIDIA, I didn't touch any GPUs ever. And then I've worked on two browsers, and I probably don't know much of this. So we're kind of winging this interview question so far, and I don't know how the interviewer feels about our answer. But if I were the interviewer,
Starting point is 00:24:07 I'd be like, oh, so what? You said URL. What's the difference between URL and URI? And like, nobody knows, right? It was important like maybe 20 years ago. And then everyone says URL and then some people say URI just to nerd snipe each other and whatever. So we'll just ignore that, right?
Starting point is 00:24:20 We'll just assume we're doing well on the interview and keep going on. I guess it's the benefit of having our own podcast is we pretty much have to be here. So regardless of how bad we do. Yeah. Yeah. So we'll keep doing this, man. But yeah, so now that we've parsed the URL, we have a TLD, we have the domain, the subdomains, and we're going to ignore the rest of the request for now. We just like, OK, got it.
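The URL anatomy walked through above (protocol, optional username and password, host, port, path, query, fragment) can be seen with Python's standard `urllib.parse`. This is only a sketch: browsers follow their own URL parsing rules, which differ from `urllib` in edge cases, and everything in the example URL except the `tlbh.it` domain (the user, password, port, path, and query) is made up.

```python
from urllib.parse import urlsplit, parse_qs

# Hypothetical URL: the user, password, port, path, and query are made up.
url = "https://user:secret@www.tlbh.it:8443/episodes/2?lang=en#transcript"
parts = urlsplit(url)

print(parts.scheme)    # https
print(parts.username)  # user
print(parts.password)  # secret
print(parts.hostname)  # www.tlbh.it
print(parts.port)      # 8443
print(parts.path)      # /episodes/2
print(parse_qs(parts.query))  # {'lang': ['en']}
print(parts.fragment)  # transcript

# Host labels read right to left: TLD, then domain, then subdomains.
labels = parts.hostname.split(".")[::-1]
print(labels)  # ['it', 'tlbh', 'www']
```

Splitting the host on dots is a simplification: it treats the last label as the TLD, which is not enough for suffixes made of several labels.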
Starting point is 00:24:43 We kind of know who we want to talk to. We got a name, right? But how do we talk to that server? Well, we got to figure out which machine that name corresponds to. And the way you do that is through DNS resolution. So DNS is the Domain Name System, and for our case, tlbh.it, well, .it is the TLD, the top level domain, right? It's the domain for Italy. And after that, TLBH is the domain name. And so we've got to like go to Italy or something and then figure out the thing. That's not quite how it works, right? So there's these DNS servers that just map all of the domains to IP addresses and things
Starting point is 00:25:23 like that. They do other stuff than just IP addresses, but by and large, this is what we care about right now. The thing that's interesting is DNS resolution itself is notoriously still totally insecure, right? It's done in the clear, it uses UDP packets. And so we're gonna need to craft and send out a packet here, I think, right?
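As a sketch of the packet that would need crafting, here is the wire format of a plain DNS query for an A record, built by hand in Python. Nothing is sent over the network; a real resolver would put these bytes in a UDP datagram to port 53 and would pick a random query ID instead of the fixed `0x1234` used here. The helper name `build_dns_query` is made up for the example.

```python
import struct

def build_dns_query(hostname, query_id=0x1234):
    """Build the bytes of a DNS query for an A record (RFC 1035 wire format)."""
    # Header: id, flags (recursion desired), 1 question, 0 other records.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed by its length, ending with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.split(".")
    ) + b"\x00"
    # QTYPE 1 = A record, QCLASS 1 = Internet.
    question = qname + struct.pack("!HH", 1, 1)
    return header + question

packet = build_dns_query("tlbh.it")
print(len(packet))  # 25
```

The name travels as readable ASCII labels inside the packet, which is what "done in the clear" means in practice.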
Starting point is 00:25:41 And we haven't talked about packets here. But what's interesting is there's a bunch of stuff on top of that. So for example, recently, Firefox rolled out DNS resolution over HTTPS, but they used Cloudflare as the intermediate to do that resolution. So to do like the, you know, use a secure connection to do DNS to Cloudflare. And then there was a big argument on the internet about that. Some folks were happy because like better privacy and security, but others were worried about Cloudflare itself and whatever else. It's all stacks, stacks of trust. You know, what's the root of trust,
Starting point is 00:26:13 you know, between the trust-no-one ideal, where you would only have to trust yourself. How do you stack things on top and keep everything copacetic? Right, exactly. And then, you know, first of all, it's not just roots of trust all the way down. It's also caches all the way down. Networking is all about caching stuff. And the first thing the browser does,
Starting point is 00:26:34 before even doing a DNS request, it says, hey, have I looked up this domain name before? And we know it has, because you've been to TLBHit before. But let's assume you haven't. The other thing it does when it hasn't looked up the domain name before is it says, is this URL a known malicious website? Because there's a lot of bad stuff on the internet. And browsers kind of got pretty smart about this. And they started having a list of websites they just like don't want you to go to.
Starting point is 00:26:59 Or they'll tell you like maybe you don't want to go there. It looks malicious. So there's something called, say, like Google Safe Browsing that exists inside of Chrome. It's used by other browsers. And what's interesting there is there's a lot of websites, like a lot. And there's only a few malicious ones. And that list kind of changes pretty frequently. And so instead of your local browser having a list of all the malicious ones. As we're talking about interview questions, if ever you interview at Google, the best thing you ever wanna do
Starting point is 00:27:30 is you wanna use a Bloom filter, right? And the original implementation of Safe Browsing, if I remember correctly, used Bloom filters. It's a probabilistic data structure that says something like, you ask the data structure, is this domain safe? And the data structure usually says, oh, this is safe, just go ahead, keep looking it up.
Starting point is 00:27:48 Or otherwise it might say, I'm not sure. And then when it's not sure, it asks a separate server, is this domain safe? So you go to the Google central server and Google says like, oh yeah, it's safe or it's not. So it's a probabilistic thing. When it says it's safe,
Starting point is 00:28:02 your snapshot knows for a fact that it's probably safe, right? Because it's been validated before. But if it doesn't know, then you take an extra round trip to go to Google, say, is that safe? And then you come back. So that's kind of interesting. And then there was another thing a few weeks ago where WebKit, the engine behind Safari and other browsers, came out and said, oh, we're going to start proxying Safe Browsing requests through a separate server. And that got a bunch of Hacker News karma and whatever else. So we'll put that in the show notes.
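To make the Bloom filter idea above concrete, here is a minimal sketch in Python. It is not how Safe Browsing is actually implemented; the class, the `check_domain` helper, the filter size, and the `ask_server` callback are all assumptions made up for illustration. The filter holds the known-bad list, so "not in the filter" means definitely not listed, and "maybe in the filter" triggers the extra round trip to a server.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: answers 'definitely not present' or 'maybe present'."""

    def __init__(self, size_bits=1024, num_hashes=3):
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(size_bits // 8 + 1)

    def _positions(self, item):
        # Derive num_hashes bit positions by salting a SHA-256 hash.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item):
        # False means "definitely never added"; True means "maybe added".
        return all(
            self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item)
        )


def check_domain(domain, local_filter, ask_server):
    """Hypothetical lookup: only pay for a round trip on a 'maybe'."""
    if not local_filter.might_contain(domain):
        return "safe"  # definitely not on the malicious list
    # Possible false positive, so confirm with the (assumed) remote service.
    return "malicious" if ask_server(domain) else "safe"
```

A caller would fill the filter from a periodically refreshed list and pass in whatever function performs the remote check. The design trade-off is that the filter stays small even when the list is large, at the cost of occasional unnecessary round trips.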
Starting point is 00:28:30 There's a lot of interesting stuff to talk about in that space of how, you know, those caches work and how you check certain things. And then we were talking about, you know, how to parse a protocol, HTTP or HTTPS, but there's also the idea of protocol changes when you look at the URL. So for example, there's this thing called HSTS, which can change an HTTP request to an HTTPS request. So from insecure, kind of sent out in the clear, to a secure connection. And one way that this is done is through something called HSTS preload. What happens there is all the HTTP to HTTPS domains are preloaded into the browser.
Starting point is 00:29:09 So your browser just has a list of all the domains that have said, I am HTTPS only. And so what's interesting there is the domain owners tell the browser vendors to always do HTTPS connections for them. So the browser will, even if you don't write a protocol, or if you write HTTP, say you go to google.com, Google will refuse to connect to google.com without S in there. So it'll just change your HTTP to HTTPS. And what's interesting is it's not google.com only
Starting point is 00:29:36 that can do that, like not just a domain, but a whole TLD can do that as well. So if you go to, say, a .dev or the .google TLD, they've opted in to only have HTTPS for themselves, as well as all of their subdomains, right? So if you reserve a .dev domain on the dev TLD, then you have to have HTTPS. If your TLD doesn't do that, or if you don't have the capability of doing only HSTS preload, one thing that you can do is there's this thing called HTTP headers. So once the client connects to you, you can respond with a bunch of headers. And one of the headers is called Strict-Transport-Security. And if you reply with that, it says, well, next time you talk to me, do HTTPS.
Starting point is 00:30:22 So what that means is the first connection is going to be HTTP, but the next one is HTTPS. So this is interesting. It's a bunch of caches. These things are obviously, you know, they're in the browser cache, they have timers, some of them expire after a while. And it's super complex because the internet is large. Security and privacy weren't really built into the internet from the beginning, right? Like, it was kind of a researchy military project thing. None of that was built with security and privacy in mind. Nobody knew what that actually meant back then. I think the key was, if a nuke hit, would the internet keep routing packets to the desired endpoints, more than anything else. Exactly. Yeah, because these security features were
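As a rough illustration of the two HSTS mechanisms just described (a preloaded list plus the Strict-Transport-Security response header), here is a small Python sketch. The `HstsStore` class, the `upgrade` function, and the sample preload entries are hypothetical; real browsers ship a much larger list and handle many more edge cases. For simplicity this sketch treats every preloaded entry as covering its subdomains.

```python
import time
from urllib.parse import urlsplit, urlunsplit


class HstsStore:
    """Toy HSTS cache: a static preload set plus entries learned from headers."""

    def __init__(self, preloaded=()):
        self.preloaded = {name.lower() for name in preloaded}
        self.dynamic = {}  # host -> (expiry_timestamp, include_subdomains)

    def note_header(self, host, header_value, now=None):
        """Record a Strict-Transport-Security header value seen from host."""
        now = time.time() if now is None else now
        max_age, include_subdomains = None, False
        for directive in header_value.split(";"):
            directive = directive.strip().lower()
            if directive.startswith("max-age="):
                max_age = int(directive[len("max-age="):].strip('"'))
            elif directive == "includesubdomains":
                include_subdomains = True
        if max_age is None:
            return  # max-age is required; ignore the header without it
        if max_age == 0:
            self.dynamic.pop(host.lower(), None)  # max-age=0 clears the entry
        else:
            self.dynamic[host.lower()] = (now + max_age, include_subdomains)

    def must_use_https(self, host, now=None):
        now = time.time() if now is None else now
        labels = host.lower().split(".")
        # Check the host itself, then each parent domain up to the TLD.
        for i in range(len(labels)):
            candidate = ".".join(labels[i:])
            if candidate in self.preloaded:
                return True
            entry = self.dynamic.get(candidate)
            if entry and entry[0] > now and (i == 0 or entry[1]):
                return True
        return False


def upgrade(url, store, now=None):
    """Rewrite http:// to https:// when the store says the host is HTTPS-only."""
    parts = urlsplit(url)
    if parts.scheme == "http" and parts.hostname and store.must_use_https(parts.hostname, now):
        parts = parts._replace(scheme="https")
    return urlunsplit(parts)
```

With `HstsStore({"dev"})`, a URL such as `http://example.dev/x` would be rewritten before any network traffic happens, while a host learned through the header is only upgraded until its `max-age` timer runs out.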
Starting point is 00:31:02 surprising to the designers they weren't really built in from the beginning. And even now, like, you know, we look back and we say like, oh, well, the internet should have been designed differently. Well, even now, as these things are designed, we add security layers that it turns out themselves have, say, privacy holes or something like that. So there's these things like super cookies where you can use some of these caches, some
Starting point is 00:31:26 of these preloadings or whatever else, to create not a traditional cookie, which is the server tells you, store this information, and next time you talk to me, send it back to me. That's a cookie. But there's these super cookies that allow you to infer that you've talked to this client before. So you can say measure latency or figure out,
Starting point is 00:31:44 do you talk to me over HTTPS the first time around instead of HTTP or things like that? And if you reserve a certain number of subdomains or whatever and you do HSTS or something, then potentially you can create super cookies out of that. So it's kind of interesting, this security built on top of a nominally originally insecure system and privacy insensitive system, you know, trying to
Starting point is 00:32:06 fit those properties back into the system tends to not always work out the way it was intended. So it's kind of cute. Like there's a lot of weird, odd things on the internet that just weren't designed to kind of evolve over time. They sound delicious, but the problem is that whenever you create this complex system, it's difficult to foresee all the unexpected use cases that possibly can arise from a basic mechanism. Yeah, exactly. And, you know, what's interesting is, you know, you start with thinking about the internet and you're like, maybe more than just English speakers want to use the internet. So you start looking and you say, well, maybe we should support something like Unicode in domain names or something like that, right?
Starting point is 00:32:45 And we talked about parsing your URL being difficult. And the mental model we probably have is just URL is ASCII, but it's not. The way URLs work is nowadays they support Unicode. And this is like really different between TLDs, right? So it's up to the TLDs to decide which subset of Unicode they support, but Unicode itself is super complicated, right? Like, for example, there's a whole thing about canonicalization of Unicode. So for example, if you were to write out my name, I have a ç in my name, and there's multiple ways to write just a ç, right? You can write c plus a combining character
Starting point is 00:33:22 or you can write the c-cedilla character. And there's multiple ways to canonicalize Unicode. And so if you have two domain names that look visually exactly the same, but they're actually different characters, are they the same domain? Well, to a user, it kind of ought to be, but to a computer, it's not. And it's actually super, super tricky to figure out, are these two strings the same, right? Like when you think about it from a user perspective, it's kind of obvious. You're looking, you're like, this is a different word or it's not. But from a program perspective, it's really hard. And so there's this thing called Punycode, which is used to encode URLs in ASCII as if like
Starting point is 00:34:02 deriving from the Unicode equivalent of their encoding. And so, you know, for example, if you have a domain name that uses Cyrillic characters, some of them look exactly the same as the regular ASCII characters. And you could, you know, reserve something that looks like google.com, but using Cyrillic characters. And in those cases, they would be confusable. And obviously, you don't want someone to give you a URL that looks like, say, google.com or yourbankname.com, and that's actually going to malicious.com instead.
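The Unicode points above can be poked at with Python's standard library. This is only a sketch: `same_after_nfc`, `to_ascii`, and `scripts_used` are helper names invented here, Python's built-in `idna` codec implements the older IDNA 2003 rules rather than whatever a given browser does, and guessing a script from the first word of a character's Unicode name is a crude stand-in for real confusable detection.

```python
import unicodedata

PRECOMPOSED = "\u00e7"  # ç as a single code point
COMBINING = "c\u0327"   # plain c followed by COMBINING CEDILLA


def same_after_nfc(a, b):
    """Compare two strings after canonical composition (NFC)."""
    return unicodedata.normalize("NFC", a) == unicodedata.normalize("NFC", b)


def to_ascii(domain):
    """ASCII (Punycode-based) form of a domain, via Python's IDNA 2003 codec."""
    return domain.encode("idna").decode("ascii")


def scripts_used(label):
    """Crude script guess: the first word of each letter's Unicode name."""
    return {unicodedata.name(ch).split()[0] for ch in label if ch.isalpha()}


# Looks like "google" on screen, but the two o's are CYRILLIC SMALL LETTER O.
SPOOF_LABEL = "g\u043e\u043egle"
```

Here `PRECOMPOSED == COMBINING` is false even though both render as ç, while `same_after_nfc` treats them as equal. Encoding `SPOOF_LABEL + ".com"` yields an `xn--` name that is clearly not `google.com`, and `scripts_used(SPOOF_LABEL)` reports both LATIN and CYRILLIC, which is the kind of mixed-script signal a heuristic might look at.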
Starting point is 00:34:34 The browser kind of has these heuristics to try to prevent confusable URLs from being displayed, to try to help users, who shouldn't have to know about Unicode, avoid going to a malicious website without knowing it. And what's cool is the Unicode Consortium has a whole technical report on this. It's called Technical Report 36. We'll put it in the show notes, but it's about Unicode security considerations, about what's confusable, what's not, how do you handle this and whatever else. Yeah, it's a fascinating field where the Cyrillic characters might look exactly like what you would expect from, say, the Roman alphabet, I guess, and then you would end up totally not where you expect it to be. And maybe if you got a secure connection you would have the
Starting point is 00:35:18 secure connection registrar telling you whether that domain looks too much like some other domain, that might be a consideration. Yeah, exactly. And even what is secure is really hard to explain to users, right? There's a lock on there. What does that even mean? I don't know. Google told me it was secure, I guess, but it's really hard to understand. And it's interesting, one of the people I used to work with at Google, Adrienne, I think she said at some point, her sister asked her what the handbag sign was in the browser window. And she eventually figured out she meant the lock. So even the fact that there's a lock,
Starting point is 00:35:55 it's a small little icon. Is it actually a handbag? For people who aren't used to the technical stuff, it's really non-obvious. So exposing security in a way that makes sense is in itself a legit super hard part of computer science. Developing secure applications that are usable for people is really, really hard. And to some extent, security and convenience are often in tension, right? Where if you go to a website and it says, wait, this might not be secure, click through these three different things to pass. If things are supposed to be working,
Starting point is 00:36:30 then that's quite an inconvenience, but it's worth it probably to indicate to people, hey, this actually may not be what you're trying to do here. This might be something very bad happening. Yeah, exactly. And what's interesting is, you know, you can see a lot of places that we as tech professionals have trained users to just click like, yeah, okay, whatever, just let me through. It's actually kind of hard to train users to not click yes through everything. And what's extra funny is, you know, maybe IE6 or whatever, IE5, it used to be that when you used the browser and you had an HTTPS connection, it would pop up a thing that would say like, this connection is secure. And it means that nobody else can see what you're sending over the internet. And then you had an okay button or a more info
Starting point is 00:37:16 dot, dot, dot button, which is reverse of what you have nowadays, right? Like they used to alert you when the connection was secure. So it's kind of funny. Stuff's evolved over the last, whatever, 20, 25 years or something. Yeah, constantly evolving and growing ecosystem as considerations arise too. Right. All right. So I guess, let's see, to get back to the question, many people may have tried out socket programming before. So fundamentally, we kind of open the socket,
Starting point is 00:37:45 which is a connection that can leave the machine via the network stack. And we could do an IP level connection or a TCP level connection. These are all possibilities when we're doing this kind of socket programming. And the sockets look like file descriptors, like we talked about earlier. We piggyback a lot of stuff on this abstraction of files and file descriptors. And so we create a UDP connection, a connectionless connection, not a persistent connection. It's kind of just throwing packets
Starting point is 00:38:17 against some potential endpoint. And then we're talking to our DNS resolution route. We're like, hey, I need to figure out, I've got this text here that says tlbh.it. I need to figure out what .it really means in terms of someone to talk to. And then I need to figure out what the tlbh means in terms of somebody to talk to.
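For a feel of what "throwing packets" at a resolver looks like, the sketch below hand-builds a minimal DNS query for an A record and shows how it could be sent over a UDP socket. The function names are invented for this example, the query ID is arbitrary, and the default resolver address `192.0.2.53` is a documentation placeholder (an assumption, not a real server) that you would replace with your own resolver. Real resolvers also randomize IDs, retry, and parse the response, none of which is shown.

```python
import socket
import struct


def build_dns_query(hostname, query_id=0x1234):
    """Build a minimal DNS query packet asking for the A record of hostname."""
    # Header: ID, flags (recursion desired), 1 question, 0 answer/authority/extra.
    header = struct.pack("!HHHHHH", query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is length-prefixed, and the name ends with a zero byte.
    qname = b"".join(
        bytes([len(label)]) + label.encode("ascii")
        for label in hostname.rstrip(".").split(".")
    ) + b"\x00"
    question = qname + struct.pack("!HH", 1, 1)  # QTYPE=A, QCLASS=IN
    return header + question


def send_query(packet, resolver=("192.0.2.53", 53), timeout=2.0):
    """Fire the packet at a resolver over UDP and wait for one datagram back."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        sock.sendto(packet, resolver)  # no handshake: just send a datagram
        response, _ = sock.recvfrom(512)
    return response
```

Note how the name `tlbh.it` is split into the labels `tlbh` and `it` inside the packet, which mirrors the idea of resolving the top-level domain and the second-level name as separate steps.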
Starting point is 00:38:36 Maybe the browser knows about the .it top-level domain inherently. So I just need to figure out that second part. This is about where we're at. Yeah, yeah, yeah. And here at this point, we're doing HTTPS, so it has to be secure, and we've got to use some encryption scheme to talk to the endpoint to do the HTTP request. And so now we kind of got to talk about
Starting point is 00:38:59 crypto. And unfortunately for our internet friends, it's got nothing to do with cryptocurrency, cryptology, Cryptonomicon, or GameStop, or whatever else. So we mean cryptography here. And generally speaking, the browser is going to talk to the server through Transport Layer Security. So it's generally called TLS. So it'll start with a handshake where the browser and the server negotiate how to crypto each other, basically. So the server then gives a certificate back, and the browser checks the certificate's legit using what's called a root store of certificates, which have been signed by certificate authorities. Sounds
Starting point is 00:39:35 really big, and, yeah, basically what it is is someone trusted by the OS or the browser signed off on the server certificate, so the browser trusts it. It uses some fancy math that's easy to verify in one direction, but really hard or effectively impossible to reverse. So you've got to have some fancy big numbers to be able to reverse it. It's really hard to find them, but to verify that the big numbers were used, it's easy to do.
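The certificate check against a root store can be sketched with Python's `ssl` module. `make_client_context` and `fetch_peer_certificate` are names made up for this example; a browser ships its own TLS stack and root store policy, so treat this as an approximation of the idea rather than what any browser does.

```python
import socket
import ssl


def make_client_context():
    """Client-side TLS settings that verify the server against trusted roots."""
    # create_default_context() loads the system's trusted CA certificates and
    # enables both certificate verification and hostname checking.
    return ssl.create_default_context()


def fetch_peer_certificate(hostname, port=443, timeout=5.0):
    """Connect, run the TLS handshake, and return (protocol version, cert info)."""
    context = make_client_context()
    with socket.create_connection((hostname, port), timeout=timeout) as raw:
        # The handshake happens inside wrap_socket; it raises an
        # ssl.SSLCertVerificationError if the chain or hostname does not check out.
        with context.wrap_socket(raw, server_hostname=hostname) as tls:
            return tls.version(), tls.getpeercert()
```

Calling `fetch_peer_certificate("tlbh.it")` needs network access; on success it returns the negotiated protocol version and a dictionary describing the server's certificate.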
Starting point is 00:40:02 And so what's interesting there is that once trust is established between those two, the browser does something like use the server's public key, which is asymmetric encryption, so non-symmetrical encryption, to establish a session key to communicate using symmetric encryption going forward. So we start with asymmetric, where we can sign each other's things and whatever else, and we say, okay, let's use this key to talk to each other. And then they have this symmetric encryption,
Starting point is 00:40:27 which is faster and kind of simpler to do. And what's neat with this crypto stuff is that it also provides reliability on top of the transport. So what it does is you get a packet and you kind of get a signature for it. And then you verify that the packet was signed properly. You kind of decrypt it or something.
Starting point is 00:40:49 And then if there was a random bit flip in transport, some data loss or whatever, then you would fail to authenticate the packet itself, right? So it kind of prevents not just people tampering, but just also random bit flips, for reliability purposes. Now, I'm really hand waving here because the details are super fraught with peril. There's plenty of insecurity to be had here, right? Like I talked about math, like the math in there, some of it is known to be pretty secure for now, probably secure for the next like 15, 20 years. Some of the math is also known to be super insecure by now, right? So if you look back at the old protocols, some of them are known to be broken, super tricky to do properly. And there's all these corner cases and whatever else. But generally speaking, the idea is you use fancy math to figure out how can I trust that this person is who they say they are, right?
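The "a bit flip makes authentication fail" point can be shown with a keyed hash. This is a simplified stand-in: modern TLS typically uses authenticated encryption modes rather than a bare HMAC appended to plaintext, and `seal` / `open_sealed` are hypothetical names used only for this sketch.

```python
import hashlib
import hmac

TAG_LEN = 32  # SHA-256 output size in bytes


def seal(key, payload):
    """Append an HMAC-SHA256 tag so the receiver can detect any change."""
    tag = hmac.new(key, payload, hashlib.sha256).digest()
    return payload + tag


def open_sealed(key, message):
    """Return the payload if the tag matches, otherwise raise ValueError."""
    payload, tag = message[:-TAG_LEN], message[-TAG_LEN:]
    expected = hmac.new(key, payload, hashlib.sha256).digest()
    # compare_digest avoids leaking where the mismatch is through timing.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed: message was altered")
    return payload
```

Flipping even one bit of a sealed message, whether by an attacker or by noise on the wire, makes `open_sealed` raise instead of handing back corrupted data.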
Starting point is 00:41:32 And there's kind of this layer of trust that's been created over time. Now, another fun fact about this is, you know, we use crypto all the time in browsers as well as other places. And a lot of the primitive operations are accelerated by modern hardware. So the same way we talk about TLB hits, there's also all these crypto things in the CPU that accelerate crypto. And crypto, by and large, it tends to kind of shuffle bits around a bunch, do a bunch of rounds and shuffle things around, do math with it. And so CPUs have things to accelerate that, to just move the bits around, XOR them and whatever else. So stuff like AES, you know, one round of AES might be accelerated by your CPU. And then you just call that instruction back to back to back to do crypto
Starting point is 00:42:15 more efficiently. Yeah. And actually, instructions have been added to CPUs in the past bunch of years to do things like AES, which is doing our symmetric encryption between the two sides perhaps, or things like CRC32C, which is used by Ethernet to checksum the data in a packet. There's basically CPU and network protocol co-evolution going on here in the computer ecosystem, which is also very cool.
Starting point is 00:42:42 And even things like randomness, things that grab entropy from the CPU's thermals and uses that as its entropy pool. There's even cool things that happen like that to do with processors. So that's all really neat to see. Yeah, randomness in the CPU is another one of those kind of fraught with peril things, which sounds good. But then how do you verify that it's actually random, doesn't have a bias or whatever else, and it hasn't been tampered? That's all really tricky stuff. Okay, so I think at this point, we should talk about TCP. So we kind of talked about before that you could create a UDP connection, which is effectively connectionless connection, which is a bit of an
Starting point is 00:43:20 oxymoron. But basically, I can just shoot packets at the other side of this thing that's trying to connect to a particular, say, IP address. But then we can layer something on top. And I think actually in the history of TCP and IP, first they were just going to create TCP as like a reliable connection protocol, but then they realized for low latency purposes they wanted to kind of fork off the bottom part and have it build on top of the connectionless portion, which did not need reliability, and so it could be more real-time oriented. And so for TCP, it piggybacks on top of the IP layer with the basic three-way handshake that we all kind of know and love, of SYN reaches out to create a connection, ACK confirms that the
Starting point is 00:44:03 SYN was received on the other side, and then you reply, SYN-ACK, I'm acknowledging that you gave me the ACK to my SYN. And so then finally, when everything is done, you use a FIN packet, but that's stuff we'll save for another time. Maybe we'll talk about TCP offloads or something like this. Yeah, yeah, and there's so much complexity on there.
Starting point is 00:44:24 We can't really talk about all this. There's two I want to mention, though. There's extensions like TCP Fast Open. What's cool about it is it saves what's called a TFO cookie, a Fast Open cookie, for authentication, and it sends it along with the initial SYN packet, right? So the initial SYN packet that's trying to establish the first connection allows you, if the server knows about this whole TFO thing, to skip the ACK packet or go faster with the ACK packet because it saves you a round trip. It's kind of cool. And then another cool different thing that you can do with networking is like multipath TCP, which allows you to create multiple connections to the same named endpoint, right? So same client, effectively same server talking to each other,
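For reference, the three-way handshake is usually described in the order SYN, then SYN-ACK, then ACK. The toy function below just lays out those three segments with their sequence and acknowledgment numbers; it is a paper model, not a network stack, and the function name and tuple layout are assumptions made for illustration. In practice the operating system does all of this when a program calls `connect()`.

```python
SEQ_SPACE = 2 ** 32  # TCP sequence numbers wrap around at 32 bits


def three_way_handshake(client_isn, server_isn):
    """Return the handshake as (sender, flags, seq, ack) tuples, in order."""
    # 1. Client picks an initial sequence number (ISN) and sends SYN.
    syn = ("client", "SYN", client_isn, None)
    # 2. Server answers with its own ISN and acknowledges the client's SYN,
    #    which consumes one sequence number.
    syn_ack = ("server", "SYN-ACK", server_isn, (client_isn + 1) % SEQ_SPACE)
    # 3. Client acknowledges the server's SYN; the connection is established.
    ack = (
        "client",
        "ACK",
        (client_isn + 1) % SEQ_SPACE,
        (server_isn + 1) % SEQ_SPACE,
    )
    return [syn, syn_ack, ack]
```

Each side acknowledges the other's initial sequence number plus one, which is how both ends confirm they heard each other before any application data flows.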
Starting point is 00:45:05 but different transport layers, such as like maybe one goes over Wi-Fi, one goes over 5G. So the route that it's taking is completely different, but they're both doing their own kind of, you know, TCP sawtooth connection to try to do congestion control, figure out how much bandwidth there is in each connection.
Starting point is 00:45:22 And as the quality of those connections changes over time, like say you're driving around or whatever, then, you know, you're sending more packets for the one communication you have with the server through, say, Wi-Fi or through 5G or whatever else. It's kind of interesting where there's all these new things that come up over time on top of networking that change how this stuff works, right? So there's a lot of stuff to talk about here. But basically, the idea of networking here is like, just like any other technology, it's something that works. It's been there for a long time. There's so much legacy on top of it. And folks try to build better things on top looking at like, oh, well, there's these
Starting point is 00:46:00 few bits not used here. Maybe we could do that. Or like, this is the longest pole in the communication. Maybe we can shorten it with this if both ends only knew about whatever. And, you know, so it's really complicated stuff. And the same as like, you know, POSIX is not perfect. Like you shouldn't design everything like POSIX. Well, like networking is not perfect.
Starting point is 00:46:17 There's, if you were to do everything over from scratch, if that were possible, throw away everything and start over, there's so many things you can do better. But even then, you would still have your own legacy. Like in 20 years, you would want to throw away your stuff and start over. And so maybe being conservative and keeping what works pretty well is useful as well. And as these things evolve, they also create opportunities, or as the computing ecosystem evolves, it creates opportunities where things like hyperscalers that
Starting point is 00:46:47 have data centers and the data centers, they have variable congestion maybe, but maybe things are not quite as lossy. People start coming up with protocols that are not as concerned with internet scale packet loss, but maybe, or the end-to-end connectivity being exactly retried from single places, but new ways of approaching that kind of problem at a data center scale where you're kind of in a giant building full of machines. And so you see a lot of papers these days on what can we do in lieu of TCP/IP within this kind of domain before we exit into the broader internet perhaps. Okay. So now we have the server looking at the path that was requested. So eventually the request, the HTTP GET request, arrived for the, you know, base URL, just the slash.
Starting point is 00:47:35 There's no qualifier on the path that we're getting from the website. We're just trying to get whatever it is at the website, tlbh.it. And so the server gets this HTTP GET request, sees that it's HTTPS, so it's secured, and it tries to figure out what it's supposed to do in order to serve up this request for whatever the base page is. So if you've tried to write web servers, you know, using maybe a web server framework or something like this, usually what you'll do is you'll configure routes, and those routes will cause certain things to happen in the server process in order to serve that request on that particular path. So if I do like slash foo,
Starting point is 00:48:14 I might have a foo handler that then kicks in as a function that gets served for the connection that was made from this client. And then there's things you can do like keep persistent connections and fancier things like this. But the basic query response model is HTTP GET, and then response comes back with data. And there's a bunch of request and response status codes. There's ones for informational responses, successful responses, redirects, client errors, and server errors. And JF has his personal favorite. Yeah, yeah, 418 I'm a teapot is my favorite response. It's kind of an April Fools thing, but it's kind of funny. And yeah, like we're not covering so much of the stuff that happens on the server side, partly because it's
Starting point is 00:48:55 it's pretty varied, right? We could talk about, like, Apache mod_rewrite or other stuff like that, and all the cloud things, and caches, proxies, and CDNs. CDNs are a huge thing in this space, but let's just ignore that for now. We're just talking about the client. Let's imagine that's how we choose to answer the interview question, and we're doing fine so far. In that interview question, we didn't really talk so far about how to create a network connection at all, right? The traditional way that you do that is with BSD socket interfaces. And that's like the tried and true, like it's been there for what, 30, 35 years. And most things that talk to the internet are built on top of that. And it works pretty well.
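Going back to the routes and status codes mentioned a moment ago, the server-side idea can be sketched in a few lines of Python without any framework. The route table, handler names, and response bodies below are all made up for illustration; a real server would also deal with headers, query strings, persistent connections, and much more.

```python
def home():
    # 200 is in the "successful" family of status codes.
    return 200, "<h1>TLB Hit</h1>"


def foo():
    return 200, "foo handler says hi"


def teapot():
    # 418 comes from an April Fools RFC about a coffee pot control protocol.
    return 418, "I'm a teapot"


# Route table: request path -> handler function (all hypothetical).
ROUTES = {"/": home, "/foo": foo, "/teapot": teapot}


def handle(method, path):
    """Dispatch a request to its handler and return (status_code, body)."""
    if method != "GET":
        return 405, "method not allowed"  # client error family (4xx)
    handler = ROUTES.get(path)
    if handler is None:
        return 404, "not found"
    return handler()
```

A request for the bare `/` path lands on `home`, `/foo` lands on the foo handler, and anything not in the table falls through to a 404.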
Starting point is 00:49:34 But just like POSIX, like there's a lot of good stuff. It's pretty good. Everyone knows it, but it's got some flaws too. So what's interesting is there's this neat other approach that was developed in the last few years by the IETF, the Internet Engineering Task Force, which side note, sounds amazing. It sounds like the Avengers for the internet or something. But they developed this thing called TAPS, T-A-P-S. And what's neat about their interface design, I'll put a link in the show notes, is they have the following goals for it. Instead of BSD sockets, they're trying to have like a single interface
Starting point is 00:50:08 for a variety of transport protocols. They're trying to have interfaces that are message oriented instead of stream oriented. And they're trying to do everything built around asynchrony, right? So when you create the connection from the client side, you're trying to make all of this asynchronous because it's going
Starting point is 00:50:25 to take a while to hear back from the other side of the network response. And so having a whole interface that's built around, ask for something and then do other stuff and come back, maybe you can really mesh that well with something like coroutines or whatever in a programming language design perspective. And then what's cool also is those APIs are kind of designed for security as a first-class feature, which wasn't really the case for BSD sockets. And it's also built for multi-stream and multi-path as a first-class thing. Now, none of those things are impossible with BSD sockets. It's just kind of not the shape of the API, right? And what's cool about IETF TAPS is it's built with that
Starting point is 00:51:06 as a really first-class design principle. So it's kind of a more ergonomic design for all of these network interfaces. So what's the adoption status of that? I always worry when I hear about these new things, is it just kind of the N plus one standard or is there some path to it being used by more and more stuff over time?
Starting point is 00:51:24 Yeah, it's hard to tell because the way TAPS was designed was through a collaboration by different experts in the field. So not just one company, but a bunch of different ones. There's kind of some samples. I know the APIs that Apple exposes in their operating systems uses the TAPS approach at a high level. But if you look even at the most recent C++ proposal for networking, it's a C++ API. It's not just BSD sockets, but it's a wrapper. It's called C++ networking, but it's really a wrapper around sockets. And it doesn't use that
Starting point is 00:51:58 approach. There's a counterproposal, which I'm a co-author on, that proposes using TAPS instead. Still being discussed whether C++ should do that or not. But at a high level, TAPS is still pretty early if you look at the publications. And so it doesn't have that wide adoption yet, right? Like if you're a browser vendor and you're trying to have a browser that works on a few operating systems, you're not going to move to that until all the operating systems have that as a primitive, or you have some shim layer that exposes that in a uniform way. So yeah, it's still pretty early to do that. But I think it's pretty promising as a future API layer. I would certainly like to program to that more than the traditional socket approach.
Starting point is 00:52:37 And it does have a way to de-sugar into the existing way that folks are doing things. So it's not kind of an all-or-nothing. Right. Like anything that's designed to be asynchronous, you can transform into a synchronous program by just like blocking. But the security, you know, same thing. Like if it's only secure, you can, there's some switches to downgrade the security to be insecure and whatever else, but it's not, you know, it's not the primary thing that it tries to do. So it's really tricky to design networking well. Cool. I guess we covered a lot of the salient points that I think we're interested in, but it does leave a ton of details unsaid. There's only so much you can cover in so much time. We didn't talk about parsing or rendering the content
Starting point is 00:53:17 once it got back to the browser, how it gets displayed, or the fetch process leading to more requests, which can also delay the final rendering of the page, all these kinds of nuances. Yeah, totally. I mean, like browser parsing is another pile of legacy that's complex and tricky and rendering and all that stuff. Like there's so much to talk about. And I don't think either of us are really experts in there. So we'll leave that for the other interview question. One thing that's funny is you'll note that when you fetch tlbh.it, it performs one fetch. And I think if you go to the homepage, that probably fits in one response packet.
Starting point is 00:53:55 And then if you look at the webpage itself, it should perform exactly seven subsequent requests, which should all hit your cache if you visit tlbh.it more than once. Because when we wrote the lovingly handcrafted HTML for tlbh.it, we decided custom fonts are kind of neat. So we're going to use custom fonts. There's seven of them. And so that's why it performs seven requests. They're pretty small. We host them on the website. But when you render tlbh.it, it should emit seven
Starting point is 00:54:25 different subsequent requests to get the fonts. If it doesn't have the fonts locally, it should be really fast at just rendering the web page with your local fonts, then switch to the custom fonts once they're available. So I would say our website is pretty minimal and responsive, which is cute. And what I love is, you know, you can look at this in the browser's built-in web inspector. You just look at it and it's really fun to debug browser stuff. Just like use the web inspector, look at the sources, modify the sources locally, try to see what the network requests are, try to debug it. There's so much stuff available in the debugging interfaces for the browser. It's a pretty powerful platform. Yeah. And I think there's generally this technique of HTTP rendering optimization
Starting point is 00:55:10 is based and built into all these browser tools where it's trying to get as many resources to come in upfront as quickly as possible to get a fast rendering to happen. You know, the blink of an eye, a human eye is like 100 or 200 milliseconds. So it's all about how do I get something rendered and up within that time period, and then going through all these tools that people have created in order to try to make that happen. And web debugging tools are like some of the best application development tooling that I've seen out there. Absolutely. Yeah, then there's plenty we didn't cover in this question. And we'd love to hear from folks out there what their favorite URL to eyeballs process trivia might be. I think it'd be really interesting to hear back from people about parts of the process that we didn't cover.
Starting point is 00:55:54 Yeah, that'd be great. We took a while to record this episode number two. I hope we take less time to record the next one. But if people have questions, answers, comments, things they want us to talk about, errata they want to suggest, you know, hit us up on Twitter. TLB Hit is our usual Twitter handle. Otherwise, you know, send us email or whatever. I think we have an email address. I don't know how email works. Oh yeah, that's a whole other episode: how does email work? It is, it is. Well, it's the logical place where every program eventually finishes, right? That's right. Every program has to grow until it serves email. Exactly. All right. Well, thanks, everyone, for listening. Chris,
Starting point is 00:56:30 thanks for being my co-host as always. Yep. Catch you later, JF. Thanks.
