Two's Complement - The Rabbit Was Always There

Starting point is 00:00:00 I'm Matt Godbolt. And I'm Ben Rady. And this is Two's Complement, a programming podcast. Hey, Ben. Hey, Matt. I can't believe we're laughing already. Ben, very, very, I don't know what the right word would be, but, like, you bent it out. Auspiciously.

Starting point is 00:00:29 That's not the right word, though. But you took your microphone and kind of bent it out of the way so we could have a nice talk, and your headset microphone isn't in the way. Oh, gosh. It's too early in the week to be recording a podcast,

Starting point is 00:00:44 and yet here we are. You mean Friday? Yeah, I guess. Oh, no, now everyone knows when we record this. There's not a lot of time left, my man. That's true. Somehow we're in May already, and this podcast will go out in May as well,

Starting point is 00:00:59 because it has to. Yeah, right. Otherwise, we're in trouble. We've not got any in the bank. Yeah, so what are we talking about today, Ben? I thought we could do at least one podcast, and maybe we'll do this again, on kind of like how computers actually work. I love it.

Starting point is 00:01:19 I think a way in which to cover the extremely broad topic of how computers actually work is to look at some of the maybe non-intuitive behaviors that they have. Okay. Things that people kind of take for granted, sort of like an abstraction that they rely on. And, you know, peel back the covers a little bit and explore how it works. And maybe even, if we get to it, some, like, non-obvious ways in which it breaks, right? And, like, when it doesn't work, what is actually happening? Because I think that understanding the failure modes of these things is a great way to understand what's actually going on.

Starting point is 00:02:02 So that's my general idea. Okay. So we've got this sort of general theme of how computers really work, which obviously is very close to my heart as well. So this sounds brilliant. And then we were discussing some things before we started and, you know, shooting around ideas. And I sort of suggested, there used to be an old Google interview question, which was: you type google.com into a browser, you hit enter, what happens? And, you know, I always liked that because, you know, there's almost any number of

Starting point is 00:02:32 ways that you can go. But how about we simplify that bit? What happens if I use curl rather than the whole web browser? And I curl some simple URL, right? Not JavaScript or anything. Well, I guess it doesn't matter. But yeah. I mean, it could be JavaScript. I guess so.

Starting point is 00:02:48 You're fetching a document. It doesn't even matter at that point because it's just printed out in front of you. Yeah. So, yeah, should we try something like that? What happens if we curl google.com? In fact, I did this only earlier today as I was playing around with sandboxes. Perfect. And I curled exactly https://google.com.

Starting point is 00:03:05 And I know exactly what it returns. Can I offer a guess before you tell me the answer? Offer away. Yes. Does it give you a redirect? Something in the HTTP 300 range would be my guess. It does. It tells me the document has moved, and it tells me the document has moved to www.google.com.

Starting point is 00:03:25 Yes. Which was like, oh, really? Yeah, I guess, because I was very lazy. Yeah. So how does it tell you that? Well, in this instance, curl just showed me a document. It printed out a document in front of me that said 300 or whatever it was, document moved, and it had a tiny little bit of HTML that came to me because I didn't have, you know, -D - or whatever the flag is to print out all the headers and stuff like that.

Starting point is 00:03:50 Okay, okay. So you didn't see, so this is going to be my question: my understanding of this is whenever you request a document over HTTP or HTTPS, you're getting two things back. You're getting the headers and you're getting the document, the document body, right? And so when it was redirecting you, all you were seeing there was the document body, which was some HTML. Is that correct? A very small piece of HTML, which you would normally never see. No browser would render that, really. Mm-hmm. Because the real data was in the headers, right?

Starting point is 00:04:26 Yeah. Okay. So then when you do that, and I'm just kind of going through, I'm trying to, like, answer the interview question here a little bit. That's right. Yeah. So when you, or maybe I'm interviewing you. I don't know which way around this is anymore. I thought it was going to go the other way and now we seem to have flipped it.

Starting point is 00:04:43 Yeah. So, I don't know. I don't know who's who. I mean, you know, it's a collaboration. What job is this even for? What are the benefits? We don't have jobs anymore.

Starting point is 00:04:50 They're all taken by robots. Oh yeah. That's true. So, uh, when you make this request, you're going to get back headers and you're going to get back the body. Like, how is the request delivered and how exactly is the body returned? Like, what protocol is it using to do that? Right. Right. Well, I mean, I know that the particular command I was using is absolutely HTTP/1.1. And I mean, this is one of those cool things. I know

Starting point is 00:05:19 maybe this is showing our age a little bit, or certainly my age. I think I've got a few years on you. But at university in the late 90s, in the mid 90s, darn, you know, learning that you could telnet to a port on a machine and just type stuff in and get it to do things. So, like, you know, the happiest day was discovering how to send emails that looked like they came from other people. I don't recommend that folks try this anymore. I don't think it works anymore. But you used to be able to telnet to the mail port and give it just raw commands. And it would very trustingly say, sure, you're god@universe.com, fine.

Starting point is 00:06:02 Right. Send your emails, please. Right. So you kind of got this idea that everything was sort of these text-based things. And I remember this early web thing, the original HTTP, what's now called HTTP/0.9. So you would just literally telnet to port 80 of some machine and type GET space slash and hit enter. And then you'd get the document back. No headers, no nothing. That was the whole thing. But nowadays, with HTTP/1.1, we have to tell it more information. So you are going to connect to a TCP port on the machine, usually port 80. Sorry, 443 nowadays. Everything's HTTPS. So for the purposes of talking about the protocol, at least this level of protocol, HTTP, let's ignore SSL for now. Let's see if we get to it in time.

Starting point is 00:06:47 Right. I don't know. In his most recent video. Correct. And I'm so sorry. This is my family coming home in the background. We thought we'd beat them, but no. So my dog is announcing their arrival.

Starting point is 00:07:01 Much as you announce your arrival at the HTTP server by asking it. Whoa, segue there. So: GET. It's still a verb, like GET or PUT or POST, and most of these things would be a GET. Then the URL, or rather the

Starting point is 00:07:17 document locator on that machine itself, and I forget the exact terminology for it, but that's the bit after the host name. Yeah. But before the hash, if there's one in your URL. So for Google, it will probably just be slash, a single forward slash, because that's

Starting point is 00:07:33 all you've got if you're going to google.com. Then you put a space and then you tell it that it's HTTP/1.1, which tells it, hey, I'm talking to you in the new way. I mean, there's QUIC and there's HTTP/2 and other things. But for simplicity, let's stick with this. Then you're expected to send a bunch of headers. Each header is a bunch of ASCII bytes followed by a colon, followed by a space, I think. Maybe the space is optional.

Starting point is 00:08:00 And then some other stuff, and then a newline. Yeah. So it's pretty human-readable as these things go. And the one thing that is most required, most required, I don't know if something can be more or less required. Most requirement. The highest requirement is to put Host colon and say what host, again, my dog in the background, apologies.

Starting point is 00:08:22 But what host I'm actually requesting from. Because this was the big unlocker, I think, in the early noughties, whenever this thing came out, which allowed you to have more than one website per IP address. Right. Because if you just do GET /, you're like, well, which website did you want? I don't know, right? It's whatever.

Starting point is 00:08:42 But that was the way to solve it. You know, maybe it would have made more sense to do GET http colon, blah, and just give it the whole URL and say, you work it out. Maybe. Anyway. But Host colon google.com is what we would have said. And then, I mean, from my own personal experience of telnetting to google.com to see how it works,

Starting point is 00:09:00 I know that all you need to do to get it to work at all is GET / HTTP/1.1, Host: google.com, return, and then a blank line that says, I'm done with the headers now. And then you get the document that says, no, you meant, surely you meant www.google.com. Yeah. Right. Nevertheless, connectivity is there. And then it will close the connection, I believe. So that's what I think would happen if I'm doing it manually myself with netcat or telnet or something of that nature. What am I missing? What's next?
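As a rough illustration of the telnet exercise just described, here is a minimal sketch in Python using only the standard library's `socket` module. It is not how curl is implemented, and the helper names `build_request` and `fetch_raw` are made up for this example. One assumption to flag: it adds a `Connection: close` header. HTTP/1.1 keeps connections open by default, so asking the server to hang up is what makes "read until the socket closes" a reasonable way to find the end of the response here.

```python
import socket


def build_request(host: str, path: str = "/") -> bytes:
    """Build a minimal HTTP/1.1 GET request as raw bytes.

    The request line is "GET <path> HTTP/1.1", each header is "Name: value",
    every line ends with CRLF, and an empty line marks the end of the headers.
    """
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",       # the one header HTTP/1.1 requires
        "Connection: close",   # assumption: ask the server to hang up when done
        "",                    # blank line: "I'm done with the headers now"
        "",
    ]
    return "\r\n".join(lines).encode("ascii")


def fetch_raw(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send the request over plain TCP (no TLS) and return everything the server sends back."""
    with socket.create_connection((host, port), timeout=10) as sock:
        sock.sendall(build_request(host, path))
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # empty read: the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Something like `print(fetch_raw("google.com").decode("latin-1"))` should show a 3XX status line, the response headers, and the small "moved" HTML document, though the exact output depends on Google's servers on the day. This talks plain HTTP on port 80; HTTPS needs a TLS handshake first.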

Starting point is 00:09:34 No, that's good. And I mean, I love doing that exercise when kind of explaining HTTP to people because it takes the magic out of it. Now, you introduce a whole lot more magic when you add in, like, TLS and a bunch of WebSockets and other things like that. There's a fantastic quote by one of our friends. It was to do with optimization, as it happens. But you mentioned magic and I wanted to, like, bend it to this, which is that he said, you know, HTTP is like magic, and you're like, well, it isn't. It's just, like, a bunch of ASCII being sent over.

Starting point is 00:10:05 Yeah. And that is like magic because if you actually show people how it's done, it's kind of disappointing afterwards. You're like, oh, it's just, oh, I see. Yeah, there's just two of them. The rabbit was already in the hat. That was right. The whole time.

Starting point is 00:10:20 It's not special. It's not a special hat. It's not a special rabbit. It's just the rabbit was in there the whole time. Exactly. But I love that. That feeling is what magic is. That's what magic means, really. It's disappointing when you know how it's done, actually. Right. But anyway, not disappointing, because now you know how it's done, you can go,

Starting point is 00:10:40 oh, cool, this is something I can build on and I understand, I can debug and so on. But this is obviously not curl. This is just me doing it manually, and you doing it manually. Right. So when curl is doing this, in the base case, when it just works as expected, it is, yes. Not that kind of base. It is opening a TCP connection to whatever IP address was resolved from the host name that you gave it. Yes. It is sending that initial request with the HTTP verb, as it's sometimes called: the GET, the POST, the PUT, whatever it is. It's sending all the headers that it might send.

Starting point is 00:11:18 And curl sends several headers on your behalf, right? You can tell it to send specific headers, but you can also just say, I just want this document, please, and it will add in whatever headers it thinks are appropriate, including a user agent. User agent, yeah. Hey, I'm curl, this version, and stuff like that. And what types of documents it'll accept, I think, which is a list of things. And then potentially something to do with the connection, in terms of closing the connection or keeping the connection open for further requests, which, you know, yeah.

Starting point is 00:11:50 One side note here: one of the interesting ways that you can detect which client is actually connecting to you, because, you know, the user agent is supposed to say, but obviously that's kind of voluntary. You can put whatever you want in there. But one of the interesting things that you can do is look at the order of the headers, because the order of the headers is not specified in the protocol. So it can be whatever you want it to be. But it's a fingerprint for the client code that connected. Yes. Certain clients tend to put the headers in certain orders.

Starting point is 00:12:24 And depending on what client you're using, that can be an indication of what it actually is. Oh my, that's sneaky. Yes. Ben, you are a sneaky man. So it's a sort of fingerprinting based on the fact that, yeah, well, you know, curl always sends them in this order, and Chrome's fetcher does this, and Firefox does this.

Starting point is 00:12:44 So it doesn't matter how you configure it to, like, lie in the user agent; it's like, yeah, but we know you're Chrome really. Yeah, yeah. Similarly with, like, you know, as you said, the spaces in between the headers and the values. Right. Optional, you can do it either way, and that creates a signal too. Also, the casing on the values is another thing that can create sort of a fingerprint here.
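The header-order fingerprinting idea can be sketched in a few lines. This is a toy under stated assumptions, not any real fingerprinting product's method: the function name `header_fingerprint` is hypothetical, and it records only each header's name (casing preserved) in the order sent, plus whether a space followed the colon. Header values, including User-Agent, are ignored, so a client that lies about its name still produces the same fingerprint.

```python
def header_fingerprint(raw_request: bytes) -> tuple[tuple[str, bool], ...]:
    """Toy fingerprint of an HTTP/1.x request's header block.

    For each header, in the order the client sent it, record the name with its
    original casing and whether a space followed the colon. Values, including
    User-Agent, are ignored, so spoofing the User-Agent changes nothing here.
    """
    head = raw_request.decode("latin-1").split("\r\n\r\n", 1)[0]
    header_lines = head.split("\r\n")[1:]  # skip the request line
    fingerprint = []
    for line in header_lines:
        name, _, value = line.partition(":")
        fingerprint.append((name, value.startswith(" ")))
    return tuple(fingerprint)


# Two requests carrying the same User-Agent, but different order and formatting.
a = (b"GET / HTTP/1.1\r\nHost: example.com\r\n"
     b"User-Agent: Firefox\r\nAccept: */*\r\n\r\n")
b = (b"GET / HTTP/1.1\r\nUser-Agent: Firefox\r\n"
     b"host:example.com\r\nAccept: */*\r\n\r\n")
print(header_fingerprint(a) == header_fingerprint(b))  # False
```

A real system would likely combine many more signals than this; the point is only that ordering and formatting survive a spoofed User-Agent.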

Starting point is 00:13:06 So curl is choosing to do it the way that curl does it. And even if you change the user agent to be like, oh, no, actually I'm Firefox, it's possible that somebody else might be able to figure that out. But the header order is not actually, and I might be lying here, but I think it is specifically not important. I think it is unspecified which sequence they're in. Yeah, I think so. But yeah, dear reader, listener, sorry, not reader.

Starting point is 00:13:34 Gosh, again, it's been a week. You know, check what we're saying here, but this is just two old farts comparing notes on what they remember about how this works. Right, and, well, as you were saying before, oftentimes web servers are very tolerant of any kind of, you know, sort of garbage they may receive. And I think you said that you literally were just doing GET / on Google earlier today with no protocol.

Starting point is 00:14:01 No, this one had a, yeah, no, this one did have GET /. No, this was still using curl. I didn't actually telnet today. But I think nowadays, if you do GET space slash, it will tell you to go away, because that's HTTP/0.9 or whatever the one was that didn't even have the version. Yeah, the unversioned version. Yeah, right. Yeah, exactly. So, yeah.

Starting point is 00:14:23 So those headers, you're sending those headers, whether you know curl is doing it for you or not. You can obviously be explicit about it. But if you don't do anything, it'll do it for you. And then depending on the type of verb you choose, the request itself may also have a body. Oh, yes. That's a good point. And this is particularly important for PUT and POST and PATCH, which is another one I'm thinking of.

Starting point is 00:14:54 And you can also do it with GET. Oh. You can, I'm pretty sure that you can. I think one of the headers, one of the header fields, is Content-Length. Is that right? Is that the thing that gates this? And then you tell it how much you're going to be sending it or receiving. You know, certainly the response has a Content-Length that says, you know,

Starting point is 00:15:13 this is how long it's going to be, or it can. But if you're going to do anything that requires you uploading, you need to tell it where the end of your thing is, your data. Which is why, by the way, it is particularly difficult, if not impossible, to stream the body of a request. Like, if you have some input stream that you're reading from, and you're like, I don't know how many bytes are in this stream, I'm just, you know, reading it. By the time you're writing the body, you should have already written the header that says

Starting point is 00:15:44 the length. Yes. So you have to consume the whole stream in order to figure out how many bytes you're going to send. which is... You can put that in the header. I mean, it's fine if you're serving static content off of a disk where you're like streaming it from disk

Starting point is 00:15:57 and you can go, how long is this file, and it's immutable, it's not going to change on me. Okay, it's two terabytes. Cool. Content-Length, two terabytes, here it comes. But yeah, if it's being streamed by a process, for some reason, it's like outputting, you know, you're catting a file or whatever

Starting point is 00:16:12 or something ungodly, don't do this. Or another socket that you're reading from, right? Like you're trying to act as a proxy, perhaps. Then you don't know ahead of time how long the thing is going to be, to write it into the header that you have to send first. Anyone who's written web server software

Starting point is 00:16:26 has almost certainly had some kind of exception thrown when they've tried to do something silly in the body handler, where it's like, no, you can't do that because the header has already been sent. Like, you know, I forget, I've definitely hit it with, oh, crikey, I can't think of the name. But in some cases where you're trying to do something, it's like, no, look, I've already sent the header.

Starting point is 00:16:48 I can't now mutate this. I can't like turn it into a streaming request. I can't turn it into a different size request because we've committed. We're done now. You just have to send text now. Sorry about that. Yeah, there is multi-part, though,

Starting point is 00:17:01 which is a pretty crazy hack. Right, right. Multi-part uploads. If you've ever used, like, an object store that supports HTTP, like Amazon S3 and other ones like this, you wind up sometimes doing multi-part uploads because otherwise, if you have a lot of data to upload, and something goes wrong on the 99th percentile byte of your 5 terabyte file,

Starting point is 00:17:28 you've got to do it all over again, which is very inconvenient. I think there might also be, for S3, now that I say it out loud, another portion of the S3 protocol that allows you to do that. I was going to say, I think those are separable, because, I mean, S3 lets you configure, like, where do the slices of a multipart upload go, and how long do they live before they get deleted. But then there's multipart, which is more like you say, it's almost like a streaming protocol where you say,

Starting point is 00:18:00 hey, I'm going to keep sending you data. Here is the delimiter, and you come up with some string that you hope to God doesn't appear in your actual data. Right. And then you say, this is the thing that tells us that we've done this chunk, and we're going to give you some more headers, I think. I think that's how that works.
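The length-up-front problem is easy to see when building the bytes by hand. With `Content-Length`, the sender has to know the size of the whole body before the first header byte goes out. HTTP/1.1 also defines chunked transfer coding (`Transfer-Encoding: chunked`): each piece of the body is sent prefixed with its own size in hexadecimal, and a zero-size chunk marks the end, so a sender can stream a body whose total length it doesn't know yet, such as output from a pipe or another socket. Unlike the multipart boundary described above, chunk framing is length-based, so there's no delimiter string that could collide with the data. This is a minimal sketch with hypothetical helper names; it omits chunk extensions and trailers, which real implementations also have to handle.

```python
from typing import Iterable, Iterator


def request_with_length(host: str, path: str, body: bytes) -> bytes:
    """POST with Content-Length: the whole body must be known up front."""
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


def chunked_request_head(host: str, path: str) -> bytes:
    """POST headers announcing a chunked body; no total length is needed."""
    return (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Transfer-Encoding: chunked\r\n"
        "\r\n"
    ).encode("ascii")


def chunked_body(pieces: Iterable[bytes]) -> Iterator[bytes]:
    """Frame each piece as <size in hex>CRLF<data>CRLF, then a terminating 0 chunk."""
    for piece in pieces:
        if not piece:
            continue  # a zero-size chunk would end the body early, so skip empties
        yield f"{len(piece):x}\r\n".encode("ascii") + piece + b"\r\n"
    yield b"0\r\n\r\n"  # last chunk, followed by the empty line that ends the message


# The pieces could come from a pipe or another socket; a real sender would
# write each framed chunk as soon as it is produced instead of joining them.
wire = chunked_request_head("example.com", "/upload") + b"".join(
    chunked_body([b"hello, ", b"", b"a" * 16])
)
```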

Starting point is 00:18:13 Now, I'm very much in dodgy territory here. I've looked at it with WebSockets before; there's some kind of thing that you have to do during the handshake, at least with WebSockets as they were in the 2010s. Yeah, yeah. 15 years ago. Okay, so exploring other areas where you might do this, and we've explained this simple model of the world.

Starting point is 00:18:32 We're sending text or receiving text. Everything's fine. And you sit down with telnet, and you're like, I'm going to do what Ben and Matt said. We didn't really say what you'd see back. So, yeah, with curl, for example, or if you telnet directly, you will see the headers of the response. But curl was hiding them from me because I just did a plain curl.

Starting point is 00:18:48 So typically you'll see something along the lines of, first of all, something which says 200 OK. Yep. Yeah, the response: you send your GET verb and whatever, and you get a response document that starts with the, what was it, the code. And we've all seen the codes. The 2XX ones are like, everything's fine, and they have slightly different flavors. But usually 200 OK is the one that you'll see. I think one of the 2XX ones is like, okay, but no data, like, just yes, this is fine, but there's nothing to come, which you see very occasionally.

Starting point is 00:19:22 3XX ones are like exceptional-case type stuff. Is that right? What is the definition of 3XX? Because they're like moved and stuff like that. Yeah, like there's temporarily moved, permanently moved. I mean, I think all the redirects are in the 300s. There might be other things in the 300s. That's the kind of edge-casey thing where they might be.

Starting point is 00:19:41 And then certainly 400, 4XXs, are errors of sorts. The world-famous 404, which you probably know even if you're not a programmer. Yeah. But there's the one that Twitter proposed, and I think it actually got introduced to the standard: 420 Enhance Your Calm, for rate-limiting responses. Oh, no, I don't know that one. It's a reference to the movie Demolition Man. Oh, that's cool.

Starting point is 00:20:07 I know there's I'm a teapot, which is one of the fun ones: an error code that a tea machine may return if it is asked to serve coffee or something like that. Exactly. And then obviously if something breaks in the server itself and the server wants to let you know that something hideous went wrong internally, it'll give you a 5XX. Well, are there 1XX things? Are those like continuations, I think, for stuff like multipart? Oh no.
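The status-code families come from the first digit of the code in the response's status line, for example `HTTP/1.1 301 Moved Permanently`. The registered classes are 1XX informational (`100 Continue` is an interim response to a client that sent `Expect: 100-continue` before uploading a body, and `101 Switching Protocols` is how a server accepts a WebSocket upgrade), 2XX success (`204 No Content` is the "yes, but nothing to come" one), 3XX redirection (which also includes `304 Not Modified`), 4XX client error, and 5XX server error. `418 I'm a teapot` comes from an April Fools' RFC about coffee pots, and Twitter's `420 Enhance Your Calm` never made it into the standard; the registered rate-limiting code is `429 Too Many Requests`. The sketch below, with hypothetical function names, parses a status line and names its class.

```python
STATUS_CLASSES = {
    1: "informational",
    2: "success",
    3: "redirection",
    4: "client error",
    5: "server error",
}


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split e.g. 'HTTP/1.1 301 Moved Permanently' into (version, code, reason)."""
    version, code, *reason = line.split(" ", 2)
    return version, int(code), reason[0] if reason else ""


def status_class(code: int) -> str:
    """Name the family a status code belongs to, based on its first digit."""
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    return STATUS_CLASSES[code // 100]
```

The reason phrase is informational only; clients are expected to act on the number, which is why the parser keeps it as a plain string.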

Starting point is 00:20:34 See, I'm trying to do this without Googling because I have the loudest mechanical keyboard. Everything you're hearing, dear listener, is coming off the top of our heads and possibly other parts of our anatomy, depending on how accurate they are. That's right. So, anyway, and then after that, you get a sequence of headers that look like the ones that you sent up.

Starting point is 00:20:55 So it's, you know, something colon something else. And then there'll be two newlines, and then off you go with the data, the response. So that's pretty cool. It is. It is. And so, this is perfect, actually. This is perfect. So coming back to what I was saying before about, like, okay, I'm going to do what Ben and Matt said.

Starting point is 00:21:14 I'm going to sit down and I'm going to type these things out in telnet and I'm going to get back some response. And when I do that, all I see is a bunch of gobbledygook. I don't see any text. I don't see any HTML. What is going on? One of the things that might be going on is that the server has compressed the response. And it will tell you that it has done that, and which compression algorithm it has used, in the headers.

Starting point is 00:21:40 So you'll see a Content-Encoding header. The text-based headers are still there, just to be clear, right? And then there'll be a blank line, and then you'll see garbage. Although a well-configured server ought not to do that unless you said you'd accept a compressed response. But nevertheless, this is exactly the kind of thing, to your point at the beginning of this, that goes wrong: an endpoint that just assumes every client can support a compression algorithm. Right. Which I believe you're about to go and talk about.
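What curl does with a compressed response can be sketched with the standard library: split the raw response at the first blank line, read the headers, and decode the body according to `Content-Encoding`. This minimal sketch uses hypothetical function names, only handles `gzip` and `deflate`, assumes the whole response is already in memory, and ignores chunked transfer coding. It also includes a check for gzip's two magic bytes, `1f 8b`, which is one way to find out whether something upstream has already decompressed data you expected to still be compressed.

```python
import gzip
import zlib

GZIP_MAGIC = b"\x1f\x8b"


def split_response(raw: bytes) -> tuple[str, dict[str, str], bytes]:
    """Split a raw HTTP/1.x response into (status line, headers, body).

    Simplification: headers go into a plain dict, so a repeated header
    (e.g. several Set-Cookie lines) keeps only its last value.
    """
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("latin-1").split("\r\n")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return status_line, headers, body


def decode_body(headers: dict[str, str], body: bytes) -> bytes:
    """Undo Content-Encoding for the two most common encodings."""
    encoding = headers.get("content-encoding", "identity").lower()
    if encoding == "gzip":
        # gzip = deflate data plus a small header and a CRC-32/length trailer
        return gzip.decompress(body)
    if encoding == "deflate":
        try:
            return zlib.decompress(body)  # HTTP "deflate" is meant to be zlib-wrapped
        except zlib.error:
            return zlib.decompress(body, -zlib.MAX_WBITS)  # some servers send raw deflate
    return body  # identity or an encoding this sketch doesn't handle


def looks_gzipped(data: bytes) -> bool:
    """True if data starts with gzip's magic bytes."""
    return data[:2] == GZIP_MAGIC
```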

Starting point is 00:22:10 So I don't want to keep doing it. Well, yeah, no. And that's another thing that curl is, again, peeling back the abstraction here, that's another thing that curl is doing for you, right? It's reading that header. It's applying the correct decompression.

Starting point is 00:22:24 I can't remember if, like, if you just, can you disable it? Can you tell it not to do that? I don't remember either. I know I've had problems before because, and this is going to lead to one of my favorite observations or rather of mutual friends observation, You know, whether it's transparently decompressing because it says it's deflated and so curled is then going to say, well, I'll undeflated then because that's what it says that I should do.

Starting point is 00:22:52 Or whether, you know, the fact that the compression is there or not is now opaque to you. You don't know if it's happening or not, which is the kind of dichotomy of transparency: it means that now it's impossible to see whether it's actually happening or not, you know. So our friend coined the term, which I introduced somebody to at work yesterday, "transpaque," which is that kind of combination of, like, well, you're meant to not know it's there. But if it is there, now you can't tell if it's there or not. And maybe it's broken and now you're just being gaslit by your system. Right. You know, I think we had this with at least one of the things at work, where a vendor was sending gzipped data.

Starting point is 00:23:35 but whatever Accept header we were sending meant it was then being un-gzipped for us, even though we'd asked for, you know, like, GET bob.gz, or GET, you know, this-day.gz. And then the library was like, oh, cool, I'll un-gzip that for you.

Starting point is 00:23:49 And then we were writing it to disk as, you know, bob.gz. And you're like, why is everything so huge? Oh, wait, everything's called .gz because that's what we asked for, but something transparently, again, transpaquely,

Starting point is 00:24:01 transpaquely, un-gzipped it for us, being helpful, thinking it was being helpful, and it wasn't, right? Anyway, but yes, it could be deflate-compressed. I think, or what are the other ones that are out there? Zstandard, I think; is that standardized? I don't know, but deflate's the one that, you know, everyone uses, which is to say gzip.

Starting point is 00:24:22 Pretty much, to a first approximation. Deflate is the algorithm that powers gzip, and then there's a little tiny bit of a header associated with an actual gzip file, which I think is slightly different, and a CRC. So that's definitely something that can go wrong. Yeah, yeah. Do we want to try to get into TLS and compression and HTTPS? Is that a bridge too far for this conversation?

Starting point is 00:24:48 That might be a bridge too far. I certainly don't know enough about it. So what we've been talking about so far, let's just, let's graze the topic and see if we feel comfortable with this. And then our listener can roll their eyes at how wrong we are. But what we've been talking about is, like, the HTTP protocol, and in HTTPS

Starting point is 00:25:06 there's another layer, another sort of wrapping of the protocol, where some encryption is applied, and authentication, and certificate exchange,

Starting point is 00:25:22 all that kind of stuff. But once you've peeled that back, you're back to HTTP again. So, like, effectively, from the point of view of the web server itself and from the web client, you can almost, almost ignore the fact that the tunnel that connects them is an encrypted tunnel as opposed to just a regular connection.

Starting point is 00:25:38 So obviously you can't telnet to port 443 and type anything and expect it to work, which is unfortunate. What will happen, if you're very lucky, is you might see a lot of garbage bytes being sent back to you and then the connection is closed. That's probably what you're going to see, if you see anything at all, if you telnet to port 443.

Starting point is 00:25:55 Something that surprised me the other day: I naively assumed that you connect to port 443, some kind of negotiation happens with a certificate and key exchange and a whole Diffie-Hellman thing to, like, you know, get a unique key on both sides. And then, and only then, do you start saying, hi, I'd like to get this, you know.

Starting point is 00:26:23 Right. I was going to say this exact same thing. Yeah. It turns out that, for the same reason that we added the Host header to the request rather than just relying on the IP address that you're connecting to being unique for that website, we lose that if we encrypt it. Which means that, like, you know, the web server you're talking to, if, you know, google.com is co-located with yahoo.com, unlikely, but whose certificate should the server give you when you connect to it? You're like, ah, good question. I don't know. Which means that you, the client, need to tell it first.

Starting point is 00:27:03 Yes. But how do you encrypt that so that no one can tell which site you're actually browsing? And the surprising answer is you don't. Yep. Which was like mind blown. Yep. Not as secure as you thought. All that information is being exposed to the interwebs.
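The leak being described is the TLS Server Name Indication (SNI) extension: the client puts the hostname it wants, in plaintext, in its very first handshake message, so the server can pick the right certificate. Everything sent after the handshake, including the HTTP request line, path, query string, headers, and body, travels encrypted, so an observer on the wire typically sees the server's IP address and the SNI name (and usually the DNS lookup), not the full URL. A newer extension, Encrypted Client Hello (ECH), aims to hide the SNI as well, but support varies. The sketch below uses Python's standard `ssl` module; passing `server_hostname` is what makes the client send SNI, and the default context verifies the certificate chain against the system's trusted certificate authorities. The function names are hypothetical.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Default client context: verifies the certificate chain and the hostname."""
    return ssl.create_default_context()


def fetch_over_tls(host: str, path: str = "/", port: int = 443) -> tuple[dict, bytes]:
    """Open TLS to host (sending SNI), then speak plain HTTP/1.1 inside the tunnel."""
    context = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw_sock:
        # server_hostname is sent unencrypted as SNI and used for the hostname check.
        with context.wrap_socket(raw_sock, server_hostname=host) as tls:
            cert = tls.getpeercert()  # subject, issuer, validity dates, etc.
            request = (
                f"GET {path} HTTP/1.1\r\nHost: {host}\r\nConnection: close\r\n\r\n"
            ).encode("ascii")
            tls.sendall(request)  # from here on, the path and headers are encrypted
            chunks = []
            while data := tls.recv(4096):
                chunks.append(data)
    return cert, b"".join(chunks)
```

Calling `fetch_over_tls("www.google.com")` would return the peer certificate as a dict (its `issuer` field names the CA that signed it, which ties into the chain of trust) along with the raw response bytes. Watching the same call in Wireshark should show the hostname in the ClientHello and nothing readable after the handshake.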

Starting point is 00:27:21 Does this mean that all these adverts that I keep fast-forwarding through about VPNs... Ah, uh-huh. And dear listener, do not worry, this is not about to segue into an advert for one of those. This is not a way of announcing that finally, Surfshark, or whoever it is... Sponsorship, yeah. No, no sponsorship. This is just Ben and me having fun. We're not turning into anything else.

Starting point is 00:27:40 We're not late turning into anything else. But no, but there is, that's the one argument I can see for that because I just assume, you know, like fine, HTTPS, everything. And it's, you know, no one need know what I'm doing. but it's, you know, you are actually, if you cracked out WireShark and visited a website, which maybe we should do live sometime, we need to work out a way of screen recording this or something.

Starting point is 00:28:02 Yeah, yeah, yeah. Because, yeah, I think the first time you show somebody Wireshark while they're browsing or doing anything trivial on the network is the first time the scales fall from their eyes. It was for me: oh my gosh, look how much stuff's going on on my network, or on my computer. Oh, golly. But anyway, popping back: you were going to say the same thing about this surprising behavior.

Starting point is 00:28:25 Yeah, you should be very careful about assuming that the data that is in your request, the URL, including the query parameters, and the headers that you are either adding yourself or that are being added on your behalf by tools like curl, are actually secure, because they're almost certainly not in many, many cases. So when you make that request... I was thinking it was only the host that was leaked, just purely so the correct certificate could be served back to you, but I may be wrong. So I think that it's... So my understanding of this is that you're doing this sort of escalation request. You're like, I'd like to escalate to this other protocol. And I would not be very confident in telling anybody that I knew for certain that in all cases,

Starting point is 00:29:12 it could only be these things, right? No, that's fair. Yeah, it's one of those things where, if it's important to you that that information is encrypted, then you should check, probably using Wireshark or something like that. I've just fired it up. Couldn't hurt. I couldn't help myself.

Starting point is 00:29:28 Oh, I need to sudo make me a sandwich, of course, because Wireshark. Yeah, okay, given that I have to escalate, I'm not going to do that now. Otherwise, you are going to hear my long password rattling out. But no, that's definitely an exercise for the reader. And hopefully people will comment and tell us where we're going wrong here. But again, this is off the top of our heads. So, yeah. No. So, okay, I think that's probably the most we should do on the encryption,

Starting point is 00:29:53 because it's clear that we've hit the limit of our layer of abstraction. I mean, there are definitely some things to do with certificates and so on, you know, the little lock that you see. And I'm not going to pretend to know nearly enough about that to explain it to the world. Well, I mean, there is a chain of trust, isn't there? We can probably talk a little about that. Yeah, that's true. Certificate authorities. Yeah, so the browser, or curl or whatever, and maybe your operating system,

Starting point is 00:30:17 provide a list of certificate authorities, which are trusted third parties that have said, we will sign other people's certificates to say it's definitely them, trust me, bro. And then, as long as you trust whoever signed the certificate, you can trust that Google.com is in fact Google.com, because it's signed by VeriSign or whoever the heck actually signs Google's. Google are probably a CA, aren't they? Yeah, probably at this point. And that's kind of how you know that you are actually talking to the correct site,

Starting point is 00:30:47 the authentication part of HTTPS. And "know" has got air quotes here, because obviously it really does depend on how much you trust the people who sign these certificates to say they really are who they say they are. And if you go and look at the list of CAs inside your browser, there's a bewildering number of them, including some that, to me, would raise an eyebrow.

Starting point is 00:31:11 And I'm like, oh, apparently we trust this company. I don't know who they are, and it doesn't look very trustworthy to me. Right, right, right. Similarly, in a corporate environment, you might reasonably discover that your own company's root CA is in that list, because then you can sign,

Starting point is 00:31:30 first of all, all of your internal applications with your own certificate, without having to go to the outside. Well, it doesn't have to be signed by VeriSign. It means that you can go to, you know, benscoolapp.internal.com, and you can get a certificate for that, and your browser won't go, wait a second,

Starting point is 00:31:47 this is just self-signed, that's no good. You know, it can be signed by, you know, benscoolcompany.com, which is then in that list. But it also allows things like security scanning software to decrypt your traffic by man-in-the-middling it on your own network, to check that things aren't going outside, which is really important in a,

Starting point is 00:32:08 certainly in a large corporate environment. I don't know that it happens in any, like... No, no, I mean, that's definitely a thing. I mean, again, the theme of this is breaking people's expectations around abstractions and things like that. It's probably been told to you in indirect ways by people at your company, if they're doing this. But maybe not as directly as "we're reading your personal email if you check your personal email at work." Right.
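The chain of trust, and why a company adding its own root CA changes things, can be sketched as a toy model. This is deliberately not real cryptography: `Cert`, `chain_is_trusted`, and every CA name below are hypothetical, and real verification also checks signatures, expiry dates, and more. It only shows the shape of the logic: the site's certificate must name the site, each certificate must be issued by the next one up, and the top must lead to a root that is already on your list.

```python
from __future__ import annotations
from dataclasses import dataclass

@dataclass(frozen=True)
class Cert:
    subject: str  # who this certificate says it belongs to
    issuer: str   # who claims to have signed it

def chain_is_trusted(hostname: str, chain: list[Cert], trusted_roots: set[str]) -> bool:
    """Toy chain-of-trust check (no signatures are verified!).

    chain[0] is the site's own certificate; each later entry is the
    certificate of whoever issued the one before it."""
    if not chain or chain[0].subject != hostname:
        return False  # empty chain, or a certificate for a different site
    for child, parent in zip(chain, chain[1:]):
        if child.issuer != parent.subject:
            return False  # a link in the chain is broken
    # The top of the chain must come from a root we already trust.
    return chain[-1].issuer in trusted_roots

# Hypothetical names throughout; real browsers ship a long list of real roots.
browser_roots = {"Example Root CA"}
site_chain = [
    Cert("www.example.com", "Example Intermediate CA"),
    Cert("Example Intermediate CA", "Example Root CA"),
]
print(chain_is_trusted("www.example.com", site_chain, browser_roots))  # True

# An inspection proxy minting its own certificate for secure.com:
proxy_chain = [Cert("secure.com", "Corp Inspection Root")]
print(chain_is_trusted("secure.com", proxy_chain, browser_roots))  # False
# ...unless the company has added its own root to your machine's list.
corp_roots = browser_roots | {"Corp Inspection Root"}
print(chain_is_trusted("secure.com", proxy_chain, corp_roots))  # True
```

Once that extra root is installed, the proxy's certificate for any site passes the same check as the real one, which is why the lock icon alone can't tell you who is on the other end.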

Starting point is 00:32:36 Like, the little security lock in the browser that you're running isn't as strong as you think it is if you're using your company's machine. I mean, obviously, if you're on your personal device, on the guest Wi-Fi of your company, whatever, fine. And that's a very good reason to do that, right? But yes, if you are, most companies have at least a clause that says something like, there's no expectation of privacy, which is reasonable, right?

Starting point is 00:32:56 You're at work, you're doing work things. Yeah, you're on a work machine. Reasonable people could disagree about this. I get it, I get it. But, like, that is the way in which they can potentially look at the encrypted data: they can say, well, for any website you go and get, if you go to, you know, secure.com,

Starting point is 00:33:13 I will say, yes, sure, I'm secure.com, and I can prove I am because I'm signed by this certificate, which, by the way, your company-given security system says, "I can sign anything I damn well like," and that's fine. And that's, yeah, so just knowing that that exists is all you need. And it's an exercise for the listener to go and see whether that's happening to them or not. And, I mean, it's a fairly standard thing. Anyway, popping the stack all the way back, right?
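One way to do that exercise, sketched here as a minimal Python example, is to connect to a site and print who issued the certificate you actually received. `issuer_summary` and `who_signed` are hypothetical helper names; `getpeercert()`, `create_default_context()`, and `wrap_socket()` are the standard `ssl` module calls. One caveat: Python may not use the same CA list as your browser, so on an intercepted network the handshake may instead fail with a certificate verification error, which is itself a hint.

```python
import socket
import ssl

def issuer_summary(peer_cert: dict) -> str:
    """Flatten the 'issuer' field of a dict returned by SSLSocket.getpeercert()
    into a readable string such as 'C=US, O=Some CA, CN=Some Intermediate'."""
    short = {"countryName": "C", "organizationName": "O", "commonName": "CN"}
    parts = []
    for rdn in peer_cert.get("issuer", ()):  # a tuple of relative distinguished names
        for key, value in rdn:
            parts.append(f"{short.get(key, key)}={value}")
    return ", ".join(parts)

def who_signed(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Connect, complete the TLS handshake (which verifies the chain against
    this Python's trusted CAs), and report who issued the server's certificate."""
    ctx = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with ctx.wrap_socket(sock, server_hostname=host) as tls:
            return issuer_summary(tls.getpeercert())
```

Running `print(who_signed("www.google.com"))` on a work laptop and again on a personal connection, and comparing the issuers, is a quick way to spot an inspection proxy.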

Starting point is 00:33:41 So we said, what about curl, right? You said, right, we're curling this endpoint. Is there anything else we've missed? Let's just think about this. Well, I've got a good one. Okay. So let's say that when you do this request, and maybe you're trying to drop down a level here

Starting point is 00:33:58 and you're going to use, like, the telnet approach: you connect to the site, you type in your headers, you get back your response headers, and then you get some of the body, and then it stops. Ah, yeah, yeah, yeah. What happened? What is happening?

Starting point is 00:34:21 Like, how do you even know that it's not done? What is the mechanism for that? If I recall correctly, two newlines is the fancy technology. I think that's how, to tell you that the body of the document is complete, we're going to do a newline, and then we're going to do another newline,

Starting point is 00:34:38 and that's how you know that it's done. But let's say that you don't see those two newlines, and it just stops. Right. What is curl doing in that situation? That's an excellent question. I don't actually know the answer to that. Yeah, I don't know.
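Strictly speaking, the blank line (two CRLF line endings in a row) marks the end of the response headers, not the body. In HTTP/1.1 the end of the body is usually signalled in one of three ways: a `Content-Length` header giving the exact byte count, chunked transfer encoding ending in a zero-size chunk, or the server closing the connection. The sketch below is a simplified, hypothetical helper (`body_is_complete`) applying those rules to the bytes received so far; it ignores trailers, combined transfer codings, HEAD requests, 1xx/204/304 responses, and malformed input.

```python
from typing import Optional

def body_is_complete(raw: bytes) -> Optional[bool]:
    """Given the raw bytes of an HTTP/1.1 response received so far, return
    True/False if we can tell whether the body is complete, or None if only
    the server closing the connection can tell us."""
    head, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        return False  # the blank line ending the headers hasn't arrived yet
    headers = {}
    for line in head.split(b"\r\n")[1:]:  # skip the status line
        name, _, value = line.partition(b":")
        headers[name.strip().lower()] = value.strip()
    if headers.get(b"transfer-encoding", b"").lower() == b"chunked":
        return _chunked_body_complete(body)
    if b"content-length" in headers:
        return len(body) >= int(headers[b"content-length"])
    return None  # no length given: the body ends when the connection closes

def _chunked_body_complete(body: bytes) -> bool:
    """Walk the chunks; a zero-size chunk marks the end (trailers ignored)."""
    pos = 0
    while True:
        line_end = body.find(b"\r\n", pos)
        if line_end == -1:
            return False  # chunk-size line not fully received
        size = int(body[pos:line_end].split(b";")[0], 16)  # hex size, minus extensions
        if size == 0:
            return True
        pos = line_end + 2 + size + 2  # skip size line, chunk data, and its CRLF
        if pos > len(body):
            return False  # chunk data still arriving

print(body_is_complete(b"HTTP/1.1 200 OK\r\nContent-Length: 10\r\n\r\nhello"))  # False
```

In the situation Ben describes, the server promised more bytes than it sent, so none of these conditions is ever met and a client can only keep waiting; that is where timeouts come in.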

Starting point is 00:34:55 Because my instinct, if this was an interview question and you, Ben Rady, were asking me what's going on here, I would be like, oh, this sounds like the remote end has decided to keep the connection open, but it has, in fact, finished the document. And I don't know if it puts something at the end of that. I don't think it does. Maybe that's one of those 100...

Starting point is 00:35:17 100 Continue is one of those things it puts in. I don't know. But you could get something like that. Totally separately, and just to pause that: I did in fact just telnet to Google.com and type GET space slash, enter. And it doesn't even tell me that I'm at the wrong address, because it doesn't know that I came to the wrong address.

Starting point is 00:35:34 It just gives me the whole, you know, minified Google.com page; it pukes it all out. So it still works with even, you know, HTTP 0.9. Okay. So what happened, then? Tell me the answer. What was going on with my own connection? So what might be happening? This could be a number of things, but one scenario in which you would see this is that there was an error, like a programming error, on the server side where

Starting point is 00:35:57 it read a document and then encountered an error and, for whatever reason, didn't close the connection, right? Maybe it is itself waiting for data that just isn't showing up and isn't going to come. It's stuck on some database query, or streaming some data from another source, or whatever it might be, and it's just stopped. Curl has a number of different timeouts that you can configure, and part of the reason for those timeouts is situations like this, because you don't want some process to just be stuck forever,

Starting point is 00:36:26 because it made some request that is now stuck forever. It has a connection timeout, which is more of a TCP-level thing. It is a TCP-level thing. It's like, hey, I want to connect to this IP address and send it some text so I can get my documents. If it never establishes the TCP connection, or it takes some time to establish it, there's a timeout that you can control for that. I believe there is also a response timeout, to say, like,

Starting point is 00:36:53 if I start getting data and then I stop getting data... I just whacked my microphone. That's fine. I'm sure that will be edited away. It won't be edited away. And you stop getting data, and then you continue to not get any data. How long should I wait before I just give up? And it's important to know, like with all of

Starting point is 00:37:17 these protocols, we use them often enough that you almost think of them as this synchronous thing, almost like an atomic thing, where it's like, right, right, right, we get the response and it all works. And it is entirely possible that you get some, but not all, of the response. It is entirely possible that you get halfway through sending the request and then the connection just dies. It's entirely possible that you start sending the request and, depending on exactly what is happening on your network, it may not be clear how much of it was actually sent. Like, say I'm doing a PUT request or a DELETE request, right? Did you actually delete it on the other side?

Starting point is 00:37:58 Right. I'm not sure. That is, yeah, the sort of, what's the name of it, the Byzantine Generals problem or whatever. Yeah. I send a thing to you that says, did you delete it? And you reply, but I didn't hear your response. You're like, well, did you, then? I don't know. I don't know, because with any number of acks on both sides, you can never be 100% sure. Exactly, exactly. So that is definitely another place where this sort of abstraction can break down, where it's like, well, I did the delete request and it failed, so clearly it's not deleted.
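The acknowledgement puzzle described here is more usually called the Two Generals' Problem. One common way to live with it is to treat a lost response as "outcome unknown" rather than "failed", and to retry only operations that are idempotent. HTTP defines DELETE (and PUT) as idempotent, so repeating a delete that may already have happened leaves the server in the same state. The toy in-memory simulation below illustrates this; `FlakyServer` and `delete_with_retries` are hypothetical names, not any real library's API.

```python
import random

class FlakyServer:
    """Toy server: applies every request, but the reply may be lost on the way back."""

    def __init__(self, reply_loss_rate: float, seed: int = 0):
        self.items = {"report.txt"}
        self.reply_loss_rate = reply_loss_rate
        self.rng = random.Random(seed)

    def delete(self, name: str) -> str:
        self.items.discard(name)  # the delete takes effect either way...
        if self.rng.random() < self.reply_loss_rate:
            raise TimeoutError("no response")  # ...but the client may never hear about it
        return "deleted"

def delete_with_retries(server: FlakyServer, name: str, attempts: int = 5) -> str:
    """Retrying is safe here because deleting twice has the same effect as deleting once."""
    for _ in range(attempts):
        try:
            return server.delete(name)
        except TimeoutError:
            continue  # outcome unknown, NOT "failed": try again
    return "unknown"  # still can't say; ask the server rather than assume

server = FlakyServer(reply_loss_rate=1.0)  # every reply is lost
print(delete_with_retries(server, "report.txt"))  # unknown
print("report.txt" in server.items)  # False: it was deleted anyway
```

The last two lines are exactly the broken assumption from the conversation: the client never saw a success, yet the item is gone.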

Starting point is 00:38:26 Five different timeouts. I've just been looking through the man page of curl. Five different timeouts for different conditions. You know, like how long do you wait for a connection, as you said; an overall timeout for the entire request; the first response, and then subsequent chunks of data, that kind of thing. And then some other things to do with IPv4 and IPv6, if you're trying to do both at the same time

Starting point is 00:38:49 and take the first one to respond, that kind of thing. Some really clever things like that. But yeah, a lot of these things are sort of, yeah, as you say, we think of these things as atomic. You know, everyone talks about RESTful APIs, and you're like, okay, it's like a magic RPC that I just do, and it either happens or it doesn't, and I'm done.
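The kinds of timeout discussed here (how long to wait for the connection, how long to tolerate silence once data is flowing, and an overall cap on the whole request) can be sketched with plain sockets. This is a hypothetical helper, not how curl is implemented; as far as we know, curl's closest options are `--connect-timeout`, `--max-time`, and the `--speed-limit`/`--speed-time` pair for stalled transfers. The demo stands up a local server that sends part of a response and then goes quiet, like the stuck server Ben describes, so it runs without any outside network.

```python
import socket
import threading
import time

def fetch_with_timeouts(host, port, request, connect_timeout, idle_timeout, max_time):
    """Send `request` and collect the reply, giving up on any of three timeouts.
    Returns (bytes_received, reason_for_stopping)."""
    deadline = time.monotonic() + max_time
    # 1. Connection timeout: how long to wait for the TCP connection.
    sock = socket.create_connection((host, port), timeout=connect_timeout)
    received = b""
    try:
        sock.sendall(request)
        while True:
            remaining = deadline - time.monotonic()
            if remaining <= 0:
                return received, "max time exceeded"  # 3. overall cap
            # 2. Idle timeout: how long to tolerate silence, capped by the deadline.
            wait = min(idle_timeout, remaining)
            sock.settimeout(wait)
            try:
                chunk = sock.recv(4096)
            except socket.timeout:
                return received, "max time exceeded" if wait < idle_timeout else "stalled"
            if not chunk:
                return received, "connection closed by server"
            received += chunk
    finally:
        sock.close()

def demo_stalled_server():
    """Run a local server that sends half a response and then goes quiet."""
    listener = socket.socket()
    listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
    listener.listen(1)
    port = listener.getsockname()[1]

    def serve():
        conn, _ = listener.accept()
        conn.recv(1024)  # read (and ignore) the request
        # Promise 100 bytes of body, send 7, then hang like a stuck server.
        conn.sendall(b"HTTP/1.1 200 OK\r\nContent-Length: 100\r\n\r\npartial")
        time.sleep(1.0)
        conn.close()
        listener.close()

    threading.Thread(target=serve, daemon=True).start()
    request = b"GET / HTTP/1.1\r\nHost: localhost\r\n\r\n"
    return fetch_with_timeouts("127.0.0.1", port, request,
                               connect_timeout=1.0, idle_timeout=0.2, max_time=5.0)

data, reason = demo_stalled_server()
print(reason)  # stalled
```

Without the idle timeout, the loop would simply sit in `recv` until the server eventually hung up, or forever if it never did.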

Starting point is 00:39:07 You're like, no: you could see an error on your side that says the connection never got made, or the request never went through, but it did in fact make it through. It was just that the response didn't get to you and the TCP connection was torn down, but it did in fact take effect on the remote end, and now you're stuck. You didn't get back, you know,

Starting point is 00:39:24 the thing that you... the response. So it's a tricky world out there, but yeah. Yeah. No, there are many timeouts. Oh my gosh. You said there are five timeouts? I think so. I've got --connect-timeout; --expect100-timeout, so the maximum time in seconds you allow curl to wait for a 100 Continue. Oh good, I wasn't completely making that up. See also --happy-eyeballs-timeout-ms. That's hilarious to me.

Starting point is 00:39:52 Happy eyeballs? Yeah, I'm going to just read this to you, because it's joyous. Happy Eyeballs, with a capital H and a capital E, "is an algorithm that attempts to connect to both IPv4 and IPv6 addresses for dual-stack hosts, giving IPv6 a head start of the specified number of milliseconds." So it's a way of, you know, preferring IPv6, but, but, but,

Starting point is 00:40:19 clearly. Oh, man. So yeah, we've got those, and --max-time, and --retry-max-time. So there's an overall max time. And then there's a retry maximum time, which is like... yeah, "the retry timer is reset before the first transfer attempt." It's complicated. But, I mean,
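The Happy Eyeballs idea from curl's man page can be sketched with `asyncio`: start the preferred (IPv6) attempt, and if it hasn't succeeded within a short head start, start the IPv4 attempt too, keeping whichever connects first. This is a simplified take on RFC 8305 (which suggests a 250 ms delay between attempts), using fake connect functions so it runs without a network; `happy_eyeballs` and the fake connectors are hypothetical. In real Python code, `loop.create_connection()` and `asyncio.open_connection()` accept a `happy_eyeballs_delay` argument (Python 3.8+) that does this for you.

```python
import asyncio

async def happy_eyeballs(attempts, head_start):
    """Race connection attempts given in preference order (e.g. IPv6 first).

    attempts: list of (label, async function) pairs. Each attempt gets
    `head_start` seconds on its own before the next one starts; a failure
    starts the next one immediately. Returns (label, result) of the first
    attempt to succeed."""
    pending = {}  # task -> label
    errors = []

    def first_success(done):
        for task in done:
            label = pending.pop(task)
            if task.exception() is None:
                return label, task.result()
            errors.append((label, task.exception()))
        return None

    try:
        for label, connect in attempts:
            pending[asyncio.ensure_future(connect())] = label
            done, _ = await asyncio.wait(set(pending), timeout=head_start,
                                         return_when=asyncio.FIRST_COMPLETED)
            winner = first_success(done)
            if winner:
                return winner
        while pending:  # every attempt has started; wait for the stragglers
            done, _ = await asyncio.wait(set(pending), return_when=asyncio.FIRST_COMPLETED)
            winner = first_success(done)
            if winner:
                return winner
        raise ConnectionError(f"all attempts failed: {errors}")
    finally:
        for task in pending:
            task.cancel()  # abandon the losers

# Fake connectors standing in for real IPv6/IPv4 connection attempts.
async def slow_ipv6():
    await asyncio.sleep(1.0)  # pretend IPv6 is silently black-holed
    return "IPv6 socket"

async def quick_ipv6():
    await asyncio.sleep(0.01)
    return "IPv6 socket"

async def quick_ipv4():
    await asyncio.sleep(0.01)
    return "IPv4 socket"

async def unreachable():
    raise OSError("network unreachable")

print(asyncio.run(happy_eyeballs([("IPv6", slow_ipv6), ("IPv4", quick_ipv4)], head_start=0.05)))
# ('IPv4', 'IPv4 socket')
```

With a healthy IPv6 path, the IPv6 attempt finishes inside its head start and IPv4 is never tried at all, which is the "preferring IPv6, but" behaviour Matt describes.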

Starting point is 00:40:38 Yeah, and we're back. What happened there was a terrible break in the abstraction layer, because my USB hub crashed and put us into a very bad state. And we've just about recovered from it. And I can't remember what on earth we were talking about. No idea. No idea at all. It's like it was a different dimension. And if only, I mean, if only we had some kind of permanent record of what we'd said,

Starting point is 00:41:06 but sadly, neither you nor... You said you could download the MP3s okay, and it was working. So at least we've got that on our side. I did do that. Okay, right. So we've got the previous part. So, dear listener, we apologize. Whatever we were talking about before has slipped our minds.

Starting point is 00:41:20 But we didn't want to leave you in the lurch or, more importantly, leave editing Matt with the job of working out how to finish an episode. Right. Clearly something went wrong. But I think there's another abstraction layer there; we can talk about that another time. It's like, how on earth do our devices talk to each other in a day-to-day environment? Yeah, USB actually would be kind of a cool one.

Starting point is 00:41:40 USB-C and DVI and, yeah, all those kinds of magical... I mean, it's a miracle that it works as well as it does, frankly. Yeah. But yeah, we were looking at curl, and I think we had just about looked through all the timeouts; that's what I'd

Starting point is 00:41:56 recall. Yeah, well, the happy eyeballs: Happy Eyeballs for IPv6 and IPv4, which is crazy. I saw someone put an IPv8 proposal out there, which I think is somewhat tongue-in-cheek, but I read through it and was like, this isn't as stupid as it looks; maybe this is real. And they skipped seven because we skipped five? I guess. I think it uses, like, a 64-bit IP address rather than the full crazy IPv6, you know, I don't know, 256-bit, 128, whatever bit size that is. Where it's like, hey, all this information about the host, you know, I don't want to leak that outside my network.

Starting point is 00:42:30 Thank you very much, right? So it was a... and it's sort of backwards compatible with IPv4, because essentially all existing IPv4 addresses are nought dot nought dot nought dot nought, dot the rest. You're like, oh, that makes sense. Maybe that's got legs. Anyway, that's really off topic, and that's about as much as I know about it. My friend, I don't know if there's anything that we can recover from where we were going. So I think we should probably just call it at this point.

Starting point is 00:42:58 Having introduced the world to transpacity, I feel like, you know, we've at least achieved something. Transpacity. I guess it would be, wouldn't it? Yes, it is. Capacity, transparency. We'll have to ask our friend and see if maybe he'll join us on the podcast sometime to talk about it. But it's definitely a thing. All right, mate.

Starting point is 00:43:21 Well, it's been a journey today. I will chat to you another time. Sounds good. You've been listening to Two's Complement, a programming podcast by Ben Rady and Matt Godbolt. Find the show transcript and notes at www. Contact us on Mastodon. We are at Two's Complement at hachyderm.io. Our theme music is by Inverse Phase. Find out more at InversePhase.com.


Ben asks what happens when you curl google.com, and Matt peels back HTTP until the rabbit turns out to have been in the hat all along. Then a USB hub stages a dramatic intervention....
