Tech Over Tea - Anubis The Saviour Of FOSS Websites | Xe

Episode Date: May 9, 2025

Today we have the developer of Anubis, a tool that has taken the FOSS world by storm, protecting against the growing danger of AI scrapers we're seeing take down major FOSS websites.

==========Support The Channel==========
► Patreon: https://www.patreon.com/brodierobertson
► Paypal: https://www.paypal.me/BrodieRobertsonVideo
► Amazon USA: https://amzn.to/3d5gykF
► Other Methods: https://cointr.ee/brodierobertson

==========Guest Links==========
Website: https://anubis.techaro.lol/
Github: https://github.com/TecharoHQ/anubis
Bluesky: https://bsky.app/profile/did:plc:e5nncb3dr5thdkjir5cfaqfe
Bluesky: https://bsky.app/profile/techaro.lol

=========Video Platforms==========
🎥 YouTube: https://www.youtube.com/channel/UCBq5p-xOla8xhnrbhu8AIAg

=========Audio Release=========
🎵 RSS: https://anchor.fm/s/149fd51c/podcast/rss
🎵 Apple Podcast: https://podcasts.apple.com/us/podcast/tech-over-tea/id1501727953
🎵 Spotify: https://open.spotify.com/show/3IfFpfzlLo7OPsEnl4gbdM
🎵 Google Podcast: https://www.google.com/podcasts?feed=aHR0cHM6Ly9hbmNob3IuZm0vcy8xNDlmZDUxYy9wb2RjYXN0L3Jzcw==
🎵 Anchor: https://anchor.fm/tech-over-tea

==========Social Media==========
🎤 Discord: https://discord.gg/PkMRVn9
🐦 Twitter: https://twitter.com/TechOverTeaShow
📷 Instagram: https://www.instagram.com/techovertea/
🌐 Mastodon: https://mastodon.social/web/accounts/1093345

==========Credits==========
🎨 Channel Art:
All my art was created by Supercozman
https://twitter.com/Supercozman
https://www.instagram.com/supercozman_draws/

DISCLOSURE: Wherever possible I use referral links, which means if you click one of the links in this video or description and make a purchase we may receive a small commission or other compensation.

Transcript
Starting point is 00:00:00 Good morning, good day, and good evening. I'm, as always, your host, Brodie Robertson. And today we have the developer of a project which, about three months ago, nobody knew existed, and which is now deployed on the LKML, a UNESCO site (at least one subdomain), the ArchWiki, GNOME's GitLab, and SourceHut. I'm sure there are other ones in here I probably could mention. Welcome, Xe, the developer of Anubis. How are you doing?
Starting point is 00:00:36 I'm doing pretty good. This is still very surreal, but... Oh, it is so surreal, you have no idea. I know you've talked a little bit about this, but... I guess, you know, no. Okay, we'll save that. Before we get into that, just for anyone who is completely unaware of what Anubis is, which I did find out is now the top search result on Google if you search "Anubis GitHub"
Starting point is 00:01:03 above all of the other projects also named Anubis. So we're getting somewhere. But for anyone who is unaware of the project, briefly explain what the project is. TL;DR, it is Cloudflare's "I'm Under Attack" mode, but self-hostable and on servers you can look at. And instead of a CAPTCHA, which, let's face it, I'm not smart enough to implement properly, and those are easy to game anyway: people pay humans to solve them.
Starting point is 00:01:33 So CAPTCHA-solving APIs are built into everything. It uses a proof-of-work scheme instead. And the proof of work is a hack, but it works well enough. I'm not going to deny it's a hack. Like, it seriously is a hack. I do want to talk more about that. But I guess let's first talk about the sort of experience you've had with, just out of nowhere,
Starting point is 00:02:05 this project becoming really popular, because you were saying before that it was January that you made the project, and shortly after that, mid to late January, you made the first post about it. Yeah. Looking at my blog's Git history, I posted something about Amazon's crawler. The TL;DR is that in January, Amazon's crawler took out my Git forge while I was trying to do something on it.
Starting point is 00:02:37 This is annoying. I tried to fight it. I didn't have much success. I posted something on Hacker News about it. I tried a couple of things that I'm not going to admit to on a recording, but they didn't work. And I got inspired by something I kind of remembered reading about in email spam fighting called Hashcash, which used a proof-of-work scheme in order to protect upstream resources. And somehow I got onto the Wikipedia page for the weighing of souls,
Starting point is 00:03:13 and I figured, huh, Anubis would be a good metaphor. So that's how that started. So I'm not kidding. I just randomly found myself on that Wikipedia page. Sometimes you just gotta let the vibes take you places. Sure, sure. Why not? So the first place that I am aware of that deployed Anubis, at least the first notable place, was the GNOME GitLab. Was there something before that that I'm unaware of? Because that's certainly where most people first heard about it. If there is another place that has it, I have not heard of it. The first notable one is the GNOME GitLab. And from what I've learned talking with the sysadmin team, it was a Hail Mary. Right. It's like, nothing else worked, what could we lose? So
Starting point is 00:04:08 they had their GitLab pods instantly scaled down from six to three. So you've had discussions since then. Did anyone reach out to you before they deployed it? Because the documentation, you know, it's a very new project. Even just since I put up a video, I've noticed a bunch of stuff has been added. So I'm not sure what the state of the documentation was a month or so ago, when the first instance of it was deployed. The documentation was a single README file that had a bunch of aspirations: this is how I got this working on my machines, and other thoughts about how things could be implemented.
Starting point is 00:04:57 And apparently the Kubernetes example was good enough for the GNOME sysadmin team to figure out. And yeah, that's how it started. And it's mostly just been a continuous process of figuring out how bad I am at writing and then figuring out how to make it better. So no one messaged you like, hey, is there anything that we're missing here? They just worked it out themselves? There's a couple of things. When big users get things set up, I just find their IRC channels and then ask them, like, hey,
Starting point is 00:05:36 I noticed you're using Anubis. Thanks. Is there anything that sucked in the docs that I could make easier? And then I just take all that feedback and fix it. Right, right. Fair enough, fair enough. Yeah, the Sourceware team has been very helpful for that. Sourceware, for context, is the upstream organization for small projects like GCC.
Starting point is 00:05:58 Ah, OK, yes. Very small projects. Very small projects. What was the line from Linus Torvalds when he made the kernel? It was something like... It's not small and unprofessional. It was something along those lines. It's a famous email.
Starting point is 00:06:26 "I'm doing a free operating system, just a hobby, won't be big and professional like GNU." Yeah, yeah, exactly. Exactly. That's what GCC is at this point. No one's ever heard of it. No one's ever used GCC. Yeah.
Starting point is 00:06:40 So how does it... It's just falling into place now? So the GNOME GitLab started using Anubis, and it got a bit of attention there. People wrote blog posts about it, talked about it. They noticed there was an anime girl on there, like, what's happening? Shortly after that, a bunch of other places started deploying it very, very quickly. It's only been a couple of weeks since then. Yeah, I've been attempting to keep track in a documentation page on the docs site. There's a message in the Patreon Discord that I keep editing with all the different places that I'm aware of.
Starting point is 00:07:28 I thought that I would be the only user of it ever, so I mostly created it with that in mind, aimed at me. And now I'm at the point where I'm getting contributions from functionally random people, and they're actually good. Like, this is a bar I never thought I would reach. So, let me just go to the... where is it? Is it under... I have four posts in front of me. Want me to link them to you? No, I found the thing.
Starting point is 00:08:08 So right now, notable ones; obviously it's on random people's Giteas and GitLabs as well. We have the GNOME GitLab here, the Wine Bugzilla, the FreeBSD SVN, the FFmpeg bug tracker. Oh, I need to update that, FFmpeg's Git is using it too. Oh, he's updating that. The Git is now using it? Oh, okay, cool. SourceHut's using it. Purism's Git is using it.
Starting point is 00:08:32 Enlightenment's Git is using it. The ArchWiki is using it. Wait, Devuan's Git's using it as well? Okay, that one's a new one. Yeah, Devuan was just added earlier today. I got an email from the Devuan team saying, hey, thank you for making this. And I see they use the regular anime-girl-branded version. I guess it's good enough for Devuan and not good enough for the ArchWiki.
Starting point is 00:09:05 So what's it like being in this position? In my video I showed the xkcd on dependencies, where everything's held up by one little pillar maintained by some random person in Nebraska. What is it like being the random pillar? Because that's what you're starting to become now. It is simultaneously hilarious, humbling, and... Oh, I need a third one, because a triad is a nice pattern. It's, like, hilarious, humbling, and... I can't come up with a third one.
Starting point is 00:09:40 It's hilarious at some level. Like, I've been blogging for a couple of years. I need to do the numbers again, but it's somewhere between 530 and 560 articles if you include all the stuff I've written for work. Wow, okay. For people that blog, usually it's like, I'm gonna write a couple of posts every couple of months. That's actually impressive. I am a professional writer at this point.
Starting point is 00:10:12 Like, that's basically what I do. I have written a terrifying number of things. Let me just look at the sidebar for the work blog. Yeah, it's over 50% of the things in the sidebar for the work blog. I need to make link posts for those. Oh my goodness, there's just so many. Sorry, I thought there was something more
Starting point is 00:10:38 you wanted to say about the blogging. Oh, honestly, having a blog and putting my opinions there has been hands down the best thing I have ever done for my career. Because if you have a certain level of notoriety, and you have enough articles over enough time to prove that you actually know what you're talking about, you don't have to do tech interviews.
Starting point is 00:11:03 You don't have to do those horrible whiteboard-screening, LeetCode-grinding interviews anymore. So it sort of acts as a... It's basically a portfolio in a sense, just not a code portfolio. Yes, I also have my GitHub as my code portfolio. In one of the more recent job interviews I've done, the interviewer noticed that I was very prominent. They searched their work Slack and found out that someone had linked something I wrote there. They linked me the article and then just asked me what I learned there and what I did.
Starting point is 00:11:41 And that was a really interesting discussion. Which article was it? I think it was like, oh, it was when I put my IRC client into Kubernetes. My IRC client runs with Kubernetes, that one. I assume it. Yeah. Yep.
Starting point is 00:12:00 Cool. Yep. Oh, that was such a blursed post. Not blessed, not cursed, but something else: blursed. So this blogging experience has been a good way to, I guess, get your thoughts out on a lot of things. It doesn't just
Starting point is 00:12:30 show that you know something. When you put something out in a written form or a video form, in a form where other people are gonna consume it, it requires you to actually think about what you know and try to structure it in a way that hopefully someone might understand. It gives you a deeper understanding when you're trying to have other people understand your
Starting point is 00:12:54 thoughts. Mm-hmm. Basically. Uh, the thing that I try to do when I'm writing about stuff is I have several different categories of things. I have tech satire stories, which I'm writing less and less, because my list of ideas that I want to write is starting to very weirdly coincide with startup pitches that I've seen. I have documentation for myself, like how to force a Linux system to reboot off a flash drive when you just need it to boot off of the damn flash drive.
Starting point is 00:13:32 And there's the other big category of, like, here is the problem and this is why it's bad, a horrible problem. Mm-hmm. Like, one of them is the big manifesto, "Building native packages is complicated." I'll link it to you in the chat so you have it for the show notes. This "Building native packages is complicated" thing is...
Starting point is 00:14:07 I think I called it a manifesto. Yeah, it's a manifesto. Uh-huh. Uh-huh. Oh my god. It is a giant piece as well. I mean, it's only like 4,000 words. I've written longer. Mm-hmm. I think my record is something that took me like a literal month to write. It was like 10,000, 15,000 words. And it was like the process of me going through various levels of configuration hell,
Starting point is 00:14:44 trying to figure out a sustainable setup for my homelab. Fair enough. Fair enough. I can see how that one would grow a little bit more. Yeah, and there's actually the secret fourth kind of article, which is a log of my thoughts as I hack something up, and the "moving my homelab to Kubernetes" one was basically that, a log diatribe of,
Starting point is 00:15:11 okay, here are all the things I tried. Here are pictures of me trying to do the thing. Here is everything that I tried and failed at in functionally real time. That one was fun. I actually wasn't even aware of it. I don't know how. Like, I scrolled enough of the page to see the Amazon thing, the Amazon AI crawler.
Starting point is 00:15:36 I don't know why I just didn't keep scrolling the rest of the page to see just how much was here. Yeah, it turns out that at the scale my blog is at, when you have over 500 links on one page, that's when you need to implement pagination. Mm-hmm. Yeah, need to. So I'm probably going to be implementing pagination,
Starting point is 00:15:56 but probably only for each calendar year, because anything else seems overkill. Yeah, yeah, yeah. Oh God, the backend for this blog is a trash fire. Well, we can either do calendar year or just X number of posts, either way. I guess calendar year probably makes more sense just as a logical grouping.
Starting point is 00:16:21 Oh yeah, there's already some internal tagging of posts with the calendar year they're associated with. And oh god, it's... My blog is a delicate creation. What have you built it with? How is this thing being managed? So there's at least four different generations of this blog engine. OK. And a pro tip for the audience, when
Starting point is 00:16:49 someone describes a project using the term generation, buckle up. But the first version of it was running with, I think it was a web framework called Lapis made by the itch.io guy. And yeah, that compiled Markdown to HTML on every single page load by reading things from the file system. That did not survive the first time I got on the front page of Hacker News. So I re-engineered it to use a database made by some shitposter friends in an IRC channel called OlegDV.
Starting point is 00:17:27 And you know it's good because the entire project is jokes about mayonnaise. Sure. Okay. I caused them to name the check-expiry command "sniff". And I think the second generation, yeah, the second generation was written in Go on the server and loaded everything into RAM so that it would be fast and would survive getting on the front page of Hacker News. And that was fine. And then I got hit by the Rust bug.
Starting point is 00:18:01 Ah. And I rewrote it in Rust. And that also loaded everything into RAM, multithreaded. I had custom Markdown parsing with a combination of normal Markdown stuff and Cloudflare's lol_html crate.
Starting point is 00:18:22 And that worked surprisingly well. But then I went to a different job role and wasn't really using Rust. So I ported it all to a static site generator that's managed by a process written in Go. And it is a very delicate creation. It works enough, and that's all I care about. It works enough.
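The difference between the first two engine generations (Markdown rendered on every page load versus everything loaded into RAM up front) can be sketched in a few lines of Python. This is a toy under stated assumptions, not Xe's code, which used Lapis, then Go, then Rust: render_markdown is a hypothetical placeholder rather than a real Markdown library, and the class names are made up for illustration.

```python
import html
from pathlib import Path


def render_markdown(source: str) -> str:
    """Hypothetical stand-in for a real Markdown renderer: escapes each
    blank-line-separated paragraph and wraps it in <p> tags."""
    paragraphs = [p.strip() for p in source.split("\n\n") if p.strip()]
    return "\n".join(f"<p>{html.escape(p)}</p>" for p in paragraphs)


class PerRequestBlog:
    """First-generation style: read and render the file on every page load."""

    def __init__(self, root: Path) -> None:
        self.root = root
        self.renders = 0

    def page(self, slug: str) -> str:
        self.renders += 1  # disk read plus parse, once per visitor
        source = (self.root / f"{slug}.md").read_text(encoding="utf-8")
        return render_markdown(source)


class PreloadedBlog:
    """Second-generation style: render every post once at startup, serve from RAM."""

    def __init__(self, root: Path) -> None:
        self.renders = 0
        self.pages: dict[str, str] = {}
        for path in sorted(root.glob("*.md")):
            self.renders += 1  # paid once per post, at startup
            self.pages[path.stem] = render_markdown(path.read_text(encoding="utf-8"))

    def page(self, slug: str) -> str:
        return self.pages[slug]  # a dictionary lookup, no disk or parsing work
```

Under a front-page traffic spike, the first design repeats the disk read and parse for every visitor, while the second pays that cost once per post when the process starts.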
Starting point is 00:18:49 You know what? I can respect that. I can respect that. That's all you ask for in this day and age, you know? You just need it to work enough. Yeah, don't touch it. Just leave it as is. It'll be fine.
Starting point is 00:19:03 If I redo it in the future, it'll probably be based on object storage or something. Okay. Working at an object storage company, you learn all sorts of cursed ways to use object storage. When you mentioned Rust, I had someone on the other week who described their adoption of Rust as falling for the Rust propaganda. Yeah, that would be a good summary of it. About half the reason I got into Rust was because it makes really small binaries in WebAssembly. And the other half is because I got nerd-sniped. Oh, I'm not afraid to admit that I get nerd-sniped. It's fun. There are several things, not just in Anubis but also in my x monorepo, that are the result of me just getting horrifically nerd-sniped.
Starting point is 00:20:02 Look, if you find something fun, like, I'm not gonna... I know a lot of people get very, like, ooh, I don't like this language, I don't like that language, but at the end of the day, if you're enjoying what you're doing, you know, it is what it is. Keep doing what you're doing, I guess. Yeah, the really crazy part is that sometimes those random hobby projects will end up actually being useful and actually being used.
Starting point is 00:20:31 Which is simultaneously awesome and terrifying. Yeah, and then one of them becomes this massive deployed thing out of nowhere. Just a little thing. Just a little thing. But there's a bunch of random stuff that I have around. And about half the reason I have that giant monorepo is because it's easier to just put everything in one big repo if it's all in the same language than to have everything in separate repos with interdependencies and then, you know, pray to the dark lords
Starting point is 00:21:09 to make sure that your builds go off fine. And yeah, I'm not doing that. Yeah, yeah. Yeah, you can get into very long arguments about whether you should or shouldn't use a monorepo. Obviously the Git model is more based around using separate repos, or submodules, things like that. But then you have companies like Facebook, who at one point had considered adopting Git.
Starting point is 00:21:37 At the time, Git was not suitable for monorepos. It's obviously had to get better at it, because, you know, the Linux kernel is the Linux kernel. But at the time it was becoming unmanageably slow to use for really large monorepos. Git history would take stupid amounts of time; committing anything would take stupid amounts of time. I think they ended up going with Mercurial, actually, if I remember correctly. Yeah, they have their own fork of Mercurial internally, and as any company gets bigger, they end up with their own source control software. Like Google, the canonical example of the monorepo, has a source repo so vast that no one server on the planet can hold it all in a checkout. Maybe at that point you have a bit of a problem.
Starting point is 00:22:30 I mean, there is kind of a surreal elegance to that, because Google can just check out the entire world at one point in time, including the source code for their motherboard BIOSes. And that's cool as hell. I would never need to do that. But the fact that that is something that can be done is cool as hell. Yeah. Yeah. I guess that's why I'm going to look at it. You just need a bit of a beefy system to handle that. Just a little bit of a chungo, yeah. So going back to Anubis, let's talk a bit more about the origin of the project. So it came about because of
Starting point is 00:23:16 what was it? You were saying Amazon scrapers, yes? Amazon AI crawlers. Yes. So I guess this sort of leads into: what is it that Anubis is trying to solve? Why would anybody actually want to use Anubis? What is the issue on the web that needs something like this, or like what Cloudflare has been working on, or any of the other tooling that's popped up over the past year or so? The TL;DR of why someone would need it is that it changes the economics around web scraping. Right now, a lot of web scraping is done under the assumption that it's
Starting point is 00:24:02 reasonably fast and lightweight to get a response made for any given route on the server. And this is not always the case. Things like git blame are very compute-intensive, especially if you do it with a fresh checkout of the Linux kernel. If you do git blame on any random line in the kernel on a fresh checkout, your system is
Starting point is 00:24:23 going to be suffering for a couple of minutes. And at scale, with a bunch of bots out there, these bots operate on the logic of: for link in page, enqueue a message to click on that link from another IP address. And this is just an overwhelming torrent for basically any computer, especially on sites that have a lot of interconnectivity, like you would have in a wiki, for example. Oh, yeah. More recently, Sourceware, the GCC Git server: it has a machine with 24 CPU cores and 512 gigabytes of RAM, and they have a system load of 150.
Starting point is 00:25:18 And for reference, in order to convert that into something that you can more easily understand, it's easier to round it up to 25 CPU cores and then divide the system load by the number of CPU cores, and that tells you how much system backlog there is. So 150 divided by 25, that's what? Like a six times backlog? Yeah, it's nuts. A six times backlog is...
Starting point is 00:25:46 bad? I think that's the technical term. Seems on point to me. Yeah. And like... The really frustrating part about a lot of it is that a lot of normal IP reputation stuff just doesn't work. Because these bots, like in the case of Amazon's bot in particular, have IP addresses from literally every range Amazon controls. And they control, what is it, 15% of the IPv4 constellation? It's something nuts like that. And because there are so many different IP addresses coming from so many different BGP autonomous systems, IP reputation doesn't work.
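Because IP-based blocking breaks down against traffic spread this widely, the proof-of-work approach described at the start of the episode moves the cost onto each client instead, which is the change in economics Xe mentions: the client burns CPU to find an answer, and the server checks it with a single hash. The sketch below is a minimal Hashcash-style illustration under stated assumptions: SHA-256 over the challenge string concatenated with a decimal nonce, and difficulty counted in leading hex zeros. Anubis's actual challenge format, default difficulty, and the way it remembers clients that already passed may differ, and the function names are hypothetical.

```python
import hashlib
import secrets


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side: brute-force a nonce until SHA-256(challenge + nonce) starts
    with `difficulty` hex zeros. Costs about 16**difficulty hashes on average."""
    target = "0" * difficulty
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if digest.startswith(target):
            return nonce, digest
        nonce += 1


def verify(challenge: str, difficulty: int, nonce: int, digest: str) -> bool:
    """Server side: one hash to check the answer, however long solving took."""
    expected = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return secrets.compare_digest(expected, digest) and expected.startswith("0" * difficulty)


def expected_seconds(difficulty: int, hashes_per_second: float) -> float:
    """Average solve time: each hex digit of the digest is zero with probability 1/16."""
    return 16 ** difficulty / hashes_per_second


if __name__ == "__main__":
    challenge = secrets.token_hex(16)  # the server would issue a fresh random challenge
    nonce, digest = solve(challenge, difficulty=4)
    print(nonce, digest, verify(challenge, 4, nonce, digest))
```

With made-up hash rates for illustration only: at difficulty 5 (about a million hashes), a client managing 1,000,000 hashes per second waits about a second on average, while one managing 50,000 waits about 21 seconds. The number of attempts is geometrically distributed, so an unlucky visitor can wait well beyond the average.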
Starting point is 00:26:35 You could write a custom thing to do reverse DNS for every Amazon IP address, and then if it matches Amazon bot, then deny it. But that's slow, expensive, and those IP addresses will never be used again. So it's just basically adding lag into the mix for no reason. And then there's the really, really terrible part, which is the residential proxies that look like Google Chrome on the wire.
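Going back to the reverse-DNS idea: such a check is usually done as forward-confirmed reverse DNS, so that a forged PTR record alone is not enough. A minimal Python sketch follows; the function name is hypothetical and the hostname suffix in the usage is a placeholder, so check the crawler operator's own documentation for its real hostname pattern. It also makes the cost Xe points out concrete: every previously unseen IP needs two blocking DNS lookups before the server can decide anything.

```python
import socket
from typing import Callable

# Same shape as socket.gethostbyaddr / socket.gethostbyname_ex:
# both return (hostname, alias_list, ip_address_list).
Resolver = Callable[[str], tuple[str, list[str], list[str]]]


def is_verified_crawler(
    ip: str,
    allowed_suffixes: tuple[str, ...],
    reverse: Resolver = socket.gethostbyaddr,
    forward: Resolver = socket.gethostbyname_ex,  # note: IPv4 only
) -> bool:
    """Forward-confirmed reverse DNS: the IP's PTR hostname must end with one of
    allowed_suffixes (include the leading dot), and that hostname must resolve
    back to the same IP."""
    try:
        hostname, _aliases, _addrs = reverse(ip)          # PTR lookup: IP -> name
        if not hostname.rstrip(".").endswith(allowed_suffixes):
            return False
        _name, _aliases, forward_ips = forward(hostname)  # A lookup: name -> IPs
    except OSError:  # socket.herror and socket.gaierror both subclass OSError
        return False
    return ip in forward_ips
```

Caching the results would normally hide the latency, but when addresses are rarely reused, the cache rarely hits, which is exactly the problem with this approach.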
Starting point is 00:27:05 Oh, gosh. Have you heard of the residential proxy problem? Um, no. I'm not aware of this one. So, people keep using free VPNs. Aha. The common wisdom is that when you do not pay, you are the product. When you install one of those free VPNs, they usually want you to install the super free VPN client on your desktop computer. And then you use that, it puts everything through a VPN, and it looks like you're fine.
Starting point is 00:27:46 Except what it's actually doing in the background is letting people pay to use your bandwidth for things like their sketchy scraping, going out through your connection. Like, what? Yeah, they literally will have users of free VPNs and other analytics SDKs turned into zombies in a giant botnet that people use to do web scraping, because then it looks like residential IPs to the service operators, and
Starting point is 00:28:21 then, because they also run things that look like Google Chrome on the wire, the operator just thinks it's a new user using Google Chrome. A new user from a residential IP address using Google Chrome: that's like the John Smith of browser connections, right? Right. And that is what people want. That is what operators want. People want people using Windows and Google Chrome from a residential IP to visit their site.
Starting point is 00:28:57 Except these bots will just hit every punishing link over and over and over until the server keels over. And then when the server is responding with 500s, they speed up, because responding with a 500 is faster, and thus they get through their backlog faster. I was not aware of this problem at all. Oh, it is horrible. Oh, it is abysmal. And there's probably not any way to really stop it. Like, I've been theorycrafting with some friends, and we've been trying to figure out, okay,
Starting point is 00:29:33 what is the motive here? What is the modus operandi? Like, they're going to run out of storage at some point, right? Our current pet theory is that they're not going to run out of storage because they're not storing anything. Mm-hmm. And that this is either some attempt to train a browser-use agent, an AI model that knows how to use the web, or it is feeding directly from the web crawling into model training. Hmm. We can't find a reason why that would be a good idea, and the people I'm talking with about this are experts in generative AI. And we've been trying to figure out why someone would do this.
Starting point is 00:30:19 And all we have are just crackpot conspiracy theories. That is the most fun one. Well, the best conspiracy theories are believable on their face. Right. And the ones about either a browser-use agent or directly feeding it into the training process are so stupid, they're plausible. Right. Like, I don't know who writes whatever's going on in this world,
Starting point is 00:30:50 but it seems like in order for things to happen, they have to be just stupid enough to get past the writer's approval. Yeah. I mean, you know, over the past decade, you might be onto something. Either way, that's also why I try to make things just surreal enough.
Starting point is 00:31:14 So that way it seems plausible. Right, right, right. Yeah. But it's either an AI model that knows how to use browsers more natively, or it's them feeding directly into training, or the secret third thing, which is even more stupid and hilarious, that I just remembered. It could be some startup being very clever and doing some sort of data arbitrage thing, or arbitrage. I've never heard that word out loud. Arbitrage, I think, is the correct pronunciation.
Starting point is 00:31:44 Let's just say arbitrage and let the comments battle it out. But it's some kind of data arbitrage thing where somebody is trying to sell access to, quote unquote, new data because it was scraped more recently. And that may actually be a bit more plausible now that I think about it. But I just have no idea what they would want it for. I don't even know if it's actually for generative AI stuff.
Starting point is 00:32:14 A lot of people have been assuming it's for generative AI because, well, the Amazon Alexa team is about to do something involving lingle-mangles. But you just get the flood of requests that just come out of nowhere. They overwhelm your server, and then when it falls over and says mercy, they speed up and make it worse. So there's no way to win. The way you win is you unplug. Just unplug the power and you'll be fine. Yeah, that's also why I sort of made the main hack in Anubis involve the most load-bearing word in a user agent string, Mozilla. Mm-hmm.
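The heuristic can be sketched as a first-match rule list in Python. This is a hypothetical illustration of the idea, not Anubis's actual policy file format or default rules: any User-Agent containing "Mozilla" (which every mainstream browser, and anything impersonating one, still sends) gets the proof-of-work challenge, and everything else falls through to an assumed default of ALLOW.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Rule:
    name: str
    user_agent_regex: str  # searched anywhere in the User-Agent header
    action: str            # "ALLOW", "DENY", or "CHALLENGE"


# Hypothetical rule list, checked top to bottom; the first matching rule wins.
RULES = [
    Rule("looks-like-a-browser", r"Mozilla", "CHALLENGE"),
]


def decide(user_agent: str, rules: list[Rule] = RULES, default: str = "ALLOW") -> str:
    """Return the action of the first rule whose regex matches the User-Agent."""
    for rule in rules:
        if re.search(rule.user_agent_regex, user_agent):
            return rule.action
    return default
```

A client claiming to be old Presto-era Opera (a User-Agent starting with "Opera/9.80") never says "Mozilla", so under this sketch it skips the challenge entirely, which is also what makes that kind of traffic stand out.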
Starting point is 00:33:01 In several thousand years, I'm pretty sure that historians are going to look back and think that the word Mozilla meant browser. It has stuck around in user agent strings for a very long time, and people are loath to change it because of a sketchy practice called user agent sniffing. Way back when Pluto was a planet, the server would serve a different version of a website for Mozilla (the browser Mozilla, not the company Mozilla) and for Internet Explorer. And so Internet Explorer became compatible with Mozilla features, but nobody was seeing those Mozilla-compatible websites
Starting point is 00:33:47 because all the servers thought they were talking to Internet Explorer, so they sent the decrepit Internet Explorer-compliant version. And then Microsoft added Mozilla to their user agent string, right at the front. And that's why Anubis uses Mozilla. There is actually a botnet that has been bypassing it by using Opera instead of Mozilla. But we were able to notice that instantly
Starting point is 00:34:23 because they lacked the word Mozilla. And that is the pro gamer move. So one of the worst parts of how Anubis is made ended up actually being a strength. Remember out there, the audience, some thorns have roses. So with the whole with the whole like scraping thing, we've we've had like web crawlers for a long time now, like the idea of crawling sites and understanding how sites link together. Like this is like the basis of a search engine. And for a long time now now it hasn't really been an issue like yeah you can say it's like it's like an extra couple of percent of of like site usage but like Google things like that they're pretty respectful with the way
Starting point is 00:35:16 that at least with the search engine pretty respectful with the way that they had they handle crawling sites and no one's really complained about that because you also get a lot of benefit from it, right? Like you get to be listed on these search engines. There is like this... What's the word? Symbiotic relationship? Yes, that word. There's this symbiotic relationship here.
Starting point is 00:35:39 But I think a lot of people don't realize just... Assuming it is AI scrapers, which is the most logical, the most logical thing that it could be. I don't think a lot of people realize just how much information is being captured here. Cause when I did my Anubis video, there were people saying, oh, why does it matter if somebody wants to come
Starting point is 00:35:59 like scrape the site once a month? Like, yeah, if that's what it was, it wouldn't be a problem. I wish it was once a month so much there, buddy. Oh, buddy. Like, just looking at the Gnome GitLab example, I think it was something like 50 or 60 percent, maybe even more like 70 percent of their traffic was coming from these bots.
Starting point is 00:36:26 Yeah. It's very... They had no respect. I remember at some point, somebody I knew at Google working on the YouTube team was talking about, in terms of theoretical problems, something called the inversion,
Starting point is 00:36:45 where their machine learning systems would start classifying human traffic as bot traffic and vice versa. And, you know, back in the day, I thought that was apocalyptic and, you know, very much not going to happen. But now that I've seen what I've seen, it is to the point where most of the traffic that you get on a big website
Starting point is 00:37:13 just does not come from humans. And honestly, as a writer, as someone who bares my soul onto the page so that you can learn from my mistakes and not repeat them, it hurts, like, spiritually or something. And this approach with tools like Anubis, yes, it works, but it isn't a great approach, because now you're in a situation where even regular users are being punished for visiting your website. They have to wait whatever it ends up being:
Starting point is 00:37:52 best-case scenario, a couple of seconds; worst-case scenario, as you posted in one of your... I think... did you post it, or was it someone else? I've seen someone post where it went up to multiple minutes to go through. Yeah. When they're using a low-powered phone on a page being hit quite often. This is not a great solution. But at the same time, we're not in a great scenario. Yeah, I am working to try to find patterns with,
Starting point is 00:38:34 quote unquote, known good clients and just let them through. Mm-hmm. Oh my gosh, there is a fractal of rabbit holes in this problem, especially with things like TLS and TCP fingerprints. Oh my goodness, those are a rabbit hole. Where does that take us? Okay, so let's go to Narnia. But the TLDR is that TLS, or Transport Layer Security, the S in HTTPS, is a protocol
Starting point is 00:39:16 that allows you to encrypt a connection end to end. This is why you don't need a VPN. It's already encrypted. "My VPN offers military-grade encryption." Have you seen the latest VPN ads, the new framing that people are using? Yes. Quantum-level encryption. I mean, yeah, technically, you could put a post-quantum cipher in there
Starting point is 00:39:42 and then say it's quantum-level encryption, but we know what you're doing. But the TLDR of TLS is that it's a protocol for encrypting connections. And in order for the client and server to be able to agree on stuff, they have to say which set of TLS extensions they're using. Like, what cipher suites are they using?
Starting point is 00:40:02 And what key exchange mechanism are they using, and what is the server name (SNI), so that the server knows which certificate to send back, and all that. And a lot of that information is distinctive enough to identify the client software: the Go programming language's TLS stack will have a different fingerprint than Google Chrome,
Starting point is 00:40:22 than Python Requests or Python urllib. Those all have different fingerprints, as do other things. And that allows you to make more detailed inferences like, hey, wait, this is Chrome on Windows claiming that it's Chrome on Linux. That's kind of sus, bro. And you can filter things like that. I want to be able to do that in the future.
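To make the fingerprinting idea concrete, here is a minimal Python sketch of a JA3-style fingerprint, following the publicly documented JA3 recipe: the TLS version, cipher suites, extensions, elliptic curves and point formats from the ClientHello, written as decimal values, dash-joined within a field and comma-joined between fields, with GREASE values dropped, then MD5-hashed. Parsing a real ClientHello is out of scope, so the inputs are hypothetical numbers, and the `KNOWN_STACKS` table and `looks_inconsistent` check are illustrative assumptions, not how Anubis does it (the conversation describes this as future work).

```python
import hashlib

# GREASE values (0x0a0a, 0x1a1a, ..., 0xfafa) are random filler that JA3 ignores.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


def ja3_string(version, ciphers, extensions, curves, point_formats):
    """Build a JA3-style string from already-parsed ClientHello fields."""
    def field(values):
        return "-".join(str(v) for v in values if v not in GREASE)

    return ",".join([str(version), field(ciphers), field(extensions),
                     field(curves), field(point_formats)])


def ja3_hash(version, ciphers, extensions, curves, point_formats):
    raw = ja3_string(version, ciphers, extensions, curves, point_formats)
    return hashlib.md5(raw.encode("ascii")).hexdigest()


# Hypothetical lookup table: fingerprint -> word we expect in the User-Agent.
# A real table would be built from observed traffic, not hard-coded.
KNOWN_STACKS = {
    ja3_hash(771, [4865, 4866], [0, 10, 11], [29, 23], [0]): "python",
}


def looks_inconsistent(fingerprint, user_agent):
    """True when a recognised TLS stack disagrees with the claimed User-Agent."""
    expected = KNOWN_STACKS.get(fingerprint)
    return expected is not None and expected not in user_agent.lower()
```

For example, a ClientHello matching the hypothetical Python entry but sending a Chrome User-Agent would be flagged, while the same fingerprint with `Python-urllib/3.12` would not. Newer schemes such as JA4 compute their fingerprints differently.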
Starting point is 00:40:47 But the problem is, in order to do that, you have to have your code sitting at the TLS termination layer. I have attempted to do this, and modern cloud platforms make it annoying. The best experience I had was when I set up a sacrificial-lamb cgit server with the Linux kernel, just as bait for the AI scrapers. They took the bait. There is a saying that someone in site reliability told me when I was starting out in the industry: given enough time, all problems become big data problems. And in order to not show the Anubis challenge page as often, it is starting to become
Starting point is 00:41:38 a big data problem. Ideally, it won't be actual big data. Ideally we'll be able to keep the entire data set small enough to fit in RAM. But, you know, God willing. Does that fingerprinting issue take you into any potential data security issues, or is this just a matter of fingerprinting different types of clients? If it's a data security issue in your mind, then you're screwed.
Starting point is 00:42:08 But fundamentally, it's just a problem that is very underspecified. Very little research has been done, and I'm basically implementing stuff out of scientific papers at this point. I see. And there's stuff like JA3N and JA4 signatures, which I'm still researching.
Starting point is 00:42:40 There are different ways to fingerprint a TLS connection. There's another standard called JA4T for fingerprinting TCP connections, which in theory would let you detect a VPN, but I haven't been able to implement it to my satisfaction yet, so I haven't really talked about it anywhere. So was this an area that you had knowledge in beforehand, or is this something you've had to educate yourself on since creating this? Kind of a mix of both. I've done networking stuff all my career.
Starting point is 00:43:15 I worked at a site-to-site VPN company and was able to put a bunch of my networking knowledge to use. When I was growing up, I learned how to proxy Tor over XMPP in order to evade internet filtering. No, I'm dead serious. I lost the code for that. I wish I hadn't. But it was one of those weird cases where the filter would stop any new TCP connections, but if you left a long-running TCP connection open, such as an XMPP session, it was totally fine. And Tor at the time had really good support for arbitrary weird proxy methods, and I thought, huh, there is a little terrible Python
Starting point is 00:44:01 library that lets me do XMPP, and I have a friend that is very easily amused. Let's see what can happen. And it worked enough. It was not pretty, but it worked. This sounds like the story of your life. It works enough. I mean, that's all you need, right? Like, and if it works enough, then you can get to the point where you try to figure out where it fails in practice
Starting point is 00:44:27 and then be able to design things around it. Although I ended up not needing it after I learned how to do MAC address cloning, because my terrible laptop had one of the few network cards on the market that let you clone MAC addresses. All you have to do is find a single device that's exempt from the internet sleep mode, and then when people aren't using it, you unplug that device's Wi-Fi card and you're good to go. One consistent pattern is all you need. Speaking of other things that just barely work, I am actually a domain expert in generative AI.
Starting point is 00:45:13 I have AI training. I have a fair bit of knowledge of how everything works. I made a chatbot that I intended to be kind of like a Markov bot, but, you know, just a little bit smarter. Her name is Mimi and she is, oh, bless her heart. She tries so hard. She gets so far, but at the end it's just hilarious. Do you happen to have a blog post on that? I've been meaning
Starting point is 00:45:46 to write more about Mimi. I'm in a really weird position where I do experiment with some generative AI things, but I also write the biggest generative AI defensive things. So yeah, there's some doublethink there. Like, the tech is cool, but holy fuck, the people. Yeah, I recently came across a new term that I'm quite a fan of: slopsquatting. Mmm. For anyone who's unaware of this, people know that a lot of these generative models make up fake package names, but they don't always generate random noise with these names. There seems to be some sort of model consistency where they will reuse the same fake package names on a semi-frequent basis. And that means it's a perfect attack vector for anybody who, you know, wants to publish malicious packages under those names.
Starting point is 00:46:59 I can actually give you a somewhat technical answer for that and make it approachable for the audience too. Win-win. Sure. How much do you know about Markov chains? I am aware they exist. I have not looked deeply into the concept. Okay, the TLDR of Markov chains is that they are statistical models for predicting, you know, given these couple words of context, which word comes next. Right. And when you infer the next word from a Markov chain, you take the last two words, you know, like "the boy", and then the next one could be, you know, "jumped" or "slept" or something. And then it picks a word randomly
Starting point is 00:47:45 weighted by how many times it saw it during training, and then, you know, "boy jumped" becomes the context for the next one, and it figures out which word goes after that, and so on. All generative AI is that, but at smaller units of text than individual words, and with much bigger windows. So like when Llama 2 came out sometime in 2023, it was seen as a quote unquote long context window model
Starting point is 00:48:19 because it allowed a total of 4,096 tokens. The general rule of thumb is that one token is about four bytes, so that's about 16 kilobytes of text. Or four AMD64 memory pages, or eight pages of paper, in the context window. Okay. Assuming, you know, single-spaced, 12-point Courier New. Right, right, right. Spherical cow, vacuum, et cetera.
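The back-of-the-envelope numbers above check out under the stated rule of thumb. A quick Python calculation, assuming four bytes per token (a rough average that varies by tokenizer and language) and the usual 4 KiB page size on x86-64:

```python
CONTEXT_TOKENS = 4096   # Llama 2's context window
BYTES_PER_TOKEN = 4     # rule of thumb only; real tokenizers vary
PAGE_SIZE = 4096        # typical x86-64 (AMD64) memory page, in bytes

context_bytes = CONTEXT_TOKENS * BYTES_PER_TOKEN
pages = context_bytes // PAGE_SIZE

print(f"{context_bytes} bytes = {context_bytes // 1024} KiB = {pages} pages")
# 16384 bytes = 16 KiB = 4 pages
```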
Starting point is 00:48:57 And that's basically it. They're basically just Markov chains at smaller units than individual words and over longer context windows. And at some level, it is kind of remarkable that that architecture can actually be useful, because it's literally just how autocomplete on your phone works. No, I'm dead serious. If you use an iPhone right now, there is a transformer model powering the autocomplete. That's why it mysteriously got better two years ago, and why it's able to be so good at what it does. No, I have heard a lot of people describe it as glorified autocomplete. It's not technically inaccurate. I'm pretty sure some AI expert in the comments is going to have their eyebrows go into low Earth orbit, but, like, I've read the papers.
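The two-word Markov chain described above can be sketched in a few lines of Python. This is only the classic word-level model, a count table plus a weighted random choice; it does not show how a transformer computes its next-token probabilities, which is exactly the math being glossed over here. The training sentence is made up.

```python
import random
from collections import Counter, defaultdict


def train(text, order=2):
    """Count which word follows each run of `order` words."""
    words = text.split()
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        table[context][words[i + order]] += 1
    return table


def next_word(table, context, rng):
    """Pick the next word at random, weighted by how often it was seen."""
    counts = table.get(tuple(context))
    if not counts:
        return None  # context never seen during training
    words = list(counts)
    return rng.choices(words, weights=[counts[w] for w in words], k=1)[0]


def generate(table, seed, length, rng):
    """Extend `seed`, always using the last len(seed) words as context."""
    out = list(seed)
    for _ in range(length):
        word = next_word(table, out[-len(seed):], rng)
        if word is None:
            break
        out.append(word)
    return " ".join(out)


corpus = "the boy jumped over the fence the boy slept in the bed the boy jumped again"
table = train(corpus)
print(table[("the", "boy")])  # Counter({'jumped': 2, 'slept': 1})
print(generate(table, ["the", "boy"], 5, random.Random(1)))
```

After "the boy", "jumped" is twice as likely as "slept" simply because it followed that context twice as often in training.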
Starting point is 00:49:51 I'm not entirely inaccurate. I'm just glossing over the math because, number one, I'm not a math slonker. I do not understand the math. And number two, if you want to argue with me, tag me on Bluesky, and then I might make a YouTube video where I read the comments in a funny voice. Yeah, go ahead and do that. Have a great video. I want to see it. Oh, I did that with some of the WordPress guy's blog posts, but it wasn't funny after about a week. Oh, that, yeah, yeah.
Starting point is 00:50:32 Oh, WordPress guy. Yeah, the WordPress situation was fun. Oh, I need to remove the placeholder text on the Anubis homepage. Yeah. Yeah. Um, I did want to talk about that as well. Uh, so, if we go to the Anubis home page right now, and people have pointed this out before, they're like, some of this stuff that's written here doesn't make a ton of sense. Yeah. And then there's obviously the original logo, which people are like, is the logo AI generated?
Starting point is 00:51:09 Why are you making a tool that's against AI if you AI-generate the logo? The way that I understood it from the very start is you never expected this to become popular, and then it just suddenly was? Yeah, in retrospect that was a big mistake, and I regret it, and I am changing how I do things from now on. I do actually know how to art. I do photography. I actually started doing photography because of Stable Diffusion existing. But in the future, I'm probably going to be using abstract terrible swirls or something as placeholder logos, and if something takes off, I'll then commission a human.
Starting point is 00:51:58 Definitely what I'm going to do for Yeet. Oh, I love Yeet. What is Yeet? It started out as a deployment tool, because, you know, yeet is a great verb for "to deploy". Like, just yeet that sucker into prod, what could go wrong? But I made it because I wanted something that's halfway between a shell script and writing bespoke logic in a high-level programming language. So I wrote something in Go that wraps a bunch of common things that I did in shell scripts, and then hooked it all up to a JavaScript interpreter. And it works way better than it has any right to. So you wanna go over that one more time?
Starting point is 00:52:39 You said the common things you do in shell script, wrapped in Go? Actually, I'll link you an example Yeet file. Okay, sure. Let me find Anubis' Yeet file, because that's a decently complicated one. But one of the things that I hooked it up to is package building. There. I linked it to you and you can put it in the show notes or something.
Starting point is 00:53:06 But there's a couple of things that are important to this. Number one is the dollar sign function. When you pass the dollar sign function a template string in JavaScript, it will just run that as a shell command. And any time you do a variable insertion, it'll shell-escape the value and inject it into the string. Hmm. OK.
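Python has no tagged template literals, but the idea behind the dollar sign function, where the literal parts of the command are trusted and every interpolated value is shell-quoted, can be sketched with `shlex.quote`. The function names `shell_template` and `run` and their argument shape are hypothetical; this is not Yeet's API.

```python
import shlex
import subprocess


def shell_template(parts, *values):
    """Mimic a tagged template: keep literal parts as-is, shell-quote each value."""
    if len(parts) != len(values) + 1:
        raise ValueError("need exactly one more literal part than values")
    pieces = [parts[0]]
    for value, literal in zip(values, parts[1:]):
        pieces.append(shlex.quote(str(value)))
        pieces.append(literal)
    return "".join(pieces)


def run(parts, *values):
    """Build the escaped command line and hand it to the system shell."""
    cmd = shell_template(parts, *values)
    return subprocess.run(cmd, shell=True, check=True,
                          capture_output=True, text=True)


cmd = shell_template(["echo ", " ", ""], "out dir/pkg.tgz", "file; rm -rf /")
print(cmd)  # echo 'out dir/pkg.tgz' 'file; rm -rf /'
```

Because each value is quoted, the space and the `;` in the second argument stay inside a single shell word instead of splitting the command or starting a new one.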
Starting point is 00:53:27 Yeah. So for this one, we say for each architecture that Anubis supports and each packaging method that we've shipped with, build the package with this name, this description, this home page, the license, the architecture that it's building for, some documentation files, and then a build function, which executes with some automatically generated temporary paths
Starting point is 00:53:53 and then uses that to assemble the package in the right place. Hmm. It is amazingly terrible, and I love it. I mean, like... why is this a thing? Why not any other off-the-shelf thing for doing the same thing? Because I looked at things that are off the shelf, and this is actually based on NFPM, which was off the shelf and had really good Go libraries. But I needed it to be a little bit more programmable, and I didn't want to write bespoke programs.
Starting point is 00:54:28 So I wrote Yeet. Sorry. Okay. NFPM. It's basically, TLDR, you give it a YAML document that establishes what files go where and what type of files they are, and then it shits out packages.
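The per-architecture, per-format build loop described a moment ago might look roughly like this in Python. Everything here is hypothetical: the architecture and format lists, the `build_package` helper and the manifest fields stand in for what a Yeet script passes to NFPM, and the sketch stops at producing a manifest instead of calling NFPM itself.

```python
import itertools
import tempfile
from pathlib import Path

ARCHES = ["amd64", "arm64"]      # hypothetical target architectures
FORMATS = ["deb", "rpm", "apk"]  # hypothetical packaging formats


def build_package(name, description, homepage, pkg_license, arch, fmt,
                  doc_files, build):
    """Run `build` in a fresh temporary root and describe what it produced."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        build(root, arch)  # the build step writes files under root
        files = sorted(p.relative_to(root).as_posix()
                       for p in root.rglob("*") if p.is_file())
    return {
        "output": f"{name}-{arch}.{fmt}",
        "description": description,
        "homepage": homepage,
        "license": pkg_license,
        "arch": arch,
        "docs": list(doc_files),
        "files": files,  # a real script would hand these to NFPM here
    }


def example_build(root, arch):
    # Hypothetical build step: drop a placeholder binary into usr/bin.
    bin_dir = root / "usr" / "bin"
    bin_dir.mkdir(parents=True)
    (bin_dir / "example").write_text(f"binary for {arch}\n")


manifests = [
    build_package("example", "An example tool", "https://example.com", "MIT",
                  arch, fmt, ["README.md"], example_build)
    for arch, fmt in itertools.product(ARCHES, FORMATS)
]
print(len(manifests))  # 6
```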
Starting point is 00:54:44 Okay. You know, it just makes an RPM, it makes a .deb, it makes an Arch Linux package, it makes an APK for Alpine. I think they do IPK for... What's it called? What the fuck is an IPK for? What is an IPK? I'm learning a lot today.
Starting point is 00:55:04 What the hell is an IPK? Oh, OpenEmbedded. Right, OpenWrt. Oh, okay. Okay. Yeah. Okay, at least this is something I'm aware exists and not some other system that I just suddenly learn about. I mean, I have been told that I am an accumulator of cursed knowledge. Hey, look, it seems like at least for the build script here, the cursed knowledge has
Starting point is 00:55:35 been put to use. Oh, yeah. A lot of Yeet's package building stuff was originally inspired by Nix. Although I wanted something that was halfway between suffering and misery. So I made Yeet. Uh-huh. Uh-huh. And that's a great place for it to be, halfway between suffering and misery. I can never escape Nix, no matter where I go.
Starting point is 00:56:03 Yeah. I used to be a big user, but I'm not anymore. You have other things to spend your time on. Like fighting bots and Final Fantasy XIV. Oh god, it's such a great game, don't play it. I played till the end of Endwalker. I haven't played Dawntrail yet. Yeah, I started at the end of January when I got sick, and I'm at about 420 hours now. I don't want to say how many hours were on my account when I stopped playing.
Starting point is 00:56:38 I know it was over 30 days. Yeah. Yeah. White Mage is too much fun. I like my healer queues. I can't play anything besides healer. Oh, especially with Oceania. Oh, there... I play US servers just because I don't want to deal with the Oceania servers. Fair enough. I used to play JP, um, back when I first started. The main reason why Anubis has been kind of slow is because some assholes I know nerd-sniped me with Final Fantasy XIV.
Starting point is 00:57:19 Right. Right, right, right. I'm getting better at it. Um, I've been working on some bigger features, but that is the biggest reason why things are slower than I'd have liked. So you were saying before that you have a background in generative AI. So what is it?
Starting point is 00:57:43 Can you explain more of what it's like being in this position where you have that background and you're also working on a tool to slow down these AI tools? Because I know you've described it as kind of... Hypocritical, doublethink, things like that. I've heard other people discuss the idea, saying I don't really understand why you feel that way, but what is your general perception here? So the two main halves of the doublethink are: this technology is cool, and
Starting point is 00:58:17 you can take this vague description of an FFmpeg command and it'll just give you the command, and like 90% of the time it'll work. And the other 10% of the time, something will be just barely off, and you can either go for another round or just check the docs and realize that it typo'd something in a video filter. And you know, you're off to the races. And then the other side that's kind of anti is: holy fuck, the people. Because of the culture of taking that it encourages; for everything that it gives, it takes so much more. When I do stuff like Mimi, Mimi runs on a model that could feasibly run on a machine
Starting point is 00:59:03 that I could have at home. Okay. It runs on, I think it's Hermes 3 70B at int4 quantization. For context, a 70 billion parameter model is about the smallest baseline good size. And it being at int4 quantization means that each weight is only four bits instead of 16. So it only requires about 43 gigabytes of video memory to run, instead of many more gigabytes of video RAM. Only 43 gigabytes.
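The memory math follows directly from bits per weight. A rough Python estimate of the weights alone for a 70-billion-parameter model; the gap between this and the roughly 43 GB quoted is presumably the KV cache, activations and runtime overhead, which this sketch does not model.

```python
PARAMS = 70_000_000_000  # 70B parameters


def weight_gigabytes(params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes (1 GB = 1e9 bytes)."""
    return params * bits_per_weight / 8 / 1e9


for bits in (16, 8, 4):
    print(f"{bits:>2}-bit weights: {weight_gigabytes(PARAMS, bits):.0f} GB")
# 16-bit weights: 140 GB
#  8-bit weights: 70 GB
#  4-bit weights: 35 GB
```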
Starting point is 00:59:44 Yeah. And if I do run it at home, it's going to be on like a 64 gig Mac mini or something. Right, right, right. Or I'm going to just give up and buy one of those M3 Ultra Mac Studios with 512 gigs of RAM and pay the Canada tax on that. Oh God. That's going to be like a 16 grand machine. Jesus Christ. Yeah. I know, right? No, I live in Australia. I know exactly what you're talking
Starting point is 01:00:18 about. Oh, I did the calculation in Australian dollars for a friend of mine, and they were like, that does not make sense, how... Like, welcome to big data. But it is a really weird situation to be in because, you know, I don't really use generative AI for a lot of things. I disabled it in my editor because I found that I was becoming reliant on it and I was starting to let my coding ability suffer. And, you know, I built that up my entire career, and that scared me.
Starting point is 01:00:52 So not in my editor. Which editor do you use? I used to use Emacs. I had a very customized Emacs setup. And then I started getting RSI symptoms. So now I use it. Then I started to learn this thing called Cursorless. Cursorless is an extension to VS Code, and it basically gives you spoken Vim powers in VS Code at the AST level instead of at the line or character level. That was really neat.
Starting point is 01:01:27 Then I started using VS Code from there, and just used VS Code basically everywhere, because then I have the configuration synced with their configuration-syncing magic and I don't have to think about it. But my Emacs config was about 20 kilobytes of handwritten Emacs Lisp. Which is a lot, for the record. Look at this Cursorless thing. Another thing I'm completely unaware of.
Starting point is 01:02:00 Yeah, I have been through a fractal of rabbit holes in my career. Great. No, again, I'm happy to be learning about random cool things. Oh, there are so many random cool things that I have. What else is in here? God, I just have so much code. It's to the point where search doesn't help. Oh.
Starting point is 01:02:31 It's hard to get to that point, but what do you do? Oh, I had that prototype of an infinite wiki, like infinite wiki diving with Langle Mangles. What? Langle Mangles. What? Langle Mangles, that's a pejorative way to refer to language models. No, no, the other thing. Oh, the infinite wiki. Yeah.
Starting point is 01:02:54 Something that I thought would be funny: you know the concept of wiki diving, where you start at a random page on Wikipedia and then dive into random other places? I wondered what would happen if it was just purely hallucinations. So, like, you start out by searching for Taco Bell's naval fleet, and it gives you something that plausibly looks like it could be about Taco Bell's naval fleet, and, you know, how they did it in order to do some sort of like
Starting point is 01:03:22 meat deal with the Soviet Union or something. And they had like a page for the USS Crunchwrap and then you click on it and it uses the page that you just looked at as context to create the page for the USS Crunchwrap. And I thought that would be funny. I never ended up getting it working. It ended up having some really weird issues and I had to scrap it but yeah. That was a lot of fun. That actually does sound pretty cool. I would like to see that. I might finish it up at some point but you know, it might be something funny to do for one of my Friday streams.
Starting point is 01:04:01 Oh god, those streams are so much harder to do now. Oh, is that just in terms of trying to avoid the temptation of constantly one-upping yourself? You do YouTube, you know the pain. I think I found a good balance. I did do one stream where I just played Praetorium over and over, though. Oh god, that was an experience. Yeah, experience is a word I could use for that.
Starting point is 01:04:38 It was post-nerf Praetorium, though, so yeah. What else is in here? A whole bunch of random stuff, like something I made for scraping GitHub's API and printing out a list of all of my GitHub repos in this specific format that this one lawyer wanted for when I started at a place. And I just kept it in there because it hasn't hurt anything. I have a service which tells you which day it is in March 2020. What is the URL for that one? So what?
Starting point is 01:05:17 Oh, yeah. It tells you what day it is in March 2020, when time stopped and reset. Let's see if this loads. I don't think it's loading. Did I break the service again? I'll fix it later. Nobody screamed, so nobody's probably
Starting point is 01:05:41 relying on it too much. Oh, no, it's through KEDA. Oh, did I heck something up in KEDA? I'll fix that later. I have an autoscaler thing for my home lab cluster called KEDA, K-E-D-A. And sometimes it is bad. I need to rip it out, but I just haven't bothered yet. You know, you go down some of these tangents and I just don't even know
Starting point is 01:06:10 what to say in response to some of them. It's almost like... I have had a lot of rabbit holes. And I put most of my rabbit-hole results into my x repo, because it's where all the spooky experimental code lives. If you get that joke, you're a real one. Um, I didn't realize there was a joke there. Don't worry. That was a coded message, and the right people will understand. Okay, sure.
Starting point is 01:06:49 Okay, let's bring things back in a little bit. So on the README for Anubis, you're describing it as a nuclear approach. For anyone who doesn't really know why it's like that, 'cause, you know, people could think, why would a project describe itself in such a harsh way? The TLDR is that there are a lot of people that do use things that look like browsers, but aren't browsers,
Starting point is 01:07:31 and they have completely innocuous reasons. And Anubis in its default configuration will block them. And this will make people mad. What are some examples there? There was this package manager called Spack that just so happened to have the substring "bot" in its user agent, and I had a generic catch-all rule that was intended as an example that ended up being load-bearing.
Starting point is 01:07:59 Be very careful about how you do examples. They end up becoming load-bearing so easily. And it would give an impossible challenge to anything with bot or crawler in the user agent string. Uh-huh. Just looking at the comments for this commented-out example rule... I have better documentation for this in an upcoming PR. But the challenge difficulty is 16, which the first comment says is
Starting point is 01:08:31 impossible. It reports it as a difficulty 4 challenge, with the comment "lie to the operator", and then chooses the slow algorithm to intentionally waste CPU cycles and time. This is intended to keep things that advertise themselves as a bot or crawler, and that aren't otherwise handled by the logic, busy forever.
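A rough Python sketch of the catch-all rule as described: match `bot` or `crawler` in the User-Agent, demand difficulty 16, report difficulty 4, and use the slow algorithm. The `Challenge` fields, the algorithm names and the rule layout are assumptions for illustration, not Anubis's actual policy schema. The odds part assumes difficulty counts leading hexadecimal zeros in the SHA-256 digest, so each attempt succeeds with probability 16^-difficulty; the billion-hashes-per-second rate is a made-up, generous figure.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class Challenge:
    difficulty: int  # leading hex zeros actually required
    report_as: int   # difficulty shown on the challenge page
    algorithm: str   # hypothetical names: "fast" or "slow"


DEFAULT = Challenge(difficulty=4, report_as=4, algorithm="fast")
CATCH_ALL = Challenge(difficulty=16, report_as=4, algorithm="slow")
BOTLIKE = re.compile(r"bot|crawler", re.IGNORECASE)


def pick_challenge(user_agent):
    """Substring match, so any User-Agent containing 'bot' is caught."""
    return CATCH_ALL if BOTLIKE.search(user_agent) else DEFAULT


def expected_attempts(difficulty):
    """Mean number of hashes needed when p = 16**-difficulty per attempt."""
    return 16 ** difficulty


print(expected_attempts(DEFAULT.difficulty))    # 65536
print(expected_attempts(CATCH_ALL.difficulty))  # 18446744073709551616 (2**64)
years = expected_attempts(16) / 1e9 / (365.25 * 24 * 3600)
print(f"~{years:.0f} years at a hypothetical billion hashes per second")
# ~585 years at a hypothetical billion hashes per second
```

The substring match is also the trap described above: `pick_challenge("Abbott-Updater/2.0")` returns `CATCH_ALL`, because "Abbott" happens to contain "bot".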
Starting point is 01:08:55 But we have actually seen a case where something passed this through sheer luck. Yeah, I was gonna say, by impossible, I assume this doesn't mean impossible. I would assume this means, like, a mathematically heat-death-of-the-universe situation. Yeah, I did the probability math once. I don't have the results in front of me, but I concluded that it was more likely that you'd be eaten by a shark while getting struck by lightning twice in a row
Starting point is 01:09:25 before that would happen. I see. And yeah, that was pretty unlikely. But half the reason it's a nuclear response is because there are a lot of browsers out there, and I'm not able to test all of them. I would love to be able to test all
Starting point is 01:09:45 of them, but I just can't, because there are just so many. Like Pale Moon, that one pre-Quantum Firefox fork that has had a super rough time with Cloudflare: somebody reported that it wasn't working with Pale Moon, and then, you know, I download Pale Moon on my laptop and it works just fine. So, right. It's probably somebody rejecting cookies again. 'Cause when people think of browsers, most people are thinking about the desktop browsers where, you know, you've got your,
Starting point is 01:10:18 you've got your Google Chrome, you've got your Firefox, you've got everything based on those. But then you have to consider the consoles as well, which all have various forms of browsers based on highly outdated versions of WebKit, or not even WebKit. Oh. Oh, yeah. One of the biggest ones is Links, L-I-N-K-S, or Lynx, L-Y-N-X.
Starting point is 01:10:45 I believe they're different projects, but I don't remember. But those are completely from scratch and don't use the word Mozilla in their user agent strings, so they're allowed through. Mm-hmm. When I say that I use the word Mozilla in there as a load-bearing hack, I mean that it's a load-bearing hack. Right. Right.
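The load-bearing hack itself is tiny. A minimal Python sketch, assuming the only test is whether the User-Agent contains `Mozilla` (the real policy engine layers more rules on top): anything claiming to be a mainstream browser gets challenged, and anything that doesn't, such as Lynx, passes straight through. That is also why the Opera-string botnet stood out: it landed in the unchallenged bucket.

```python
def needs_challenge(user_agent: str) -> bool:
    """Challenge anything that claims to be a mainstream browser."""
    return "Mozilla" in user_agent


examples = {
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0": True,
    "Lynx/2.9.0dev.12 libwww-FM/2.14 SSL-MM/1.4.1": False,
    "Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.18": False,
}
for ua, expected in examples.items():
    assert needs_challenge(ua) == expected
```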
Starting point is 01:11:05 Right. So what is the default challenge set to? It is... okay. So in the default configuration, which is what most people use and what I have to be very careful about ever changing, it will try over and over in a loop, with as many threads as your hardware supports, to find a SHA-256 sum that starts with four leading zeros.
Starting point is 01:11:37 Okay. This is actually easier than you'd think. It's very simple to implement. It's trivial to verify because not only do you count the number of zeros, you also just run the SHA-256 computation with the nonce that the client calculates. And if it matches that and both sides have the right leading number of zeros, you know, the client's fine, you sign a cookie, you give it to the client, the client uses it in the future, Anubis sees that, it's like, oh, you're good, thumbs up emoji.
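The solve-and-verify loop just described can be sketched in Python. Assumptions: the hashed input is the challenge string with the decimal nonce appended, difficulty means leading hex zeros, and the search is single-threaded; Anubis's real client runs in the browser across several workers, and its exact input format may differ. Signing the cookie after a successful check is not shown.

```python
import hashlib
from itertools import count


def pow_hash(challenge: str, nonce: int) -> str:
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int):
    """Client side: try nonces 0, 1, 2, ... until the digest has enough leading zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        digest = pow_hash(challenge, nonce)
        if digest.startswith(prefix):
            return nonce, digest


def verify(challenge: str, difficulty: int, nonce: int, digest: str) -> bool:
    """Server side: a single hash recomputes and checks the client's work."""
    expected = pow_hash(challenge, nonce)
    return expected == digest and expected.startswith("0" * difficulty)


nonce, digest = solve("example-challenge", 4)
print(nonce, digest)
print(verify("example-challenge", 4, nonce, digest))  # True
```

The asymmetry is the point: the client may need tens of thousands of hashes, while the server needs exactly one to check the answer.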
Starting point is 01:12:16 Mm-hmm. So... And let it through. So on a regular, normal, you know, you bought some random Dell PC, how long should that take? So the fun thing about proof of work is that it's actually proof of luck. Okay. Fundamentally, a SHA-256 hash
Starting point is 01:12:46 computed with a challenge value and a number that keeps incrementing gives results that are effectively random. I mean, they're not random, but from a game theory standpoint, you can basically model them as random. And I have seen cases where it's been solved in 47 milliseconds. And I've seen cases where it has taken
Starting point is 01:13:11 like an hour to solve, in a case where someone was terminally unlucky, but they were also running it on a Power Mac G5. Right. If you were to graph how long it took things to happen, it would look like no control. The P95 that I've seen is like three seconds. And for people that don't have graph scrying abilities, the P95 is the 95th percentile: about 95% of the time, it's three seconds or faster. Right, right. Yeah, there's some upcoming work to make that faster
Starting point is 01:13:47 and use hardware better, because something that I thought would be fast wasn't actually as fast as I thought it would be. I don't know what I'm doing with front-end JavaScript; I'm learning along the way too. I thought that if you used WebCrypto, it would jump directly from, you know, JIT-compiled JavaScript code to highly optimized cryptographic code in the browser internals, back and forth, back and forth. But it does some additional security bounds checking or something that just makes it really slow.
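Why "proof of luck"? If every attempt succeeds independently with probability p = 16^-d (assuming difficulty d counts leading hex zeros), the number of attempts is geometrically distributed, so most solves are quick but the tail is long. A short Python sketch of attempt counts at a few percentiles for the default difficulty of 4; the hash rate used to turn attempts into seconds is a made-up figure.

```python
import math


def attempts_at_percentile(difficulty: int, q: float) -> int:
    """Smallest n with P(solved within n attempts) >= q, for p = 16**-difficulty."""
    p = 16.0 ** -difficulty
    return math.ceil(math.log1p(-q) / math.log1p(-p))


HASHES_PER_SECOND = 100_000  # hypothetical rate for a modest device

for label, q in (("P50", 0.5), ("P95", 0.95), ("P99.9", 0.999)):
    n = attempts_at_percentile(4, q)
    print(f"{label}: {n} attempts, ~{n / HASHES_PER_SECOND:.1f} s")
```

Under these assumptions the median needs about 45,000 hashes while the 95th percentile needs about 196,000, more than four times as many. That long tail, together with very different hardware, is part of why solve times range from milliseconds to much longer.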
Starting point is 01:14:17 So the upcoming WebAssembly PR is going to make that a bit faster by using some freaky web assembly things that I don't totally understand. That's the way, that's the way. I think it's like SIMD, single instruction, multiple dispatch. It's some of the stuff that like media decoders use in order to be really zoomy. So I don't understand it, but I don't have to understand it because it works enough. Right, right, right, right. So the basic idea is it shouldn't be a massive disruption, but disruption. But the yes, it's going to be annoying to the individual, but it's. Very annoying for the scrapers. Yeah, the main purpose is spread out across lots and lots and lots of requests from these scrapers.
Starting point is 01:15:27 that's going to be far more annoying than just, you know, oh, it's like a slight delay to get to the website. Basically, yeah. And it's also specifically designed in order to antagonize some of the ways that the scraper networks work by the input to the proof of work function containing your client IP address. You know, it's like the challenge value is a whole bunch of request metadata
Starting point is 01:15:53 put into a SHA-256 sum, and that's the challenge. You are always able to take an HTTP request and get the same challenge value. So, you know, it works out well enough. There's some sketchy logic to get there. Yeah. Can you explain that a bit more? OK.
Starting point is 01:16:12 I have a page on this. Let me just pull it up so that I can make sure that I'm saying things that are accurate. Yeah. I have a "Why proof of work?" page, and it spells it all out. But I was inspired by Hashcash, the email spam thing. It takes your challenge and appends a constantly incrementing number, which, for reasons that are hilarious to Brits and Australians, is called the nonce, but is not hilarious to Americans. Less hilarious here, but I'm aware of what it means in the UK.
Starting point is 01:16:56 The American definition is "number used once." For anyone who is unaware of what the UK definition is, it just means pedo. Yeah. Unfortunately, it's a coincidence. The challenge value is based on what language your browser is set to, your IP address, your user agent string, the date of the current week's Sunday, the public signing key for Anubis's JSON Web Tokens, and the challenge difficulty. Eventually I'm going to refactor this.
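To make the scheme concrete, here is a rough Python sketch of the flow Xe describes: hash the listed request metadata into a challenge with SHA-256, then search for a nonce Hashcash-style. The field order, the comma separator, the "N leading zero hex digits" difficulty rule, and the `placeholder-key` value are all assumptions for illustration; this is not Anubis's actual code.

```python
import datetime
import hashlib

def week_sunday(today: datetime.date) -> str:
    """ISO date of the Sunday that starts the current week."""
    days_since_sunday = (today.weekday() + 1) % 7  # weekday(): Monday=0 .. Sunday=6
    return (today - datetime.timedelta(days=days_since_sunday)).isoformat()

def make_challenge(accept_language: str, client_ip: str, user_agent: str,
                   today: datetime.date, public_key_hex: str,
                   difficulty: int) -> str:
    # Identical request metadata always yields the identical challenge, so the
    # server can recompute it at verification time instead of storing it.
    fields = [accept_language, client_ip, user_agent, week_sunday(today),
              public_key_hex, str(difficulty)]
    return hashlib.sha256(",".join(fields).encode("utf-8")).hexdigest()

def pow_hash(challenge: str, nonce: int) -> str:
    return hashlib.sha256(f"{challenge}{nonce}".encode("utf-8")).hexdigest()

def solve(challenge: str, difficulty: int) -> int:
    # Client side: count the nonce up until the hash starts with `difficulty`
    # zero hex digits. Each extra digit multiplies the expected work by 16.
    nonce = 0
    while not pow_hash(challenge, nonce).startswith("0" * difficulty):
        nonce += 1
    return nonce

def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    # Server side: checking the client's work costs a single hash.
    return pow_hash(challenge, nonce).startswith("0" * difficulty)

day = datetime.date(2025, 5, 9)
ipv4 = make_challenge("en-US", "203.0.113.7", "Mozilla/5.0", day, "placeholder-key", 3)
nonce = solve(ipv4, 3)
print(verify(ipv4, nonce, 3))  # True
ipv6 = make_challenge("en-US", "2001:db8::7", "Mozilla/5.0", day, "placeholder-key", 3)
print(ipv6 == ipv4)  # False: same browser, new address, different challenge
```

The last two lines show why an address change between issuing and verifying a challenge is fatal under this design: the server recomputes a different challenge, so the client's nonce no longer checks out.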
Starting point is 01:17:33 I'm going to have to do some more tenuous logic, like putting whether the client was IPv4 or IPv6 at the front of the string, because Happy Eyeballs is a thing and it will cause weird issues. Oh God, Happy Eyeballs. Happy Eyeballs? So many ISPs will only give you IPv4. Okay. If you're on a phone, you will only get IPv6. Right.
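Happy Eyeballs (version 2 is specified in RFC 8305) works roughly like this: start the preferred connection attempt, usually IPv6, wait a short Connection Attempt Delay (the RFC recommends 250 ms), then start the other address family and keep whichever connects first. The Python sketch below simulates that race with `asyncio` timers standing in for real TCP handshakes; the latencies are made up. Python's own `asyncio` exposes the real thing through the `happy_eyeballs_delay` argument of `loop.create_connection`.

```python
import asyncio

async def connect_attempt(family: str, latency: float) -> str:
    # Stand-in for a TCP handshake over one address family; no real socket.
    await asyncio.sleep(latency)
    return family

async def happy_eyeballs(v6_latency: float, v4_latency: float,
                         attempt_delay: float = 0.25) -> str:
    """Return the address family whose simulated connection wins the race."""
    v6 = asyncio.create_task(connect_attempt("IPv6", v6_latency))
    # IPv6 gets a head start; if it connects within attempt_delay, use it.
    done, _ = await asyncio.wait({v6}, timeout=attempt_delay)
    if done:
        return v6.result()
    # Otherwise start IPv4 as well and keep whichever attempt finishes first.
    v4 = asyncio.create_task(connect_attempt("IPv4", v4_latency))
    done, pending = await asyncio.wait({v6, v4},
                                       return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the losing attempt is abandoned
    return done.pop().result()

# A fast IPv6 path wins outright; a slow one loses to IPv4.
print(asyncio.run(happy_eyeballs(0.01, 0.01, attempt_delay=0.05)))
print(asyncio.run(happy_eyeballs(0.50, 0.01, attempt_delay=0.05)))
```

Because every new connection runs its own race, two requests from the same browser can legitimately arrive from different addresses, which is exactly what trips up a challenge bound to the client IP.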
Starting point is 01:18:03 But there are many clients where they have both IPv4 and IPv6. And at the OS level, when you make a connection out to something that has both IPv4 and IPv6 records, it will use an algorithm that is, unironically, in real life, actually named Happy Eyeballs, in order to have IPv4 and IPv6 race each other, and whichever one completes first is the one that's used. And sometimes, in very bad cases, you can actually have a connection get formed to the server and then Happy Eyeballs kicks in and changes you from IPv4 to IPv6 or vice versa, and then you can run into a case where you get a challenge made for your IPv4 address, but
Starting point is 01:18:52 oh, you switched to IPv6 under the hood. So Anubis, when it's verifying the challenge, calculates, oh, the challenge value is not what I expected, maybe the IP address changed, I'm going to assume the client is being malicious, and displays a vague error message that says something went wrong. And that causes a lot of fun, but I'm gonna figure out how to fix that eventually. It sounds like every step along the way,
Starting point is 01:19:22 it's just some new issue that in your initial setup was never really a consideration, because it was just supposed to be one thing used on just your server. Yeah. Yeah. It's basically how every security product is, I'm told, where you have this initial hacky implementation, and then you start to have to handle all the edge cases. And a lot of what I've been doing is cleaning up what I've been calling founder code. If you haven't worked at a startup, founder code is the term for the code made by the mythical startup founders that is load-bearing, awful, hacky, and if you change the semantics of it,
Starting point is 01:20:10 things might break downstream in weird, unpredictable ways. Uh-huh, uh-huh. So a lot of what we do lately is cleaning up all the founder code, refactoring the logic to make it more generic, refactoring stuff in order to make things more flexible, refactoring everything, and making sure that the docs don't fall out of date, because, oh God, every time you write documentation, it is already out of date and you just don't know it yet.
Starting point is 01:20:38 I haven't done much in the way of developer work, but back when I was in university, I did do some contract work, and I was that person. I was the person writing the founder code. It was awful. I have no idea what's happened to that project since then. I was doing it for, like, a research agency here. I'm sure they've rewritten every little thing in that project by now. Yeah. One of the things that I implemented earlier today was the ability to import fragments of configuration files
Starting point is 01:21:19 instead of having to have all of your configuration in one big file of doom. Okay. And making that not break the rest of the stack is kind of scary, because I don't know what people's configs are. I don't think I made anything in the config syntax load-bearing, but I'm making a config syntax change, and that's always scary.
Starting point is 01:21:43 I have tests. I have tests based on what people have reported. I have tests based on how I know things should work and how things do work. I made sure that the changes to the configuration for importing those snippets were as contained as possible to the part that loads configuration. Then that part lies to the rest of the stack, saying, oh, this user just wrote this massive configuration that no human should write. Here, go with it. And everything else works.
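The design Xe describes keeps import handling inside the config loader, which then hands the rest of the stack one flat configuration. A minimal Python sketch of that idea, under stated assumptions: the `{"import": name}` entry shape, the rule field names, and the in-memory `fragments` dict (standing in for files on disk) are all hypothetical, not Anubis's actual policy format.

```python
from typing import Any

Rule = dict[str, Any]

def flatten_rules(rules: list[Rule], fragments: dict[str, list[Rule]],
                  _chain: tuple[str, ...] = ()) -> list[Rule]:
    """Recursively replace {"import": name} entries with that fragment's rules.

    Only the loader knows imports exist; everything downstream sees one flat
    list, as if the user had written the whole thing by hand.
    """
    flat: list[Rule] = []
    for rule in rules:
        name = rule.get("import")
        if name is None:
            flat.append(rule)
            continue
        if name in _chain:
            raise ValueError("import cycle: " + " -> ".join(_chain + (name,)))
        if name not in fragments:
            raise KeyError(f"unknown config fragment: {name!r}")
        flat.extend(flatten_rules(fragments[name], fragments, _chain + (name,)))
    return flat

# Hypothetical fragments, one nested inside another.
fragments = {
    "ai-crawlers": [{"name": "block-gptbot", "user_agent_regex": "GPTBot", "action": "DENY"}],
    "defaults": [
        {"import": "ai-crawlers"},
        {"name": "challenge-browsers", "user_agent_regex": "Mozilla", "action": "CHALLENGE"},
    ],
}
policy = [
    {"import": "defaults"},
    {"name": "allow-healthcheck", "path_regex": "^/healthz$", "action": "ALLOW"},
]
print([rule["name"] for rule in flatten_rules(policy, fragments)])
# ['block-gptbot', 'challenge-browsers', 'allow-healthcheck']
```

Because the cycle check only looks at the current import chain, importing the same fragment from two places is allowed, while a fragment that (indirectly) imports itself fails loudly at load time instead of recursing forever.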
Starting point is 01:22:08 And I'm pretty sure that's gonna be fine. I have to do more testing, but, like, that's why you do tests. That's why you write docs. And that's why you are careful with how you change things. That sort of takes it into the concept of... Oh, sorry, was there more you wanted to say there? Yeah. Not really.
Starting point is 01:22:26 Okay. Sorry, I was gonna say that takes us into the concept of, like, managing a project like this. Where, you know, it went from a very niche thing that you were doing for yourself, and now you actually have to care what other people are doing with it. Oh, yeah. So, how do you go about doing that? I'm trying very hard to not break people. Um, it is a combination of testing and having victims willing to run slightly less stable things in exchange for giving feedback when things break. Victims is the right term to use.
Starting point is 01:23:07 And victim is the technical term. I see. And the GNOME sysadmin team, oh my gosh, they have been so useful in terms of fine-tuning things, figuring out what the right difficulty value for this really hacky challenge is. They run the Git main version of Anubis. And any time they run into even a slight bump,
Starting point is 01:23:35 either I find out because I read the GNOME Discourse and their infra GitLab repo, or they just tell me. And I either inform them that something is wrong, or we find an edge case and either fix it or add not just documentation, but a check in the code that says, oh, if you're doing this incredibly specific thing that's known to cause weird issues, we will warn you. And if it's bad enough, we will actually kill the program
Starting point is 01:24:06 before it loads, so that it crash-loops and it's immediately noticeable to the administrator that something is wrong. I used to work in site reliability, which is basically being a sysadmin that can code. And one of the things that I learned is that it is a lot better to have things fail loudly and as violently as possible
Starting point is 01:24:26 because that gets them fixed. Right, right. The squeaky wheel gets the grease. Yeah, make it break where they're gonna see it, not make it break in production after it hits some sort of weird interaction. Yes. And ideally, you want to actually break it
Starting point is 01:24:47 before the program starts, because then it doesn't work, and people are much more likely to notice. Right. Yeah. Yeah, if it just silently runs in the background, you're just going to miss it. Yeah. In terms of managing a project, though,
Starting point is 01:25:03 I've hit the point where I'm getting pull requests from people and they're actually good, which, from what I'm told by the CEO of a place that I used to work at, is a huge bar, and it means that you're on to something. Oh, my goodness. I'm at 40 contributors, or 37 external contributors. And, like, it is an absolute gift to be able to be in this position. It's just, you have interesting problems. Like the packaging thing, like that manifesto.
Starting point is 01:25:46 About half the reason why I have my own build tool, and why things aren't using the normal distro-standard ways to do it, is because I wish I had the time to learn how PKGBUILDs work, how whatever RPM uses works, the Debian control files, all of that. I wish I had time to learn how all that works. I don't. I just do not. So I'd rather have something as an option for people whose distros don't ship it. Like, CentOS doesn't ship it. Fedora doesn't ship it. So they have an option to get something working. And I am glad that there is, like, an unstuck-me
Starting point is 01:26:30 button. Originally, I only shipped a Docker image. And that turned out to cause some issues in weird ways, because people were complaining; I think one person on Mastodon was complaining about, quote, oh, there, I found it, "web shit encroaching into my server admin." And it's, you know, like, fair enough, but, dude. And then I linked them that manifesto. It's like, oh, wait, you actually do care?
Starting point is 01:27:00 I'm like, yes. Why do you think I've written the manifesto? I was just looking at the packages available. I didn't realize it was officially in the Arch repos now. That's cool. Oh, yeah, that was one of the dependencies that they had in order to get it shipped in the Arch repos, because they only ship stuff that's built in the Arch Linux repos.
Starting point is 01:27:28 Which, you know, fair enough; at their level, I expect them to not trust my binary packages, because it's security software and they want to build it from source. I will not stop them. I actually helped them. I worked with, I think it was Foxboron, to make the packaging process a bit easier for them.
Starting point is 01:27:49 Yeah, I enabled downstream packaging. I helped one of the FreeBSD devs with packaging Anubis for FreeBSD. Oh, they're out of date. I need to poke that guy. But to get to the point where I have a page on... is so weird. I've never had that happen before. So you cut out for like half a second
Starting point is 01:28:12 there when you said a word. To get to the point where I have a page on Repology, which is a website that crawls the package repos for every distribution and shows you which versions are out of date... To get to the point where I have a page there is just wild. And every so often, I just learn about new distros, like ALT Linux. I think it's mostly used in Russia
Starting point is 01:28:41 and Russian-adjacent places. They have it packaged for their rolling release, named Sisyphus. That's an incredible name for a rolling-release distro. Looking at the Wikipedia page, it is an RPM-based system from Russia. Yeah. It's RPM-based, but it's actually... It's just using RPM, but not actually... Okay, sure.
Starting point is 01:29:09 Wait... Wait, what? Hold on, I... Sorry, I need to look into this. I need to look into this project at some point as well now. I'm so confused by this thing. Okay. That's cool. Yeah, it's like...
Starting point is 01:29:21 Just as a result of this, even just looking at the Repology page, you learn lots of interesting, weird things about how people do stuff. There's just so many interesting things to look at here. It's unique, and I am blessed to be in this position. And it has allowed me to get into some really interesting places. Like, I spent Easter weekend, off and on between Final Fantasy XIV duties, chatting with the admin of sourceware.org, teaching him how containers work. That, and seeing the reactions of, you know, this person who is running stuff based on Apache and CGI,
Starting point is 01:30:10 their reaction to how the modern world works, and some of the abject horror that happens as a result. Yeah, it is a very interesting position to be in. It's gotten me into all sorts of fun, interesting backrooms. I believe I'm in the, like, infra backrooms for Arch Linux and Gentoo, and I think I'm about to get into the backroom for Haiku. That's cool. That's awesome.
Starting point is 01:30:38 Oh, it is so cool. It is a very, very interesting position to be in. And it's also kind of terrifying, because I know that I am a single person. I am working a full-time job. This is stuff I'm doing on nights and weekends. I want to be able to make sure that this will survive me burning out. And figuring out how to do that is hard. Yeah, this is a problem that a lot of projects, really, frankly, every project, runs into.
Starting point is 01:31:18 Things are usually started by a single person, maybe a group of friends, but usually a single person. And what ends up happening is, 10 years down the line, they're still the largest contributor. Or if you want to go even further than that, look at curl. Daniel Stenberg started the project. He is still the major contributor. Yes. He does a lot more management now, but even after all this time, he's still a major contributor on the project. People like that, who can keep doing something for that long, I have a lot of respect for, but I totally understand wanting to put it in a situation, even if it's just dealing with bus-factor stuff, where the project can live on even if you're not able to work on it. Yes, that is the goal. I don't think I'm there yet, but I'm going to get there. It's just
Starting point is 01:32:07 a matter of time. It is within keikaku. Translator's note: keikaku means plan. Shut up. When I give talks, I warn the organizers that it's basically going to be nerd stand-up comedy. And they have this reaction, like, you can't be serious. And then they watch the talk, and I'll see the organizers in the back dying after I make an SRE joke. And they'll be like, yeah, it is stand-up comedy for nerds. So I want to talk a bit about supporting the project, because obviously there's the idea of financially supporting it, but what do you...
Starting point is 01:32:49 Obviously there's a lot of stuff that you need done. Obviously there's the rewrite of the website and the documentation, and then development help. From your perspective, what do you think needs the most help right now? Or is it just financial support that would be the best thing? It's a combination of financial support and, if you have problems or have made workarounds
Starting point is 01:33:14 for individual apps, please just put them somewhere in the issue tracker or discussions, because that allows me to come in and write documentation. There was this link I got over Mastodon today about somebody who set up a targeted cgit approach for Anubis, where it was specifically targeted to only the expensive routes that cgit uses. And yeah, I want to document that. I want to have that
Starting point is 01:33:46 in the documentation. I want to have that as a cookie-cutter example, so you could just say, here, give me protection for this thing, and make it easier that way. But financial support is good because, you know, as we know, money does buy goods and services, including things like food and rent. Yeah, those are always pretty important to deal with. Yeah, um, what is my "Patron" at now? Sorry,
Starting point is 01:34:16 it's Patreon, sorry. We have been jokingly calling it "Patron," mostly because it is an intentionally bad way to pronounce it. I have it loading up in snow pesos, but let me just log into Patreon real quick and get you the amount in freedom eagles.
Starting point is 01:34:42 Just gotta scan with my security key. Hit the right one. Now, is this actually going to work? I want to trust my computer. Yes, thank you. Trust this device. Continue. And of course, that didn't take, so I have to start the whole OAuth process again.
Starting point is 01:35:08 Yay! Isn't OAuth the best in the world? Right now it says 531 a month. Yeah, 531, 532. That is way more than it was earlier this year. Maybe that's showing the Australian... That might be showing the Australian amount, then. I don't know what it's showing. Yeah, it's showing it in US dollars now.
Starting point is 01:35:32 532 US. And GitHub Sponsors, what is that at? Because that's a thing now. Sponsors dashboard. Oh, I have to use my passkey again. Yay. I love YubiKeys. $135 a month on GitHub Sponsors, and I set this up like a few days ago. Ooh, nice.
Starting point is 01:35:53 That's cool. Awesome. So, yeah. That is a blessing. And I'm sorry, I'm stopping myself from crying a little. It's not at the point where you can do this full time, but it is certainly making progress very quickly. It is. For something that's grassroots, this is way faster than I thought it would be, and that startup CEO I mentioned says it's faster than he thought it would be, too.
Starting point is 01:36:32 I was actually talking with some venture capitalists before the line went down, and we were going to talk about some kind of open source funding thing. I'm looking at corporate partnerships. As I said in that post, kind of sarcastically, everything's going towards my not-having-to-do-my-day-job fund. No, I do think that's really cool. Being able to take a project that, you know, you just started as your thing to deal with your problem, and now it's becoming something that people actually rely on, that people are willing to pay you for. This is a real, valuable project now.
Starting point is 01:37:29 Yeah, it's a blessing. And I want to see where this rabbit hole goes. It's about to reach a point where I'm going to have to develop a cohesive vision for it, and that's going to guide a fair bit of it. And eventually I'm going to end up being some kind of product manager or CEO type or something. I just absolutely love that you decided to put it under the Techaro org on GitHub.
Starting point is 01:38:01 Oh my gosh, Techaro, that's one of my most successful shitposts. Oh. Oh, do you know the lore? No. Do you want to know the lore? Sure. So, I'm actually not a technical writer by nature. I'm a fiction writer.
Starting point is 01:38:22 Aha. And I have a series that I've maintained over the years where I invented a fake startup called Techaro, T-E-C-H-A-R-O, which is notably one letter off from "tech bro." And I just sort of took some of the most surreal parts of my experience working at startups and turned them into satire of the tech industry. At one place I worked, there was this guy who was unironically writing Haskell on his laptop in full lotus while drinking concentrated cold brew. And I have channeled that into some of the
Starting point is 01:38:59 characters that I write. I'm not making this up. This actually happened. We had those weird standing desks that could go all the way down, so he had his at floor level, and he was in full lotus, writing Haskell in Vim on a MacBook. That's when I lived in the San Francisco Bay Area. And that explains it. If there is anything that describes that area, it is writing Haskell in full lotus on a MacBook in Vim. Dude was magic. But, uh, yeah, the organization I made up was called Techaro because it was funny to sneak that joke by Hacker News. They still haven't gotten it.
Starting point is 01:39:43 It is beautiful. Oh, they will now. But I've said it elsewhere a couple of times, and I'm pretty sure that there are enough people there that they don't all get it. But one of my favorite stories from there was the layoff. I'm not gonna talk about it too much
Starting point is 01:40:02 because I don't wanna spoil the reveal. But if you read it, you'll understand the kind of tech satire that I want to write more of, and that I can't write more of, because my tech satire keeps becoming people's startup pitches. No, seriously. Like, I had something about a robot site reliability person, where the thing that I implemented was,
Starting point is 01:40:31 it gets a webhook from PagerDuty, identifies which service it's about using the ChatGPT API, then sends a restart command to that service and closes the incident. Right. When I was a pager bitch, that was like 99% of what I was doing: just going in, restarting services, closing the incident, and going back to sleep.
Starting point is 01:40:53 Mm. So one of my patrons was at a tech conference and took photos of three booths of companies doing literally that. And this has made it hard for me to write about this stuff, because one of the things I wrote about was something I called Protos, which was, you know, the implement-that-feature-for-me button, or vibe coding. Okay.
Starting point is 01:41:24 Yeah, I can see your problem then. Are you aware of the Amazon store stuff? Yes, that's what inspired me to write that story for real. I just love how "AI means Actually Indians" is just so, so viable. There's just so much on my list of stuff that has been done. Like, the DoNotPay guy trying to sneak ChatGPT into the Supreme Court via an earpiece got dangerously close to something that I wanted to write. I did see someone submit a video of an AI lawyer as their defense, and the judge was like, get this out of my court.
Starting point is 01:42:30 What are you doing? Yeah, it's like, you fail. Go away. You're done. Just so, for anyone who's unaware of the Actually Indians thing, I just want to explain that. So Amazon had these stores a few years back. I don't know if they still have them anymore. Okay, so Amazon had these physical stores, so you could just go there, and
Starting point is 01:42:54 the idea is they would use magical AI to scan you and scan the products, and it would just automatically take the money out of your account, using the payment information you've got attached to your Amazon account. Great idea, cool. The problem is it wasn't real. The supposed AI-powered system was actually just a bunch of people watching the video and looking at the products you were buying. They were basically just remote cashiers. Yep.
Starting point is 01:43:29 AI means Actually Indians. And this was a lot more true with the previous AI boom, in about 2016-ish, I want to say. Does that sound about right to you? That sounds about right to me. I don't recall this one. The thing that I remember the most from that era was the old x.ai, which was one of the most magical things ever.
Starting point is 01:43:59 And I'm pretty sure that was actually just Mechanical Turk. Oh God, Mechanical Turk. Right. Yeah, yeah, yeah. I miss the old x.ai, because that thing was just freaking magic. You just CC'd their bot and it scheduled it. It did the whole Calendly thing that we failed to do because we rolled a one on planning several times in a row, and it did all that back and forth for you over email and just put stuff into your Google Calendar. It was amazing. I've always wanted to
Starting point is 01:44:38 recreate that with modern langle mangles. You can feasibly do that on hardware you can look at. I just haven't had the time. Yeah, this was the era that brought us magical things like Microsoft's Tay, which was a... Microsoft, okay, right, okay. Yeah, Tay, the bot that very quickly turned into a 4chan user after maybe a few hours. Yeah. If anyone ever wonders why they don't continuously train models: number one, training a model costs like 16 times as much compute as running it.
Starting point is 01:45:17 And number two, Tay. Yeah. Yeah. Oh God, Tay. Just for anyone who's unaware, anyone who might not be, you know, an internet wizard, I would recommend you go back and look at some of the screenshots, because it was wild. And the fact that Microsoft left it running as long as they did.
Starting point is 01:45:42 Like, the fact it didn't get pulled immediately. Yeah. Yeah. They're... God. I wonder how many of those references, those little cutaways, you've done in this call, compared to basically every other one of these calls you've ever done. Oh God. Um, actually, what am I at now? I think I've gotten to the front page of Hacker News like 56 times or something. Geez!
Starting point is 01:46:20 Most people get to the front page of Hacker News like twice. And when I get to 69 times on the front page, I'm going to write about what I learned getting to the front page of Hacker News 69 times, with the subtitle "nice." I'm shaking my head right now. You know that you have to respect it. Yeah, no, I do. Look, just getting onto the front page of Hacker News, ignoring the funny number part, being on Hacker News that many times is kind of crazy. Yeah, whenever I get down to San Francisco again,
Starting point is 01:47:07 I need to go buy coffee for poor Dan Gackle, the admin of Hacker News. Mm-hmm. Oh, goodness. There have been some very funny comments on my posts on Hacker News, especially the tech satire stories that get to the front page. Those are always special. Oh God. I have been channeling a lot of the energy, a lot of the surrealist nonsense that I've had in my career, into my blog, into my tech satire, even into my skeets. Yeah.
Starting point is 01:48:05 So one of the things that, I don't know why we hadn't touched on before, but I think you discussed it with some people in my comments section when I did my video: the idea that Anubis is just, you know, using a lot of compute, and it's kind of just being thrown out into the ether. And someone was like, hey, couldn't you use that compute for something good, you know, Folding@home, things like that? Yeah, I've looked into this. Like, trust me. The current proof-of-work thing is a hack that I implemented because I found an example that I was able to bash into shape and make work just enough to pass muster. I have been using it mainly as a stopgap. So, number one, I have extra time to implement something that's a bit more, uh, not GPU-parallelizable.
Starting point is 01:48:57 And, uh, it's intentionally kind of bad right now as a way to bait AI companies into bypassing it, into fast-pathing it. And then you just change the algorithm slightly and block all those AI companies out forever, because they think they won. But I did look into protein folding, and protein folding is one of those things that I had in the, you know, back of my head: huh, that would be funny. Because that would actually contribute to science. And the fact that there are multiple clients at play means that you can send the same challenge to multiple clients
Starting point is 01:49:34 and, if they return the same protein fold, then they're good and they can go through. Or, for the first client, you say, oh, this protein fold, we'll accept whatever. But then you submit that same challenge to someone else down the line, and if they get something else, then they get rejected or something. The only problem is that protein folding is scientific computing, and you need scientific-computing levels of data.
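The redundancy scheme Xe sketches resembles the quorum validation used in volunteer computing: hand the same work unit to several clients and only trust an answer once independent clients agree on it. Below is a minimal Python sketch of the variant described here, where the first answer is accepted provisionally and later answers are judged against it. The work-unit IDs and result strings are placeholders, and real folding output would need a tolerance-based comparison rather than exact string equality.

```python
from collections import defaultdict

class QuorumValidator:
    """Accept the first answer for a work unit provisionally, then judge later
    answers against it; a unit counts as confirmed once `quorum` clients agree."""

    def __init__(self, quorum: int = 2) -> None:
        self.quorum = quorum
        self.reference: dict[str, str] = {}
        self.agreeing: defaultdict[str, int] = defaultdict(int)

    def submit(self, unit_id: str, result: str) -> bool:
        """Return True if this client should be let through."""
        if unit_id not in self.reference:
            # First client: nothing to compare against yet, so accept it.
            self.reference[unit_id] = result
            self.agreeing[unit_id] = 1
            return True
        if result == self.reference[unit_id]:
            self.agreeing[unit_id] += 1
            return True
        # Disagrees with the reference answer: reject this client.
        return False

    def confirmed(self, unit_id: str) -> bool:
        return self.agreeing[unit_id] >= self.quorum

validator = QuorumValidator(quorum=2)
print(validator.submit("unit-1", "fold-A"))  # True (provisional)
print(validator.submit("unit-1", "fold-A"))  # True, and the unit is now confirmed
print(validator.submit("unit-1", "fold-B"))  # False
```

The weak spot is visible in the sketch itself: a lying first client poisons the reference answer, and honest clients who come later get rejected. That is one reason volunteer-computing projects wait for a quorum before trusting any single result.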
Starting point is 01:50:01 And the average browser only has like 256 megabytes of mutable space per origin. So, like, no, you cannot do protein folding. I would absolutely love there to be protein folding, but unless I am missing something really dumb, the logistics don't work out. And I hope I'm missing something really dumb, because it would be exceptionally funny. Like, it would be exceptionally funny. And that is reason enough to implement it. Well, yeah, it would be funny. But even just the fact that you could actually be putting that compute that's just being thrown out toward something productive, right? Like,
Starting point is 01:50:56 that would be nice. It would be an actually good use of all the compute power these AI companies are wasting. Yes, and that's why I want to find some way to use it. I'm probably missing something, because Google doesn't work anymore. So if you know of anything, please say so in the comments. And, like, I am almost certainly missing something really obvious that would make this really trivial. So please let me know what I am missing, because I can only do so much research, and all the research I've done is like, yeah. Now, the real dark path you can go down is: you already have proof of work. Just, you know, turn it into a Bitcoin miner.
Starting point is 01:51:42 I did think about that. And then I floated the idea to a trusted advisor, and they're like, do you really want people to write this off as an anus coin thing? And, oh, anus coin, that's derived from Bitcoin, butt coin, anus coin. Right, right, right, right. And I realized that a lot of the small internet websites that this is protecting would just be instantly turned off
Starting point is 01:52:09 and write it off and call me some kind of cryptocurrency scammer if I did that. So, you know, as much as that would be deeply funny, I don't think that's worth implementing. Yeah. It would be hilarious, but I don't think it's worth it. Plus, for many people that operate small internet websites, the tax implications of cryptocurrency are... Oh, boy. Yeah. Yeah. Yeah. Don't be
Starting point is 01:52:36 born in the US if you want to do cryptocurrency stuff, is all I'm saying. Yeah, there were, um... I don't know, there have been experiments in the past with, like, integrating crypto miners into sites, and not maliciously. I know there were some news sites back in the early 2010s that were trying to replace their ad system with crypto miners, and yeah, people were rightfully like, what in the world are you doing? Because you're basically offloading all of your site's revenue generation to the visitors' computers. It's just like... Yeah. It's a weird situation to be in. It's got some pretty rotten vibes.
Starting point is 01:53:32 It would be fun. I don't think it's worth it. Right, right, right. The vibes are... What did a friend of mine say? Like, the vibes are death rattles. That was a good one. But yeah, when I get the WebAssembly PR done,
Starting point is 01:53:49 it'll be a lot faster. It'll parallelize better. It will use the full CPU better. There is just so much to do. I have almost gotten the WebAssembly PR to a place I'm happy with. It's just, oh my goodness. Turns out that a lot of the stuff that is GPU-resistant uses a fair bit of RAM
Starting point is 01:54:11 and that becomes a logistical challenge. So you've got your ideas for the short term that you want to work on, but do you have some long term things you would like to implement? You know, maybe they're like wishes that you could do at some point. I am basically sitting on all of the parts to build Cloudflare at home, but with Anubis as the filtering layer. Uh-huh. Uh-huh. And that would be exceptionally funny,
Starting point is 01:54:42 and that is one of the routes that I've been thinking about in terms of commercializing Anubis: making something functionally like ngrok, but with, like, userspace WireGuard and Anubis and a whole bunch of stuff like that. ngrok, for context, is a program where you say, expose this port on my machine to the internet; it spits out a URL, and then you can give the URL to a friend and they can test the service. Hmm. And, like, I used to work at a now-defunct company that did basically something like that, but server grade.
Starting point is 01:55:22 And it was shockingly effective, and working there is what taught me everything I know about HTTP/2. Yeah, there are a lot of really fun ways to abuse HTTP and WireGuard. And like, I have a weird set of backgrounds, like, the holy trif-, the unholy trifecta of programming, networking, and writing. That means that I am basically able to take an idea, implement it, improve it to be optimal for the network, and explain how it works without having to go through someone else and suffer the English translation layer.
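ngrok's real architecture is more involved than what follows: its client dials out to ngrok's servers, which then accept public traffic and send it back down that tunnel. As a hedged illustration of only the core relaying step, here is a minimal local TCP port forwarder built on Python's asyncio streams, where every connection accepted on one port is piped byte for byte to a target port. The function names and ports are hypothetical, and nothing here is taken from ngrok, Anubis, or the company mentioned above.

```python
import asyncio


async def pipe(reader: asyncio.StreamReader, writer: asyncio.StreamWriter) -> None:
    """Copy bytes from reader to writer until EOF, then close writer.

    Simplification: closing tears down both directions of that socket.
    A more careful relay would half-close with write_eof() instead.
    """
    try:
        while True:
            chunk = await reader.read(65536)
            if not chunk:  # the sending side reached EOF
                break
            writer.write(chunk)
            await writer.drain()
    except ConnectionError:
        pass  # one side vanished; fall through and clean up
    finally:
        writer.close()


async def handle_client(client_reader, client_writer, target_host: str, target_port: int) -> None:
    """Connect to the target service and relay traffic both ways."""
    try:
        up_reader, up_writer = await asyncio.open_connection(target_host, target_port)
    except OSError:
        client_writer.close()  # target unreachable: drop the client
        return
    await asyncio.gather(
        pipe(client_reader, up_writer),  # client -> target
        pipe(up_reader, client_writer),  # target -> client
    )


async def start_forwarder(listen_host: str, listen_port: int,
                          target_host: str, target_port: int):
    """Accept connections on listen_host:listen_port and forward each one to the target."""
    return await asyncio.start_server(
        lambda r, w: handle_client(r, w, target_host, target_port),
        listen_host,
        listen_port,
    )
```

With a forwarder listening on, say, port 8080 and targeting a local service on port 3000, connections to 8080 reach 3000 unchanged. What a service like ngrok layers on top is the public URL and the outbound tunnel that lets outside traffic reach a machine behind NAT.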
Starting point is 01:56:08 You have a very particular set of skills, you might say. I'm not trying to kill anyone yet. Yet. Yet. I'm, uh, I'm... [...] posted on Mastodon or Twitter, and I just wonder how anyone sees me as a professional, because I'll say things like: broke, gifting a nerd Balatro; woke, gifting a nerd Factorio; bespoke, gifting a nerd Final Fantasy 14. If you want to destroy a startup's progress for months, you gift all of their easily nerd-snipable people a game like Factorio or Balatro or Final Fantasy 14,
Starting point is 01:57:17 and all progress will stop. Yeah, yeah, yeah. I've seen it happen. It's hilarious. Oh, specifically Factorio. There are some artists I know who were gifting people at AI companies Factorio around the time the Space Age expansion came out. One of them said something like, that was the best money I've ever spent, seeing the playtime that resulted from it. But I guess sometimes when you post on social media, you have to let the intrusive thoughts win because it's funny. Yeah, yeah, I agree.
Starting point is 01:58:01 So, yeah, the Domino's meme: every so often I make a new version of it. God, the fucking Domino's meme. So was there anything else you wanted to touch on with Anubis, or have we pretty much covered the main points? I think we pretty much covered the main points. Some of the next things I'm working on are the actual commercial version of it, because someone has met the sponsoring criteria for it and is willing to be the test subject for the commercial slash unbranded version.
Starting point is 01:58:45 And the feedback from that is going to shape a lot of how I move it forward. I'm going to be taking some time off work this week to ship that. And I think I'm going to be calling it BotStopper or something. I would have loved to have a really nice acronym that starts with T, kind of poking fun at how horrible the Google Cloud and AWS acronyms are. I may have come up with one, but I have to save it for later; it'll be a surprise at a later date. But yeah, that's basically it. All that's left now is to just draw the rest of the owl. I don't know, we actually didn't bring up the commercial offering that much.
Starting point is 01:59:28 I think it was mentioned earlier. The TL;DR for the commercial version is that it removes the anime woman. That is, like, the basic way that I thought would be funny to commercialize it. And it's worked, because a lot of the time when people are complaining about it, one of the first things they complain about is the anime woman and the fact that you have to pay to remove her, which, you know, hashtag all press is good press. Yeah, there are a few deployments right now which are not making use of it. Yeah, that's, I mean, it's MIT-licensed software, they're free to do it. I think they're cowards, but you know, the MIT license doesn't say thou shall not be
Starting point is 02:00:13 a coward. I mean, if UNESCO can show the anime woman, you have no excuses. That's true. That's true. But there are, like, some smaller ISPs and other things that want to show their own logo, so I want to have a version available that gives you the same Anubis power, but with, like, generic gear, check, and X icons, as well as the ability to plug in your own. Yeah, that's totally an issue. It makes sense to have an offering where you can do that. Especially if it's, like, for some corporate site, that makes sense. Yeah. Yeah. I don't want to go after individuals. I don't want to go after community projects, because I want to protect
Starting point is 02:00:58 community projects. Sure. But, like, it is just exceptionally funny that my hacky implementation of Cloudflare's "I'm Under Attack" mode works. And I've gotten graphs of bandwidth and CPU usage from, like... the one that I remember the most is from the Pigeon project. They put it on their forge and they went from, like, 20 megabits a second out, constantly, 24/7, to zero. Wow. Yeah, let me find the graph for you.
Starting point is 02:01:36 Yeah, it happened while I was streaming. Normally when you see a graph spike down like that and you don't know why it's happening, that is very bad, and a reason to declare an incident and get all hands on deck to figure out what's wrong. But if you implement a change and you see that, then that is very good. Well, assuming the site's still online. Well, yeah, you know, you have to make sure the site's still online. That's just a little tiny problem; that's, like, the entire reason you have the site up there.
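To put the "20 megabits a second, 24/7" figure in perspective, here is the arithmetic, assuming decimal units (1 Mbit = 10^6 bits) and a perfectly constant rate, which no real traffic graph shows exactly. The helper name is hypothetical.

```python
def bytes_transferred(megabits_per_second: int, seconds: int) -> int:
    """Bytes sent at a constant rate, using decimal megabits (10**6 bits)."""
    bits = megabits_per_second * 10**6 * seconds
    return bits // 8  # 8 bits per byte


DAY = 24 * 60 * 60   # 86,400 seconds
MONTH = 30 * DAY     # a 30-day month

per_day = bytes_transferred(20, DAY)
per_month = bytes_transferred(20, MONTH)
print(f"{per_day / 1e9:.0f} GB per day")       # 216 GB per day
print(f"{per_month / 1e12:.2f} TB per month")  # 6.48 TB per month
```

So a forge going from that rate to near zero is shedding on the order of several terabytes of outbound traffic a month.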
Starting point is 02:02:11 So yeah. No, but this is a massive improvement. This is really good. It's at least a 20 times improvement, which is way better than what I saw with my own Git forge. Yeah. I haven't seen the numbers from the GNOME GitLab, but I would assume it's something similar. The numbers I got were along the lines of, uh... what they've posted on social media is that they have an autoscaling group set up so that,
Starting point is 02:02:43 uh, their platform will automatically scale the number of GitLab pods up and down based on the number of requests they're getting. They have a minimum of three and a maximum of six. Before Anubis, they were always at six. And after Anubis, they are always at three. Okay. So, uh, it is half of the infrastructure. Just for GitLab. Yeah, that's a big improvement. That's hilarious is what it is.
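The scaling behaviour described here can be sketched with a simplified form of the proportional rule used by, for example, the Kubernetes Horizontal Pod Autoscaler: desired replicas = ceil(current load / per-pod target), clamped to the configured minimum and maximum. The conversation doesn't say which autoscaler GNOME uses or which metric and target it scales on; the request rates below are made up purely to show why heavy traffic pins the deployment at its ceiling of six and a sharp drop pins it at its floor of three.

```python
import math


def desired_replicas(total_rps: float, target_rps_per_pod: float,
                     min_replicas: int = 3, max_replicas: int = 6) -> int:
    """Proportional scaling rule clamped to [min_replicas, max_replicas].

    total_rps and target_rps_per_pod are hypothetical metrics; the real
    deployment may scale on something else entirely.
    """
    wanted = math.ceil(total_rps / target_rps_per_pod)
    return max(min_replicas, min(max_replicas, wanted))


# Hypothetical numbers: each pod is sized for 50 requests/second.
print(desired_replicas(total_rps=600, target_rps_per_pod=50))  # 6: capped at max
print(desired_replicas(total_rps=30, target_rps_per_pod=50))   # 3: held at min
```

When scraper load pushes the raw estimate far above the maximum, the group sits at six no matter what; once that load disappears, the estimate falls below the minimum and the group sits at three.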
Starting point is 02:03:17 And if they've got a minimum of three, it's very possible they're now just running extra and could probably cut it down more. It's possible. As someone trained in site reliability, I'd say you don't want to have an even number of your service running. It's just a bad omen. Yeah.
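The "bad omen" line is a joke, but a common rationale behind the odd-number rule of thumb comes from quorum-based systems such as Raft or ZooKeeper clusters, where progress requires a strict majority of members. The arithmetic below shows why a fourth member adds cost without adding failure tolerance. This applies to consensus systems and not necessarily to web-serving pods like the GitLab ones discussed above; the function names are illustrative.

```python
def majority(members: int) -> int:
    """Smallest number of members that forms a strict majority."""
    return members // 2 + 1


def tolerated_failures(members: int) -> int:
    """How many members can fail while a majority is still reachable."""
    return members - majority(members)


for n in range(3, 7):
    print(n, majority(n), tolerated_failures(n))
# 3 2 1
# 4 3 1
# 5 3 2
# 6 4 2
```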
Starting point is 02:03:38 It's like leaving on a ship on Fridays. It's, you know, you can do it, but don't. So I think we've pretty much covered everything worth talking about for now. I'm sure there's more we could talk about, but we've covered all the main things. Yeah, we got a nice vertical slice of the whole thing and you know, like some of the fractal of complexity that emerges from there.
Starting point is 02:04:10 And then some of your random other side quests that have nothing to do with Anubis. I mean, the side quests are how you learn, because remember: if you fuck around, find out, and write it down, do you know what you've just done? That's science! I guess that's true. So, um... Just casually killing Brodie with nerd jokes. Let people know where they can find you, where they can find Anubis, how they can support the project, anything you want to direct people to. Sure. I have a blog. It is where I write.
Starting point is 02:04:48 And oh, God, I have written so much text on that. I'm at at least three 3D-printed save icons' worth of text. That's like what? Four and a half megabytes of text. Wow. Of text? Anubis has a website, anubis.techaro.lol. Yes, I really did use a .lol domain. Yes, it does cause problems.
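The "save icon" is the 3.5-inch high-density floppy disk, nominally "1.44 MB". That marketing figure mixes units: it is 1440 × 1024 bytes. A quick check of the "four and a half megabytes" estimate (the helper name is illustrative):

```python
FLOPPY_BYTES = 1440 * 1024  # the "1.44 MB" 3.5-inch HD floppy: 1,474,560 bytes


def floppies_to_bytes(count: int) -> int:
    """Total capacity of `count` formatted 3.5-inch HD floppies, in bytes."""
    return count * FLOPPY_BYTES


total = floppies_to_bytes(3)
print(total)                    # 4423680
print(round(total / 10**6, 2))  # 4.42 (decimal megabytes)
print(round(total / 2**20, 2))  # 4.22 (mebibytes)
```

So three floppies come to roughly 4.4 decimal megabytes, close to the figure quoted.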
Starting point is 02:05:14 But no, I'm probably not going to change it, because it's funny to have the only project in repos with a .lol domain. It's on GitHub; if you want, you can star it. Make the graph hockey-stick more, although it's turned into more of a square-root-shaped graph at this point, but that's fine. I have Patreon and GitHub Sponsors,
Starting point is 02:05:36 but they're all linked from my blog and from the Anubis metadata stuff, or the Anubis repo. Just, thank you for having me on. I hope this was entertaining to y'all. And, like, just remember: if you have a bad idea and you get lucky, maybe you too can have your code deployed to UNESCO and find out by pure accident when you Google the error message, "Making sure you're not a bot."
Starting point is 02:06:04 Yeah, it was a pleasure having you... "Making sure you're not a bot." Mm-hmm. Yeah, that's how I found UNESCO. I really enjoyed this episode. I hope people learn more about what this project is, what you're trying to achieve, and why all of that stuff is there as well. Yeah, I'd be more than happy to have you back on at some point in the future if you want to come back on. And I don't know, maybe when you've got more of the Cloudflare-y sort of stuff set up, or some of the more, you know, corporate-y stuff, and you want to talk about that.
Starting point is 02:06:33 Yeah, I'd be more than happy to talk about that as well. Yeah, it would be fun like this. It is always fun to get to... Oh, wait. There are more subdomains of UNESCO using Anubis now. Oh? Oh, God. Now their Health and Education Resource Center is... Oh, no. Did they deploy it globally?
Starting point is 02:06:58 Can I have a link to that? Yes, hold on. I just searched "making sure you're not a bot" on DuckDuckGo and I found a page in Spanish. And, wow, cool. It's not on the homepage, but it is on the subdomain. No, it's not on the homepage yet. Oh my god. Yeah. Holy crap, this is wild. Oh my god. Yeah, um...
Starting point is 02:07:28 Wow. Uh... Cool! Well, I'm sure you're gonna get another Hacker News post at some point soon, when it shows up on some random thing that Hacker News is a fan of. It's gonna be hard to top UNESCO. That's true. I think the Arch Wiki is close to it, but it's hard to top the United Nations.
Starting point is 02:07:58 What would be funny is if they just started using it. Oh gosh, I have been trying to get in contact with somebody from the United Nations to figure out what the story was, and I have been having a hell of a time doing it. You're trying to get in contact with their web admin? I can't imagine that's a straightforward process. I currently have a request going through their media contact stuff, which hasn't worked so far, but who knows. But yeah, follow me on Bluesky, follow my blog, follow me on Mastodon. I stream on Fridays at noon Eastern.
Starting point is 02:08:43 And yeah, you never know what will happen in my streams. Maybe I'll do some coding, maybe I'll do some writing, and maybe you'll be Rickrolled out of nowhere with the Blue Sphere theme from Sonic 3 & Knuckles. It's a ride. Is that all of the stuff you wanted to mention? Nothing else you missed? Uh...
Starting point is 02:09:08 Pretty much it, yeah. Okay, cool. My main channel is Brodie Robertson; I do Linux videos there six days a week. I've got a gaming channel, Brodie on Games. Right now I am playing through... I don't know if I've finished the games I'm playing through, actually. Go to the channel and you'll see something. I'm either playing... Right now I'm doing Portal 2 and Stranger of Paradise, but I might be playing Khazan: The First Berserker and Ori and the Blind Forest. Just go to the channel and see what's there. I've got the React channel where I upload clips; if you like stream clips, check that out.
Starting point is 02:09:51 And if you're watching the video version of this, you can find the audio version on Spotify. There is an RSS feed; it'll be on every podcast platform you can find. If you want to see the video version, it is on YouTube at Tech Over Tea. Also, Spotify has video, which is neat, I guess, if you like Spotify video for some reason. Yeah, I'll give you the final word. How do you want to end off the show? Stay fresh, and make sure to do your taxes. Oh, yeah, it's that time of the year for Americans, isn't it? Oh, yeah. Canadians are at the end of May, but if you're American, you're already late on your taxes,
Starting point is 02:10:29 so go do them immediately. Yeah, we're not till July, and then we have until October, which is... Oh, you're lucky. ...which means I get to delay it as long as possible. Anyway, I'm going to stop the recording now.
