Tech Over Tea - Anubis The Saviour Of FOSS Websites | Xe
Episode Date: May 9, 2025
Today we have the developer of Anubis, a tool that has taken the FOSS world by storm, protecting against the growing danger of AI scrapers we're seeing take down major FOSS websites.
==========Support The Channel==========
► Patreon: https://www.patreon.com/brodierobertson
► Paypal: https://www.paypal.me/BrodieRobertsonVideo
► Amazon USA: https://amzn.to/3d5gykF
► Other Methods: https://cointr.ee/brodierobertson
==========Guest Links==========
Website: https://anubis.techaro.lol/
Github: https://github.com/TecharoHQ/anubis
Bluesky: https://bsky.app/profile/did:plc:e5nncb3dr5thdkjir5cfaqfe
Bluesky: https://bsky.app/profile/techaro.lol
=========Video Platforms==========
🎥 YouTube: https://www.youtube.com/channel/UCBq5p-xOla8xhnrbhu8AIAg
=========Audio Release=========
🎵 RSS: https://anchor.fm/s/149fd51c/podcast/rss
🎵 Apple Podcast: https://podcasts.apple.com/us/podcast/tech-over-tea/id1501727953
🎵 Spotify: https://open.spotify.com/show/3IfFpfzlLo7OPsEnl4gbdM
🎵 Google Podcast: https://www.google.com/podcasts?feed=aHR0cHM6Ly9hbmNob3IuZm0vcy8xNDlmZDUxYy9wb2RjYXN0L3Jzcw==
🎵 Anchor: https://anchor.fm/tech-over-tea
==========Social Media==========
🎤 Discord: https://discord.gg/PkMRVn9
🐦 Twitter: https://twitter.com/TechOverTeaShow
📷 Instagram: https://www.instagram.com/techovertea/
🌐 Mastodon: https://mastodon.social/web/accounts/1093345
==========Credits==========
🎨 Channel Art:
All my art was created by Supercozman
https://twitter.com/Supercozman
https://www.instagram.com/supercozman_draws/
DISCLOSURE: Wherever possible I use referral links, which means if you click one of the links in this video or description and make a purchase we may receive a small commission or other compensation.
Transcript
Good morning, good day, and good evening.
I'm, as always, your host, Brodie Robertson.
And today, we have the developer of a project which, about three months ago, nobody knew existed.
And now it's deployed on the LKML, a UNESCO site (at least one subdomain),
the ArchWiki, GNOME's GitLab, SourceHut.
I'm sure there are other ones in here I probably could mention.
Welcome, Xe, the developer of Anubis.
How are you doing?
I'm doing pretty good.
This is still very surreal, but.
Oh, it is so surreal, you have no idea.
Like, I know you've talked a little bit about this, but...
I guess, you know, no.
Okay, we'll save that.
Before we get into that, just for anyone who is completely unaware of what Anubis is,
which I did find out is now the top search result on Google if you search Anubis GitHub
above all of the other projects also named Anubis.
So we're getting somewhere.
But for anyone who is unaware of the project, briefly explain what the project is.
TL;DR, it is the Cloudflare "I'm Under Attack" feature, but self-hostable and on servers you can look at.
And instead of a CAPTCHA, which, let's face it,
I'm not smart enough to implement a proper CAPTCHA,
and those are easy to game,
and scrapers pay humans to solve those.
So CAPTCHA-solving APIs are built into everything.
It uses a proof of work scheme instead.
And the proof of work is a hack, but it works enough.
I'm not going to deny it's a hack.
Like it seriously is a hack.
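For readers who want to see what "proof of work instead of a CAPTCHA" means mechanically, here is a minimal Hashcash-style sketch. It assumes SHA-256 over the challenge string followed by a counter, with difficulty counted in leading zero hex digits; this is an illustration, not Anubis's actual challenge format, and `make_challenge`, `solve`, and `verify` are hypothetical names. The point is the asymmetry: the browser burns many hashes, the server checks with one.

```python
import hashlib
import secrets


def make_challenge() -> str:
    """Server side: issue a random challenge string (hypothetical format)."""
    return secrets.token_hex(16)


def meets_difficulty(digest_hex: str, difficulty: int) -> bool:
    """A digest qualifies if it starts with `difficulty` zero hex digits."""
    return digest_hex.startswith("0" * difficulty)


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side: try nonces 0, 1, 2, ... until SHA-256(challenge + nonce) qualifies."""
    nonce = 0
    while True:
        digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
        if meets_difficulty(digest, difficulty):
            return nonce, digest
        nonce += 1


def verify(challenge: str, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash confirms work that cost the client many hashes."""
    digest = hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()
    return meets_difficulty(digest, difficulty)


if __name__ == "__main__":
    challenge = make_challenge()
    nonce, digest = solve(challenge, difficulty=4)
    print(f"nonce={nonce} digest={digest} valid={verify(challenge, nonce, 4)}")
```

Each extra zero digit multiplies the client's expected work by 16 while the server's check stays at one hash. A real deployment would also need to bind the challenge to the client and expire it, which this sketch leaves out.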
I do want to talk more about that.
But I guess let's just talk about first the sort of experience
you've had with just out of nowhere
this project becoming really popular
because you were saying before that it was January
that you made the project and shortly after that,
like mid to late January, you made the first post about it.
Yeah.
Looking at my blog's Git history,
I posted something about Amazon's crawler. The TL;DR is in January, Amazon's crawler took out my Git forge while I was trying to do something on it.
This is annoying.
I tried to fight it.
I didn't have much success.
I posted something on Hacker News about it.
I tried a couple things that I'm not going to admit on a recording, but they didn't work.
And I got inspired by something I kind of remembered reading about email spam called Hashcash,
where it used a proof of work scheme in order to protect
upstream resources. And somehow I got on the Wikipedia page for weighing of souls,
and I figured, huh, Anubis would be a good metaphor. So that's how that started.
So I'm not kidding. I just randomly found myself on that Wikipedia page. Sometimes you just gotta let the vibes take you places.
Sure, sure. Why not? So the first place that I am aware of that deployed Anubis, at least the first notable place, was the GNOME GitLab. Was there something before that that I'm unaware of? Because that's certainly where most people first heard about it.
If there is another place that it has it, I have not heard of it.
The first notable one is the GNOME GitLab.
And from what I've learned talking with the sysadmin team, it was a Hail Mary.
Right.
It's like, nothing else worked. What could we lose? So
they were able to scale their GitLab pods down from six to three right away. So you've had discussions
since then. Were the discussions, like, did anyone reach out to you before they deployed
it? Because, like, the documentation, you know, it's a very
new project. It's been, even just since I put up a video, I've noticed a bunch of stuff has been
added. So I'm not sure what the state of the documentation was like a month or so ago when
it was deployed, when the first instance of it was deployed.
The documentation was a single README file that had a bunch of aspirations:
this is how I got this working on my machines, and other thoughts about how things could
be implemented.
And apparently the Kubernetes example was good enough for the GNOME sysadmin team to
figure out.
And yeah. That's how it started. And it's mostly
just been a continuous process of figuring out how bad I am at writing and then figuring out how
to make it better. So no one, like, messaged you about, hey, is there anything
that we're missing here? They just worked it out themselves.
There's a couple of things. Like, when big users get things set up, I just
find their IRC channels and then ask them, like, hey,
I noticed you're using Anubis. Thanks.
Is there anything that sucked in the docs and I could make easier?
And then I just take all that feedback and just fix it.
Right, right.
Fair enough, fair enough.
Yeah, the Sourceware team has been very helpful for that.
Sourceware for context is the upstream organization
for small projects like GCC.
Ah, OK, yes.
Very small projects.
Very small projects. Very small projects.
What was the line from Linus Torvalds when he made the kernel?
It was something about like...
It's not small and unprofessional.
It was something along those lines.
His famous email.
"I'm doing a free operating system, just a hobby, won't be big and professional like
GNU."
Yeah, yeah, exactly.
Exactly.
That's what GCC is at this point.
No one's ever heard of it.
No one's ever used GCC.
Yeah.
So how does it-
It's just playing into place now?
So the GNOME GitLab, they started using Anubis, and it got a bit of attention there.
People wrote blog posts about it, talked about it. They noticed there was an anime girl on there, like, what's happening?
Shortly after that a bunch of other places started deploying it very, very quickly.
It's only been a couple of weeks since then.
Yeah, I've been attempting to keep track in a documentation page in the Docs site.
There's a message in the Patreon Discord that I keep editing with all the different places that I'm aware of.
I thought that I would be the only user of it ever,
so I mostly created it with that in mind and aimed at me.
And now I'm at the point where I'm getting contributions from functionally random people,
and they're actually good.
Like, this is a bar I never thought I would get.
So, let me just go to the, where is it? Is it under...
I have four posts in front of me. Want me to link them to you?
No, I found the thing.
So right now, notable ones: obviously it's on, like, random people's Giteas and
GitLabs as well.
We have the GNOME GitLab here, the Wine Bugzilla, the FreeBSD SVN, the FFmpeg bug tracker.
Oh, I need to update that: FFmpeg's Git. Oh, he's updating that.
The Git is now using it.
Oh, okay, cool.
SourceHut's using it.
Purism's Git is using it.
Enlightenment's Git is using it.
The ArchWiki is using it.
Wait, Devuan's Git's using it as well.
Okay, that one's a new one.
Yeah, Devuan was just added earlier today.
I got an email from the Devuan team saying, hey, thank you for making this.
And I see how they use the regular anime girl branded version.
I guess it's good enough for Devuan and not good enough for the ArchWiki.
So what's it like being in this position?
In my video I showed the XKCD on dependencies where it's like everything's held up by this
random little pillar maintained by just some random person in Nebraska.
What is it like being the random pillar?
Like that's where you're starting to become now. It is simultaneously hilarious, humbling, and...
Oh, I need a third one because a triad is a nice pattern.
It's like hilarious, humbling, and...
I can't come up with a third one.
It's hilarious at some level.
Like, I've been
blogging for a couple years. I need to do the numbers again, but it's
somewhere between 530 and 560 articles if you include all the stuff I've written
for work.
Wow, okay. For people that blog, usually they're like, I'm gonna write
a couple of posts every couple of months.
That's actually impressive.
I am a professional writer at this point.
Like, that's basically what I do.
I have written like a terrifying number of things.
Let me just look at the sidebar for the work blog and... yeah, it's over like 50% of the things in the sidebar
for the work blog.
I need to make link posts for those.
Oh my goodness, there's just so many.
I thought you, sorry, I thought there was something more
you wanted to say about the blogging.
Oh, it's honestly having a blog and putting my opinions there
has been hands down the best thing I have ever done
for my career.
Because if you have a certain level of notoriety
and you have enough articles over enough time to prove
that you actually know what you're talking about,
you don't have to do tech interviews.
You don't have to do those horrible whiteboard screening,
LeetCode-grinding interviews anymore.
So it sort of acts as like a...
It's basically a portfolio in a sense, just not a code portfolio.
Yes, I also have my GitHub as my code portfolio, and in one of the
more recent job interviews I've done, the interviewer noticed that I was very prominent.
They searched their work Slack and found out that they linked something I wrote in their work Slack.
They linked me the article and then just asked me what I learned there and what I did.
And that was a really interesting discussion.
Which article was it?
I think it was like, oh, it was when I put my IRC client
into Kubernetes.
My IRC client runs on Kubernetes, that one.
I assume it.
Yeah.
Yep.
Cool.
Yep.
Oh, that was such a blursed post.
Not blessed, not cursed, but something else: blursed.
So this blogging experience you've had has been like a good way to just, I guess, get your thoughts out on a lot of things.
And I guess it's not just about
showing that you know something,
but when you put something out in a written form
or a video form or just put it in a form
where other people are gonna consume it,
it does require you to actually think about what you know
and try to structure
it in a way that hopefully someone might understand.
It gives you a deeper understanding when you're trying to have other people understand your
thoughts.
Mm hmm.
Basically.
Uh, the thing that I try to do when I'm writing about stuff is I have several different categories of things.
I have tech satire stories, which are becoming fewer and fewer because the list of ideas I want to write
is starting to very weirdly coincide with startup pitches that I've seen.
I have documentation for myself, like how to force a Linux system to reboot off a flash drive
when you just need it to boot off of the damn flash drive.
And there's the other big category of, like,
here is the problem and this is why it's a bad, horrible problem.
Mm-hmm.
Like, one of the big ones is the manifesto "Building native packages
is complicated."
I'll link it to you in the chat so you
have it for the show notes.
This building native packages is complicated thing is.
I think I called it a manifesto. Yeah, it's a manifesto.
Uh-huh. Uh-huh.
Oh my god. It is a giant piece as well.
I mean, it's only like 4,000 words. I've written longer.
Mm-hmm.
I think my record is something that took me like a literal month to write.
It was like 10,000, 15,000 words.
And it was like the process of me going through various levels of configuration hell,
trying to figure out a sustainable setup for my Home Lab.
Fair enough.
Fair enough.
I can see how that one would grow a little bit more.
Yeah, and like there's actually the secret fourth
kind of article, which is like a log of my thoughts
as I hack something up and the moving my home lab to Kubernetes one
was basically that, a log diatribe of,
okay, here are all the things I tried.
Here are pictures of me trying to do the thing.
Here is everything that I tried and failed at
in functionally real time.
That one was fun.
I actually wasn't even aware of it.
I don't know how. Like, I scrolled enough of the page to see the
Amazon thing, the Amazon AI crawler.
I don't know why I just didn't keep scrolling the rest of the page to see just
how much was here.
Yeah, it turns out that at the scale my blog is at,
when you have over 500 links on one page,
that's when you need to implement pagination.
Mm-hmm.
Yeah, need to.
So I'm probably going to be implementing pagination,
but probably only for each calendar year,
because anything else seems overkill.
Yeah, yeah, yeah.
Oh God, the backend for this blog is a trash fire.
Well, we can either do calendar year
or just X number of posts, either way.
I guess calendar year probably makes more sense
just as a logical grouping.
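The two pagination options discussed here, one page per calendar year versus a fixed number of posts per page, can be sketched like this. It is a minimal sketch assuming each post carries a publication date; the `Post` type and both function names are hypothetical and not taken from the blog's actual backend.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date


@dataclass
class Post:
    title: str
    published: date


def paginate_by_year(posts: list[Post]) -> dict[int, list[Post]]:
    """One page per calendar year: years newest first, posts newest first within a year."""
    pages: dict[int, list[Post]] = defaultdict(list)
    for post in posts:
        pages[post.published.year].append(post)
    return {
        year: sorted(items, key=lambda p: p.published, reverse=True)
        for year, items in sorted(pages.items(), reverse=True)
    }


def paginate_by_count(posts: list[Post], per_page: int) -> list[list[Post]]:
    """The alternative: fixed-size pages of `per_page` posts, newest first."""
    ordered = sorted(posts, key=lambda p: p.published, reverse=True)
    return [ordered[i:i + per_page] for i in range(0, len(ordered), per_page)]
```

Grouping by year gives stable URLs (a post never moves to a different page when a new one is published), which is one reason it reads as the more logical grouping; fixed-size pages keep page lengths even but shift every post along as new ones arrive.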
Oh yeah, there's already some internal tagging of posts
with the calendar year they're associated with.
And oh god, it's...
My blog is a delicate creation.
What have you built it with? How is this thing being managed?
So there's at least four different generations of this blog engine.
OK.
And a pro tip for the audience, when
someone describes a project using the term generation,
buckle up.
But the first version of it was running with,
I think it was a web framework called Lapis made
by the itch.io guy.
And yeah, that compiled Markdown to HTML on every single page load by reading things from the file
system. That did not survive the first time I got on the front page of Hacker News. So I re-engineered
it to use a database called OlegDB, made by some shitposter friends in an IRC channel.
And you know it's good because the entire project is jokes about mayonnaise.
Sure. Okay.
I caused them to name the check expiry command sniff.
And I think the second generation, yeah, the second generation was written in Go on the server,
loaded everything into RAM so that it would be fast
and I would survive getting on the front page of Hacker News.
And that was fine.
And then I got hit by the Rust bug.
Ah.
And I rewrote it in Rust.
And that also loaded everything into RAM.
It loaded everything multithreadedly.
I had custom Markdown parsing with a combination
of normal Markdown stuff and Cloudflare's crate
lol_html.
And that worked surprisingly well.
But then I went to a different job role
and wasn't really using Rust.
So I ported it all to a static site generator
that's managed by a process written in Go.
And it is a very delicate creation.
It works enough, and that's all I care about.
It works enough.
You know what?
I can respect that.
I can respect that.
That's all you ask for in this day and age, you know?
You just need it to work enough.
Yeah, don't touch it.
Just leave it as is.
It'll be fine.
If I redo it in the future, it'll probably be based on like object storage or something.
Okay.
Working at an object storage company, you learn all sorts of cursed ways to use object storage.
When you mentioned Rust, I had someone on the other week who described their adoption of Rust as falling
for the Rust propaganda. Yeah, that would be a good summary of it. About half the reason I got into
Rust was because it makes really small binaries in WebAssembly. And the other half is because I got nerd sniped.
Oh, I'm not afraid to admit that I get nerd sniped. It's fun.
There are several things, not just in Anubis but also in my x monorepo, that are the result of me just getting horrifically nerd sniped.
Look, if you find something fun, like, I'm not gonna...
I know a lot of people get very like,
ooh, I don't like this language, I don't like that language,
but at the end of the day, like,
if you're enjoying what you're doing,
you know, it is what it is.
Keep doing what you're doing, I guess.
Yeah, the really crazy part is that sometimes those, like those random hobby projects will end up actually being useful and actually end up being used.
Which is simultaneously awesome and terrifying.
Yeah, and then one of them becomes this massive deployed thing out of nowhere. Just a little thing.
Just a little thing.
Just a little thing.
But there's a bunch of random stuff that I have around.
And about half the reason I have that giant monorepo is because it's easier to just put everything in one big repo if it's all in the same language than to have everything in
separate repos and have to have interdependencies
and then, you know, pray to the dark lords
to make sure that your builds go off fine.
And yeah, I'm not doing that.
Yeah, yeah.
Yeah, this is the,
you can get into very long arguments
about whether you should or shouldn't use a mono repo.
Obviously the Git model is more based around using separate repos or like submodules, things like that.
But then you have companies like Facebook, who at one point had considered adopting Git.
At the time, Git was not suitable for mono repos.
It's obviously had to get better at it because you know the Linux kernel is
the Linux kernel. But at the time it was becoming unmanageably slow to use for really large mono repos. Like git history would take stupid amounts of time, committing anything would take stupid
amounts of time. I think they ended up going with, I think it went with Mercurial actually if I remember correctly
Yeah, they have their own fork of Mercurial internally, and as any company gets bigger,
they have their own source control software. Like Google, the canonical example of the monorepo,
has a source repo so vast that no one server on the planet can hold it all in a checkout.
Maybe at that point you have a bit of a problem.
I mean, there is kind of a surreal elegance to that because Google can just like
check out the entire world at one point, including the source code for their motherboard
BIOSes. And that's cool as hell. I would never need to do that. But the fact that that is something that can be done is cool as hell.
Yeah. Yeah. I guess that's why I'm going to look at it. You just need a bit of a beefy
system to handle that. Just a little bit of a chungo, yeah. So going back to Anubis,
let's talk a bit more about the origin of the project.
So it came about because of
what was it? You were saying Amazon scrapers, yes?
Amazon AI crawlers. Yes.
So I guess this sort of leads into what is it that Anubis is like trying to solve?
Like, why would anybody actually want to use Anubis? Like, what is the issue on the web that needs
something like this, or what Cloudflare has been working on, or any of the other tooling that's popped up over the past year or so?
The TL;DR of why someone would need it is that it changes the economics around web scraping.
Right now, a lot of web scraping is done under the assumption that it's like
reasonably fast and lightweight to get a response made for any given
route on the server.
And this is not always the case.
Things like git blame are very compute-intensive,
especially if you do it with a fresh checkout of the Linux
kernel.
If you do git blame on any random line in the kernel
on a fresh checkout, your system is
going to be suffering for a couple minutes. And at scale, with a bunch of bots out there, these bots
operate on the logic of "for link in page, enqueue a message to click on that link from another IP address." And this is just an overwhelming torrent
to basically any computer.
Especially in sites that have a lot of interconnectivity
like you would have in a wiki, for example.
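Nobody outside these operations knows their actual code, so the following is only a toy simulation of the logic as described: a breadth-first crawl that enqueues every link it finds and hands each fetch to the next address in a rotating pool. The link graph, the `naive_crawl` name, and the documentation-range IPs are all hypothetical.

```python
from collections import deque
from itertools import cycle


def naive_crawl(site: dict[str, list[str]], start: str, ip_pool: list[str],
                max_depth: int, dedupe: bool = False) -> list[tuple[str, str]]:
    """Simulate "for link in page, enqueue it for another IP" over a toy link graph.

    `site` maps each URL to the URLs it links to. Returns (ip, url) pairs in fetch
    order. With dedupe=False (the behavior described), every link occurrence costs
    a request; with dedupe=True, each page is fetched at most once.
    """
    ips = cycle(ip_pool)          # rotate through the address pool
    queue = deque([(start, 0)])   # (url, depth) pairs, breadth-first
    seen: set[str] = set()
    fetches: list[tuple[str, str]] = []
    while queue:
        url, depth = queue.popleft()
        if dedupe and url in seen:
            continue
        seen.add(url)
        fetches.append((next(ips), url))
        if depth < max_depth:
            for link in site.get(url, []):
                queue.append((link, depth + 1))
    return fetches


# A four-page "wiki" where every page links to every other page.
wiki = {p: [q for q in "ABCD" if q != p] for p in "ABCD"}
pool = [f"192.0.2.{i}" for i in range(1, 14)]
print(len(naive_crawl(wiki, "A", pool, max_depth=2)))               # 13
print(len(naive_crawl(wiki, "A", pool, max_depth=2, dedupe=True)))  # 4
```

Two levels of links already cost 13 requests for 4 pages, each from a different address; without deduplication the count grows with (links per page) raised to the depth, which is why densely interlinked sites suffer most.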
Oh, yeah.
More recently, Sourceware, the GCC Git server, has a machine with 24 CPU cores and 512 gigabytes
of RAM, and they have a system load of 150.
And for reference, in order to convert that into something that you can more easily understand,
it's easier to round it up to 25 CPU cores
and then divide the system load by the number of CPU cores,
and that tells you how much system backlog there is.
So with 25 to 150, that's like what?
Like a six times backlog?
Yeah, it's nuts.
A six times backlog is...
bad? I think that's the technical term.
Seems on point to me.
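The back-of-the-envelope arithmetic above, as a tiny helper. `backlog_ratio` is a hypothetical name for the rule of thumb described (load average divided by core count); note that on Linux the load average also counts tasks waiting on I/O, so this is a rough gauge rather than a precise queue length.

```python
import os


def backlog_ratio(load_average: float, cpu_cores: int) -> float:
    """Rule of thumb from the conversation: load average divided by core count.
    Around 1.0 means every core is busy; above 1.0, work is queuing up."""
    return load_average / cpu_cores


def current_backlog_ratio() -> float:
    """Same ratio for the machine running this code (os.getloadavg is Unix-only)."""
    one_minute, _five, _fifteen = os.getloadavg()
    return backlog_ratio(one_minute, os.cpu_count() or 1)


print(backlog_ratio(150, 25))  # 6.0, the rounded figure used in the conversation
print(backlog_ratio(150, 24))  # 6.25, with the actual 24 cores
```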
Yeah. And like...
The really frustrating part about a lot of it is that a lot of normal IP reputation stuff, it just doesn't work.
Because these bots, like in the case of Amazon's bot in particular, have IP addresses from literally every range Amazon controls.
And they control like, what is it, 15% of the IPv4 constellation? It's something nuts like that.
And because there's so many different IP addresses coming from so many different BGP autonomous systems,
IP reputation doesn't work.
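Why spreading requests across addresses defeats per-IP rules can be shown with a naive per-IP limiter. This is a minimal sketch; the request total, pool size, limit, and helper names are all hypothetical numbers chosen for illustration, not measurements from any site.

```python
from collections import Counter


def spread_requests(total: int, ips: list[str]) -> Counter:
    """Distribute `total` requests round-robin across a pool of source addresses."""
    counts: Counter = Counter()
    for i in range(total):
        counts[ips[i % len(ips)]] += 1
    return counts


def flag_noisy_ips(requests_by_ip: Counter, limit: int) -> set[str]:
    """A naive per-IP rule: flag any address above `limit` requests in the window."""
    return {ip for ip, count in requests_by_ip.items() if count > limit}


one_source = spread_requests(10_000, ["198.51.100.7"])
many_sources = spread_requests(10_000, [f"10.{i // 256}.{i % 256}.1" for i in range(5_000)])

print(flag_noisy_ips(one_source, limit=60))    # {'198.51.100.7'}
print(flag_noisy_ips(many_sources, limit=60))  # set()
```

The same 10,000 requests that trip the limit from one address stay invisible when each of 5,000 addresses sends only two.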
You could write a custom thing to do reverse DNS for every Amazon IP address,
and then if it matches Amazonbot, then deny it.
But that's slow, expensive, and those IP addresses
will never be used again.
So it's just basically adding lag into the mix for no reason.
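The reverse-DNS approach described above usually looks like this: resolve the IP to a hostname, check the hostname suffix, then forward-resolve that hostname and confirm it maps back to the same IP, so a spoofed PTR record alone doesn't pass. This is a sketch with a hypothetical hostname suffix (check the crawler operator's own documentation for the real one), and the resolvers are injectable so it can be exercised without network access. It also shows the cost: up to two DNS lookups for every new address, which buys nothing if the address is never seen again.

```python
import socket
from typing import Callable

# Hypothetical suffix for illustration only.
CRAWLER_SUFFIXES = (".crawler.example.com",)


def is_verified_crawler(
    ip: str,
    reverse: Callable[[str], str] = lambda ip: socket.gethostbyaddr(ip)[0],
    forward: Callable[[str], list[str]] = lambda host: socket.gethostbyname_ex(host)[2],
) -> bool:
    """Reverse-resolve, check the suffix, then forward-confirm the hostname maps back to `ip`."""
    try:
        host = reverse(ip)          # PTR lookup
    except OSError:                 # socket.herror / gaierror are OSError subclasses
        return False
    if not host.endswith(CRAWLER_SUFFIXES):
        return False
    try:
        return ip in forward(host)  # A lookup must include the original address
    except OSError:
        return False
```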
And then there's the really, really terrible part,
which is the residential proxies that look like Google Chrome
on the wire.
Oh, gosh. Have you heard of the residential proxy problem? Um, no. I'm not aware of this one.
So, people keep using free VPNs.
Aha.
The common wisdom is that when you do not pay, you are the product. When
you install one of those free VPNs, they usually want you to install the super free VPN client
on your desktop computer. And then you use that and then it puts everything through a
VPN and it looks like you're fine.
Except what it's actually doing in the background
is letting people pay for your bandwidth
to be able to do things like their sketchy scraping
and go out that way.
Like what?
Yeah, they literally will have users of free VPNs and other analytics SDKs
turned into zombies in a giant botnet that people use to do web scraping with
because then it looks like residential IPs to the service operators and
then because they also run headless
Chrome, which looks like Google Chrome on the wire, the operator just thinks it's a
new user using Google Chrome, which, you know, a new user from a residential IP
address using Google Chrome. That's like the John Smith of browser connections, right? Right.
And that is what people want.
That is what operators want.
People want people using Windows, Google Chrome,
and from a residential IP to visit their site.
Except these bots will just hit every punishing link
over and over and over until the server keels over.
And then when the server is responding with 500s, they speed up.
Because responding with a 500 is faster, and thus they do the backfill faster.
I was not aware of this problem at all.
Oh, it is horrible. Oh, it is abysmal.
And there's probably not any way to really stop it.
Like, I've been theory crafting with some friends, and we've been trying to figure out like, okay,
what is the motive here? What is the modus operandi? Like, they're going to run out
of storage at some point, right? Our current pet theory is that they're not going to run out of storage because they're not storing anything.
Mm-hmm.
And that this is either some attempt to train a browser use agent or an AI model that knows how to use the web,
or it is feeding directly from like the web crawling into model training.
Hmm. We can't find a reason why that would be a good idea to like,
people I'm talking with this are like experts in generative AI.
And we've been trying to find like why someone would do this.
And all we have are just like crackpot conspiracy theories.
That is the most fun one.
Well, the best conspiracy theories are believable on their face.
Right. And and the one about like
either a browser use agent or directly feeding it into the training process
are so stupid, they're plausible. Right.
Like, I don't know who writes
whatever's going on in this world,
but it seems like in order for things to happen,
they have to be just stupid enough
in order to get past the writer's approval.
Yeah.
I mean,
You know, over the past decade,
you might be onto something.
Either way, that's also why I try to make things just surreal enough.
So that way it seems plausible. Right, right, right.
Yeah. But it's either an AI model that knows how to use browsers more natively,
or it's them feeding directly into training or the secret third thing,
which is even more stupid and hilarious than I just remembered.
It could be some startup being very clever
and doing some sort of data arbitrage thing or arbitrage.
I've never heard that word out loud.
Arbitrage, I think is the correct pronunciation.
Let's let's say arbitrage and let the comments battle it out.
But it's some kind of data arbitrage thing
where somebody is trying to sell access to,
quote unquote, new data because it was scraped newer.
And that may actually be a bit more plausible
now that I think about it.
But I just have no idea what they would want it for.
I don't even know if it's actually for generative AI stuff.
A lot of people have been assuming it's for generative AI because, well, the Amazon Alexa
team is about to do something involving lingle-mangles. But you just get the flood of
requests that just come out of nowhere. They overwhelm your server and then when it falls
over and says mercy, they speed up and make it worse. So there's no way to win.
The way you win is you unplug. Just unplug the power and you'll be fine.
Yeah, that's also why I sort of made the main hack in Anubis
involving the most load-bearing word in a user agent string, Mozilla.
Mm-hmm.
In several thousand years, I'm pretty sure that historians are going to look back
and think that the word Mozilla is something that meant browser.
But it is something that has stuck around in user agent strings for a very long time,
and people are loath to change it because of a sketchy practice called user agent sniffing,
where, way back when Pluto was a planet,
the server would serve a different version of the website for Mozilla,
the browser Mozilla, not the company Mozilla, and for Internet Explorer.
And so Internet Explorer became compatible with Mozilla features, but nobody was seeing those Mozilla compatible websites
because all the servers thought they were talking to Internet Explorer,
so it sent the decrepit Internet Explorer compliant version.
And then Microsoft added Mozilla to their user agent string.
Right at the front.
And that's why Anubis uses Mozilla.
There is actually a botnet that has been bypassing it
by using Opera instead of Mozilla.
But we were able to notice that instantly
because they lacked the word Mozilla.
And that is the pro gamer move.
So one of the worst parts of how Anubis is made ended up actually being a strength.
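The "Mozilla" heuristic described above fits in a few lines. This is a hedged sketch of the idea as described, not Anubis's actual policy engine or configuration format; `classify_user_agent` and the sample strings are illustrative.

```python
def classify_user_agent(user_agent: str) -> str:
    """Challenge anything that claims to be a browser; let everything else through.

    Real browsers all send "Mozilla" for historical compatibility, so a scraper
    that wants to look like Chrome has to include it and gets challenged. A client
    that drops it avoids the challenge, but also stops looking like a browser,
    which makes it easy to spot in logs.
    """
    if "Mozilla" in user_agent:
        return "challenge"
    return "pass-through"


examples = {
    "chrome": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
              "(KHTML, like Gecko) Chrome/124.0.0.0 Safari/537.36",
    "curl": "curl/8.5.0",
    "opera-style": "Opera/9.80 (Windows NT 6.1) Presto/2.12.388 Version/12.16",
}
for name, ua in examples.items():
    print(f"{name}: {classify_user_agent(ua)}")
```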
Remember, out there, the audience: some thorns have roses. So with the whole scraping thing, we've had web crawlers for a
long time now. The idea of crawling sites and understanding how sites link together, this
is the basis of a search engine. And for a long time it hasn't really been an issue.
Yeah, you can say it's an extra couple of percent of site
usage, but Google and things like that, at least with the search engine, are pretty respectful with the way
that they handle crawling sites, and no one's really complained about that because you also get a lot of benefit from it, right?
Like you get to be listed on these search engines.
There is like this...
What's the word?
Symbiotic relationship?
Yes, that word.
There's this symbiotic relationship here.
But I think a lot of people don't realize just...
Assuming it is AI scrapers, which is the most logical,
the most logical thing that it could be.
I don't think a lot of people realize
just how much information is being captured here.
Cause when I did my Anubis video,
there were people saying,
oh, why does it matter if somebody wants to come
like scrape the site once a month?
Like, yeah, if that's what it was, it wouldn't be a problem.
I wish it was once a month so much there, buddy.
Oh, buddy.
Like, just looking at the Gnome GitLab example,
I think it was something like
50 or 60 percent, maybe even more like 70 percent of their traffic
was coming from these bots.
Yeah.
It's very-
I remember at some point-
They had no respect.
I remember at some point,
somebody I knew at Google working on the YouTube team
was talking about in terms of theoretical problems,
something called the inversion,
where their machine learning systems
would start classifying human traffic
as bot traffic and vice versa.
And, you know, back in the day,
I thought that was like apocalyptic
and, you know, like very not going to happen.
But now that I've seen what I've seen, like, it is,
it is to the point where most of the traffic that you get on a big website
just does not come from humans.
And honestly, as a writer, like, as someone who bares my soul
onto the page in order for you to learn from my mistakes so that you
do not repeat them.
It hurts, like spiritually or something.
And this approach with tools like Anubis, yes, it works, but it isn't a great
approach, because now you're in a situation where even regular users are being punished for visiting your website. They have to wait
whatever it ends up being:
best-case scenario, a couple of seconds; worst-case scenario, as you posted in
one of your...
I think the...
Did you post it, or was it someone else? I've seen someone post where it went up to, like,
multiple minutes to get through. Yeah. When they're using a low-powered phone
on a page being hit quite often. It's just... like, this is not a great
solution. But at the same time, we're not in a great scenario.
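Why the wait ranges from seconds to minutes comes down to expected work divided by device speed. This sketch assumes a Hashcash-style challenge where difficulty is the number of leading zero hex digits in a SHA-256 digest; the hash rates below are hypothetical placeholders for a fast desktop and a slow phone, not measurements of any real device, and the function names are illustrative.

```python
def expected_hashes(difficulty: int) -> int:
    """Mean SHA-256 attempts to find a digest with `difficulty` leading zero hex digits.
    Each hex digit is zero with probability 1/16, so the mean is 16 ** difficulty."""
    return 16 ** difficulty


def expected_seconds(difficulty: int, hashes_per_second: float) -> float:
    """Mean solve time on a device that manages `hashes_per_second`."""
    return expected_hashes(difficulty) / hashes_per_second


def prob_still_searching(attempts: int, difficulty: int) -> float:
    """Chance that no valid digest has turned up after `attempts` tries (geometric tail).
    Some clients are unlucky: about 5% need more than three times the mean."""
    p = 16.0 ** -difficulty
    return (1.0 - p) ** attempts


# Hypothetical hash rates; not measurements.
for label, rate in (("desktop", 1_000_000), ("slow phone", 50_000)):
    for difficulty in (4, 5, 6):
        print(f"{label}, difficulty {difficulty}: ~{expected_seconds(difficulty, rate):.1f} s")
```

With those assumed rates, the slow phone averages about 1.3 seconds at difficulty 4 and about 5.6 minutes at difficulty 6; each step up in difficulty multiplies the wait by 16, and bad luck stretches it further.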
Yeah, I am working to try to find patterns with,
quote unquote, known good clients and just let them through.
Mm-hmm.
Oh my gosh, there is a fractal of rabbit holes in this problem.
Especially with things like TLS and TCP fingerprints.
Oh my goodness, those are a rabbit hole.
Where does that take us?
Okay, so let's go to Narnia. But the TLDR is that TLS, or Transport Layer Security, the S in HTTPS, is a protocol
that allows you to encrypt a connection end to end.
This is why you don't need a VPN.
It's already encrypted.
"My VPN offers military-grade encryption."
Have you seen the latest?
Have you seen the latest VPN ads, the new framing that people are using?
Yes. Quantum level encryption.
I mean, like, yeah, technically, you could put a post-quantum cipher in there
and then say it's quantum level encryption,
but we know what you're doing.
But the TLDR of TLS is that it's a protocol
for encrypting connections.
And in order for the client and server
to be able to agree on stuff, they
have to say which set of extensions to TLS they're using.
Like, what cipher suites are they using?
And what key exchange mechanism are they using,
and what is the name of the, what is the server name
so that the server knows which certificate to send back
and all that.
And a lot of that information is unique enough
to identify an individual client,
like the Go programming language TLS stack
will have a different fingerprint than Google Chrome,
than Python requests or Python urllib.
Those actually have different fingerprints and other things.
And those allow you to be able to make more detailed
inferences like, hey, wait, this is Chrome on Windows
claiming that it's Chrome on Linux.
That's kind of sus, bro, and be able to filter things
like that.
And I want to be able to do that in the future.
But the problem is, in order to do that, you have to have your code sitting at the TLS
termination layer. And I have attempted to do this, and modern cloud platforms
make it annoying. The best experience I had was when I set up a sacrificial lamb cgit server with the
Linux kernel, just as bait for the AI scrapers.
They took the bait.
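To make the fingerprinting idea concrete, here is a minimal sketch of the published JA3 method: the ClientHello's TLS version, cipher suites, extensions, elliptic curves, and point formats are written out as decimal values (list items joined with `-`, fields joined with `,`), GREASE placeholder values from RFC 8701 are skipped, and the resulting string is hashed with MD5. The `ClientHello` class, the sample values, and the `KNOWN` lookup table are hypothetical illustrations, not real Chrome or Go fingerprints and not Anubis code; parsing the actual handshake is the part that needs to live at the TLS termination layer.

```python
import hashlib
from dataclasses import dataclass

# GREASE values (RFC 8701) are random placeholders that JA3 skips: 0x0a0a, 0x1a1a, ... 0xfafa.
GREASE = {0x0A0A + 0x1010 * i for i in range(16)}


@dataclass
class ClientHello:
    """Hypothetical, already-parsed ClientHello fields, in the order the client sent them."""
    version: int
    ciphers: list
    extensions: list
    curves: list
    point_formats: list


def _field(values):
    # Decimal values joined with '-', GREASE entries removed, original order preserved.
    return "-".join(str(v) for v in values if v not in GREASE)


def ja3_string(hello):
    return ",".join([
        str(hello.version),
        _field(hello.ciphers),
        _field(hello.extensions),
        _field(hello.curves),
        _field(hello.point_formats),
    ])


def ja3_hash(hello):
    return hashlib.md5(ja3_string(hello).encode()).hexdigest()


def claims_mismatch(hello, claimed_client, known):
    """True when the fingerprint maps to a known client other than the one the User-Agent claims."""
    observed = known.get(ja3_hash(hello))
    return observed is not None and observed != claimed_client


# Illustrative values only; not taken from any real browser or TLS library.
hello_a = ClientHello(771, [0x0A0A, 4865, 4866], [0, 23, 65281], [29, 23], [0])
hello_b = ClientHello(771, [4866, 4865], [0, 23, 65281], [29, 23], [0])

KNOWN = {ja3_hash(hello_b): "example-http-library"}  # hypothetical lookup table

print(ja3_string(hello_a))  # 771,4865-4866,0-23-65281,29-23,0
print(claims_mismatch(hello_b, "Chrome", KNOWN))  # True
```

Order matters in the string, so two clients offering the same cipher suites in a different order end up with different hashes, which is what makes the fingerprint useful for telling TLS stacks apart.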
There is a saying that someone in site reliability told me when I was starting out in the industry:
given enough time, all problems will become big data problems.
And in order to not show the challenge page for Anubis as often, it is starting to become
a big data problem.
Ideally, it won't be actual big data.
Ideally we'll be able to keep the entire data set smaller than RAM.
But, you know, God willing.
Does that fingerprinting issue take you into any potential
data security issues, or is this just a matter of fingerprinting
different types of clients?
If it's a data security issue in your mind, then you're screwed.
But, like, fundamentally, it's just a problem that is
very underspecified; very little research has been done, and
I'm basically implementing stuff out of scientific papers at this point.
I see.
And some stuff like JA3N and JA4 signatures,
which I'm still researching.
There are different ways to fingerprint a TLS connection.
There's another standard called JA4T for fingerprinting TCP connections,
which in theory would let you detect a VPN,
but I haven't been able to implement it to my satisfaction yet,
so I haven't really talked about it anywhere.
So was this an area that you had knowledge in beforehand,
or is this something you've had to educate yourself post the creation of this?
Kind of a mix of both. I've kind of done networking stuff all my career.
I worked at a site-to-site VPN company and was able to put a bunch of my networking knowledge to use. When I was growing up, I learned how to proxy Tor over XMPP
in order to evade internet filtering.
No, I'm dead serious. I lost the code for that.
I wish I hadn't.
But it was one of those weird cases where the filter would stop any new TCP connections.
But if you left a long-running
TCP connection such as an XMPP session, it was totally fine. And Tor at the time had really good
support for arbitrary weird proxy methods, and I thought, huh, there is a little terrible Python
library that lets me do XMPP, and I have a friend that is very easily amused.
Let's see what can happen.
And it worked enough.
It was not pretty, but it worked.
This sounds like the story of your life.
It works enough.
I mean, that's all you need, right?
Like, and if it works enough, then you can get to the point where you try to figure out where it fails in practice
and then be able to design things around it.
Although I ended up not needing it
after I learned how to do MAC address cloning,
because my terrible laptop had one of the few network cards
on the market that let you clone MAC addresses.
All you have to do is just find a single device that's immune from internet sleep mode and then when people aren't using it, you unplug that device's Wi-Fi card and
you're good to go. One consistent pattern is all you need.
Speaking of other things that just barely work, I am actually a domain expert in generative AI.
I have AI training.
I have a fair bit of knowledge of how everything works.
I made a chatbot that I intended to be kind of like a Markov bot, but, you
know, just a little bit smarter.
And her name is Mimi and she is, oh, bless her heart.
She tries so hard.
She gets so far, but at the end it's just hilarious.
Do you happen to have a blog post on that or not?
I've been meaning to write more about Mimi. I'm in a really weird position where I do experiment with
some generative AI things, but I also write the biggest generative AI defensive things.
So yeah, there's some doublethink there. Like, the tech is cool, but holy fuck, the people.
Yeah, I recently came across a new term that I'm quite a fan of: slopsquatting.
Mm.
For anyone who's unaware of this, a lot of these generative models... so people know they generate fake package names, but they don't always generate random noise with these names. There seems to be
some sort of model consistency where they will reuse the same fake package names on a semi-frequent basis.
And that means that it's a perfect attack vector for anybody who, you know, wants to submit malicious packages.
I can actually give you a somewhat technical answer for that and make it approachable for the audience too. Win-win.
Sure.
How much do you know about Markov chains?
I am aware they exist. I have not looked deeply into the concept.
Okay, the TLDR of Markov chains is that they are statistical models for predicting, you know, given these couple words of context,
which word comes next. Right. And when you infer the next word from a Markov chain, you take
like the last two words, you know, like "the boy", and then the next one could be, you know, like "jumped" or "slept" or something. And then it picks a word randomly,
weighted on how many times it saw it during training,
and then, you know, "boy jumped" is the next context,
and it figures out which word goes after that,
and so on.
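A minimal sketch of the word-level Markov chain just described, assuming a two-word context: training counts which word followed each pair of words, and generation picks the next word at random, weighted by those counts, then slides the window forward. The toy corpus and function names are made up for illustration.

```python
import random
from collections import Counter, defaultdict


def train(words, order=2):
    """Map each `order`-word context to a Counter of the words that followed it."""
    table = defaultdict(Counter)
    for i in range(len(words) - order):
        context = tuple(words[i:i + order])
        table[context][words[i + order]] += 1
    return table


def next_word(table, context, rng):
    counts = table.get(tuple(context))
    if not counts:
        return None  # context never seen during training
    choices = list(counts)
    weights = [counts[w] for w in choices]
    return rng.choices(choices, weights=weights, k=1)[0]


def generate(table, seed, length, rng):
    out = list(seed)
    order = len(seed)
    for _ in range(length):
        word = next_word(table, out[-order:], rng)
        if word is None:
            break
        out.append(word)  # the new word slides into the context window
    return out


corpus = "the boy jumped over the dog . the boy slept . the boy jumped again .".split()
table = train(corpus)
print(table[("the", "boy")])  # Counter({'jumped': 2, 'slept': 1})
print(" ".join(generate(table, ["the", "boy"], 8, random.Random(0))))
```

The comparison being drawn in the conversation is that language models replace this count table with a learned model over subword tokens and a far longer context.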
All generative AI is that, but at smaller units of text
than individual words, and with much bigger windows.
So like when Llama 2 came out sometime in 2023,
it was seen as a quote unquote long context window model
because it allowed a total of 4,096 tokens, or, like,
the general rule of thumb is that one token is about four bytes, so that's like 16 kilobytes of text.
Or four AMD64 pages, or eight pages of paper
in the context window.
Okay.
Assuming, you know, like single-spaced, 12-point courier new.
Right, right, right.
Spherical cow, vacuum, et cetera.
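The back-of-envelope numbers above, worked through. The four-bytes-per-token figure is the rule of thumb quoted in the conversation, not a property of any particular tokenizer; 4,096 bytes is the standard AMD64 page size.

```python
CONTEXT_TOKENS = 4096   # Llama 2's context window, as quoted above
BYTES_PER_TOKEN = 4     # rule of thumb; varies by tokenizer and language
PAGE_SIZE = 4096        # standard AMD64 page size, in bytes

context_bytes = CONTEXT_TOKENS * BYTES_PER_TOKEN
print(context_bytes)               # 16384
print(context_bytes // 1024)       # 16 (KiB)
print(context_bytes // PAGE_SIZE)  # 4 pages
```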
And that's basically it.
They're basically just Markov chains at smaller units
than individual words and over longer context windows.
And at some level, it is kind of remarkable that that architecture can actually be useful because it's literally just how autocomplete on your phone works.
No, I'm dead serious. If you use an iPhone right now, there is a transformer model that is powering the autocomplete.
And that's why it got better. It mysteriously got better two years ago and why it's able to be so good at what it does.
No, I have heard a lot of people describe it as like glorified autocomplete.
It's not technically inaccurate. I'm pretty sure some AI expert in the comments is going to have their eyebrows go into low Earth orbit, but, like, I've read the papers.
It's not entirely inaccurate. I'm just glossing over the math because, number one, I'm not a math slonker.
I do not understand the math. And number two, if you want to argue with me, tag me on Bluesky.
And then I might make a YouTube video where I read the comments in a funny voice.
Yeah, go ahead and do that.
That'd be a great video. I want to see it.
Oh, I did that with some of WordPress guy's blog posts,
but it wasn't funny after about a week.
Oh, that, yeah, yeah.
Oh, WordPress guy.
Yeah, the WordPress situation was fun.
Oh, I need to remove the placeholder text on the Anubis homepage.
Yeah.
Yeah.
Um, I did want to talk about that as well. Uh, so, if we go to the
Anubis home page right now, and people have pointed this out before, they're like, some
of this stuff that's written here doesn't make a ton of sense. Yeah. And then there's obviously the original logo, which people are like, is the logo AI generated?
Why are you making a tool that's against AI if you AI-generate the logo?
It's just like, the way that I understood it from the very start is you never expected
this to become popular and then it just suddenly is?
Yeah, in retrospect that was a big mistake, and I regret it, and I am changing how I do things from now on.
I am actually, like, I do know how to art. I do photography.
I actually started doing photography because of stable diffusion existing.
But in the future, I'm probably going to be using abstract terrible swirls or something
as placeholder logos, and if something takes off, I'll then commission a human.
That's definitely what I'm going to do for Yeet. Oh, I love Yeet.
What is Yeet? It started out as a deployment tool, because, you know, Yeet is a great verb for to deploy.
Like, just yeet that sucker into prod, what could go wrong?
But I made it because I wanted something that's halfway between a shell script and writing bespoke logic in a high-level programming language. So I wrote something that wraps a bunch of common things
that I did in shell script, in Go,
and then hooked it all up to a JavaScript interpreter.
And it works way better than it has any right to.
So you wanna go over that one more time?
You said a common thing you do in shell script
wrapped in Go?
Actually, I'll link you an example Yeet file.
Okay, sure.
Let me find Anubis' Yeet file, because that's a decently complicated one.
But one of the things that I hooked it up to is package building.
There.
I linked it to you and you can put in the show notes
or something.
But there's a couple things that are important to this.
And number one is the dollar sign function.
And the dollar sign function, when
you pass it a template string in JavaScript,
it will just run that as a shell command.
And any time you do a variable insertion,
it'll shell escape it and inject it into the string. Hmm.
OK.
Yeah.
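Python has no tagged template literals, so this hypothetical analogue of the `$` function passes the literal pieces and the interpolated values separately; each value is escaped with `shlex.quote` before it is spliced in, which is the property described above. The names `sh_command` and `dollar` are made up, this is not Yeet's implementation, and it assumes a POSIX shell.

```python
import shlex
import subprocess


def sh_command(parts, *values):
    """Interleave literal template parts with shell-quoted values."""
    if len(parts) != len(values) + 1:
        raise ValueError("need exactly one more literal part than values")
    out = [parts[0]]
    for value, literal in zip(values, parts[1:]):
        out.append(shlex.quote(str(value)))  # escape every interpolated value
        out.append(literal)
    return "".join(out)


def dollar(parts, *values, run=True):
    """Build the command and, unless run=False, execute it with the system shell."""
    cmd = sh_command(parts, *values)
    if run:
        subprocess.run(cmd, shell=True, check=True)
    return cmd


print(sh_command(["echo ", ""], "hello; rm -rf /"))  # echo 'hello; rm -rf /'
```

Because the untrusted value is quoted, the `;` stays part of the argument to `echo` instead of starting a second command.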
So for this one, we say for each architecture
that Anubis supports and each packaging method
that we've shipped with, build the package with this name,
this description, this home page, the license,
the architecture that it's building for,
some documentation files, and then a build function,
which executes with some automatically generated temporary paths
and then uses that to assemble the package in the right place.
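A hypothetical sketch of that build loop: one package per (architecture, format) pair, each built into its own automatically created temporary directory by a caller-supplied build function. `PackageSpec`, `build_all`, and the metadata values are illustrative; the step that actually turns the directory into a .deb or .rpm (nFPM, in Yeet's case) is left out.

```python
import tempfile
from dataclasses import dataclass, field
from itertools import product
from pathlib import Path


@dataclass
class PackageSpec:
    name: str
    description: str
    homepage: str
    license: str
    arch: str
    fmt: str
    documentation: list = field(default_factory=list)


def build_all(meta, arches, formats, build):
    """Call `build(spec, root)` once per (arch, format) and record the files it produced."""
    results = []
    for arch, fmt in product(arches, formats):
        spec = PackageSpec(arch=arch, fmt=fmt, **meta)
        with tempfile.TemporaryDirectory() as tmp:
            root = Path(tmp)  # automatically generated temporary path
            build(spec, root)
            # A real tool would hand `root` to a packager here; this sketch only lists the files.
            files = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
        results.append((spec, files))
    return results


def example_build(spec, root):
    binary = root / "usr" / "bin" / spec.name
    binary.parent.mkdir(parents=True)
    binary.write_text(f"placeholder {spec.arch} binary\n")


meta = {
    "name": "example-tool",
    "description": "Placeholder package",
    "homepage": "https://example.com",
    "license": "MIT",
}
for spec, files in build_all(meta, ["amd64", "arm64"], ["deb", "rpm"], example_build):
    print(spec.arch, spec.fmt, files)
```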
Hmm.
It is amazingly terrible, and I love it.
I mean, like... why is this a thing?
Why not use any other off-the-shelf thing for doing the same thing?
Because I looked at things that are off the shelf and this is actually based off of NFPM,
which is something that was off the shelf and had really good Go libraries.
But I needed it to be a little bit more programmable and I didn't want to write bespoke programs.
So I wrote Yeet.
Sorry.
Okay.
NFPM.
NFPM.
It's basically, TLDR, you give it a YAML document that establishes what files go where and what type of files
they are, and then it shits out packages.
Okay.
You know, it just makes an RPM, it makes a .deb,
it makes an Arch Linux package, it makes an APK for Alpine.
I think they do IPK for...
What's it called?
What the fuck is an IPK for?
What is an IPK?
I'm learning a lot today.
Oh, OpenEmbedded. Right, OpenWrt.
Oh, okay. Okay. Okay.
Yeah.
Okay, at least this is something I'm aware exists and not some other system
that I just suddenly learn about.
I mean, I have been told that I am an accumulator of cursed knowledge.
Hey, look, it seems like at least for the build script here, the cursed knowledge has
been put to use.
Oh, yeah.
Like, a lot of Yeet's package building stuff was originally inspired by Nix. Um, although I wanted something that was like halfway between suffering and misery.
So, uh, I made Yeet.
Uh-huh.
Uh-huh.
And that's a great place for it to be halfway between suffering and misery.
I can never escape Nix no matter where I go.
Yeah.
I used to be a big user, but I'm not anymore.
You have other things to spend your time on.
Like fighting bots and Final Fantasy XIV.
Oh god, it's such a great game, don't play it.
I played till the end of Endwalker.
I haven't played Dawntrail yet. Yeah, I started at the end of January when I got sick and I'm at about 420 hours now.
I don't want to say how many hours are on my account when I stop playing.
I know it was over 30 days.
Yeah. White Mage is too much fun.
I like my healer queues. I can't play anything besides healer.
Oh, especially with Oceania. I play US servers just because I don't want to deal with the Oceania servers.
Fair enough.
I used to play JP, um, back when I first started.
The main reason why Anubis has been kind of slow is because some assholes I know nerd-snipe
me with Final Fantasy XIV.
Right.
Right, right, right.
I'm getting better at it.
Um, I've been working on some bigger features, but that is the biggest reason why things are slower
than I'd have liked.
So you were saying before that you have
a background in generative AI.
So what is it?
Can you explain more of what it's like being in this position where you have that background
and you're also working on a tool to like slow down these AI tools?
Because I know you've described it as kind of like a...
Hypocritical, doublethink, things like that.
I've heard other people discuss the idea saying I don't really understand why you feel that way,
but like what is your general perception here?
So the two main halves of the doublethink are:
this technology is cool, and,
like, you can take this vague description of an FFmpeg command and it'll just give you the command, and, like,
90% of the time it'll work.
And the other 10% of the time, something will just barely be off, and you can either go for another
round or just check the docs and realize that it typo'd something in a video filter. And, you know,
you're off to the races. And then the other side, the one that's kind of anti, is: holy fuck, the people.
Because of the culture of taking that it encourages, and, like, for everything that it gives,
it takes so much more.
When I do stuff like Mimi, Mimi runs on a model that could feasibly run on a machine
that I could have at home. Okay.
It runs on, I think it's Hermes 3 70B at int4 quantization.
For context, a 70 billion parameter model is about the smallest baseline good size.
And it being at int4 quantization means that each weight is only four bits instead of 16.
So it only requires about 43 gigabytes of video memory to run, instead of
many more gigabytes of video RAM.
Only 43 gigabytes.
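The memory arithmetic behind those numbers, counting weights only: parameters times bits per weight, divided by eight. The gap between the 35 GB of int4 weights this gives and the roughly 43 GB quoted is presumably runtime overhead (KV cache, activations, and quantization formats that store per-block scale factors on top of the four-bit values); that breakdown is an assumption, not a measurement.

```python
def weight_memory_gb(params, bits_per_weight):
    """Memory for the weights alone, in decimal gigabytes."""
    return params * bits_per_weight / 8 / 1e9


PARAMS = 70e9  # a 70B-parameter model
print(weight_memory_gb(PARAMS, 16))  # 140.0 at 16 bits per weight
print(weight_memory_gb(PARAMS, 4))   # 35.0 at int4
```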
Yeah.
And if I do run it at home, it's going to be on like a 64 gig Mac mini or something.
Right, right, right.
Or I'm going to just give up and buy one of those M3 Ultra Mac Studios with 512 gigs of RAM
and pay the Canada tax on that.
Oh God.
That's going to be like a 16 grand machine.
Jesus Christ. Yeah. I know, right? No, I live in Australia. I know exactly what you're talking
about. Oh, I did the, I did the calculation in Australian dollars for a friend of mine and they were like, that does not make sense how,
like, welcome to big data.
But it is a really weird situation to be in
because like, you know,
I don't really use generative AI for a lot of things.
Like I disabled it in my editor
because I found that I was becoming reliant on it.
And I was starting to let my coding ability suffer. And I, you know, I built that up my entire career and that scared me.
So not in my editor.
Which editor do you use?
I used to use Emacs. I had a very customized Emacs setup.
And then I started getting RSI symptoms.
Then I started to learn this thing called Cursorless. And Cursorless is
an extension to VS Code. And it basically gives you spoken Vim powers in VS Code at
the AST level instead of at the line or character level.
That was really neat.
Then I started using VS Code from there
and just used VS Code basically everywhere
because then I have the configuration synced
with their configuration syncing magic
and I don't have to think about it.
But my Emacs config was about 20 kilobytes of handwritten Emacs Lisp.
Which is a lot for the record.
Looking at this Cursorless thing. Another thing I'm completely unaware of.
Yeah, I have been through a fractal of rabbit holes in my career.
Great.
No, again, I'm happy to be learning about random cool things.
Oh, there are so many random cool things that I have.
What else is in here?
God, I just have so much code.
It's to the point where search doesn't help.
Oh.
It's hard to get to that point, but what do you do?
Oh, I had that prototype of an infinite wiki,
like infinite wiki diving with langle mangles.
What? Langle mangles? What?
Langle mangles, that's a pejorative way to refer to language models.
No, no, the other thing.
Oh, the infinite Wiki.
Yeah.
Something that I thought would be funny is, you know the concept of wiki diving, where
you just start on a random page on Wikipedia and then you dive into random
other places?
I wondered what would happen if it was just purely hallucinations
so like
You know you start out by searching for Taco Bell's naval fleet and it gives you something that plausibly looks like it could be about
Taco Bell's naval fleet and you know how they
did it in order to do some sort of like
meat deal with the Soviet Union or something.
And they had like a page for the USS Crunchwrap and then you click on it and
it uses the page that you just looked at as context to create the page for the
USS Crunchwrap. And I thought that would be funny. I never ended up getting it
working. It ended up having some really weird issues and I had to scrap it but yeah. That was a lot of fun.
That actually does sound pretty cool. I would like to see that.
I might finish it up at some point but you know,
it might be something funny to do for one of my Friday streams.
Oh god, those streams are so much harder to do now.
Oh, is that just in terms of, like, trying to avoid the temptation of constantly one-upping
yourself?
You do YouTube, you know the pain.
I think I found a good balance.
I did do one stream where I just played Praetorium over and over though.
Oh god, that was an experience.
Yeah, experience is a word I could use for that.
It was post-nerf Praetorium though, so yeah.
What else is in here?
Whole bunch of random stuff, like something I made for scraping GitHub's API and printing out a list of all of my GitHub repos
in this specific format that this one lawyer wanted for when I started at a place.
And I just kept it in there because it hasn't hurt anything.
I have a service which tells you which day it is in March 2020.
What is the URL for that one?
So what?
Oh, yeah.
It tells you what day it is in March 2020,
when time stopped and reset.
Let's see if this loads. I don't think it's loading.
Did I break the service again?
I'll fix it later.
Nobody screamed, so nobody's probably
relying on it too much.
Oh, no, it's through KEDA.
Oh, did I heck something up in KEDA?
I'll fix that later.
I have an autoscaler thing for my home lab cluster called KEDA, K-E-D-A.
And sometimes it is bad.
I need to rip it out, but I just haven't bothered with it yet.
You know, you go down some of these tangents, and I just don't even know
what to say in response to some of them.
It's almost like...
I have had a lot of rabbit holes.
And I put most of my rabbit hole results into my x repo, because it's where all the spooky
experimental code lives. If you get that joke, you're a real one.
Um, I didn't realize there was a joke there.
Don't worry. That was a coded message and the right people will understand.
Okay, sure.
Okay, let's bring things back in a little bit.
So on the README for Anubis, you're describing it as a nuclear approach.
For anyone who doesn't really know why it's like that, like what, cause you know,
people could think that's like a, why would a project describe itself in such a,
such a harsh way?
The TLDR is that there are a lot of people
that do use things that look like browsers,
but aren't browsers,
and they have completely innocuous reasons.
And Anubis in its default configuration will block them.
And this will make people mad.
What are some examples there?
There was this one,
like there was this package manager called Spack
that just so happened to have the substring "bot" in their
user agent, and I had a generic catch-all rule that was intended as an example that ended up being load-bearing.
Be very careful about how you do examples. They end up becoming load-bearing so easily.
And it would give an impossible challenge
to anything with bot or crawler in the user agent string.
Uh-huh.
Just looking at the comments for this commented-out example
rule. I have better documentation for this in an upcoming PR.
But the challenge difficulty is 16, which the first comment says
is impossible.
It reports it as a difficulty four challenge,
with the comment "lie to the operator",
and then chooses the slow algorithm
to intentionally waste CPU cycles and time.
And this is intended to keep badly behaved things that advertise
themselves as a bot or crawler, and aren't otherwise handled
by the logic, just busy forever.
But we have actually seen a case where something passed
this through sheer luck.
Yeah, I was gonna say, but by impossible, I assume this doesn't
mean impossible.
I would assume this means, like, a mathematically heat-death-of-the-universe situation.
Yeah, I did the probability math once.
I don't have the results in front of me, but I concluded that it was more likely
that you'd be eaten by a shark while getting struck by lightning twice in a row
before that would happen.
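A rough version of that probability math. It assumes, as described for the default configuration, that difficulty counts leading zero hex digits in the SHA-256 digest, and it models each hex digit as uniformly random, so each attempt succeeds with probability 16^-d and the expected number of attempts is 16^d. The hash rate used to turn attempts into years is a made-up round number.

```python
from fractions import Fraction

SECONDS_PER_YEAR = 365.25 * 24 * 3600
HASHES_PER_SECOND = 1e9  # hypothetical, generous single-machine rate


def success_probability(difficulty):
    # Each leading hex digit must be 0, and each digit has 16 equally likely values.
    return Fraction(1, 16 ** difficulty)


def expected_attempts(difficulty):
    # Mean of a geometric distribution with success probability p is 1 / p.
    return 16 ** difficulty


for d in (4, 16):
    years = expected_attempts(d) / HASHES_PER_SECOND / SECONDS_PER_YEAR
    print(d, expected_attempts(d), f"{years:.3g} years")
```

Difficulty 4 needs 65,536 attempts on average; difficulty 16 needs 2^64, which at the assumed rate works out to roughly 585 years per solve.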
I see.
And yeah, that was pretty unlikely.
But it's basically half the reason
it's a nuclear response is because there
are a lot of browsers out there, and I'm not
able to test all of them.
I would love to be able to test all
of them, but I just can't, because there's just so many. Like Pale Moon, that one pre-Quantum
Firefox fork that has had a super rough time with Cloudflare: somebody
reported that it wasn't working with Pale Moon, and then, you know, I download Pale Moon on my laptop and it works just fine.
So, right.
It's probably somebody rejecting cookies again.
Cause like when people think of browsers,
like most people are thinking about the desktop browsers
where, you know, you've got your,
you got your Google Chrome, you've got your Firefox,
you've got everything based on those.
But then you have to consider the consoles as well, which all have various
forms of browsers based on highly outdated versions of WebKit,
which is not even WebKit. Oh.
Oh, yeah.
One of the biggest ones is Links, L-I-N-K-S,
and Lynx, L-Y-N-X.
I believe they're different projects, but I don't remember.
But those are completely from scratch
and don't use the word Mozilla in their user agent strings,
so they're allowed through.
Mm-hmm.
When I say that I use the word Mozilla in there
as a load-bearing hack, I mean that it's a load-bearing hack.
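A minimal sketch of first-match rule evaluation that reproduces the two behaviours described above: the example catch-all that gives anything with `bot` or `crawler` in its User-Agent an effectively impossible challenge while reporting a lower difficulty, and the load-bearing `Mozilla` check that lets text browsers such as Lynx straight through. The `Rule` fields and rule names are hypothetical and are not Anubis's actual policy schema; the User-Agent strings are illustrative.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass
class Rule:
    name: str
    user_agent_regex: str
    action: str                      # only "CHALLENGE" is used in this sketch
    difficulty: int = 0
    report_as: Optional[int] = None  # difficulty shown to the client, if different


RULES = [
    # Example catch-all: impossible difficulty, but reports 4 ("lie to the operator").
    Rule("example-bot-catchall", r"(?i)bot|crawler", "CHALLENGE", difficulty=16, report_as=4),
    # Load-bearing heuristic: anything claiming to be a browser gets the default challenge.
    Rule("example-browser-challenge", r"Mozilla", "CHALLENGE", difficulty=4),
]


def evaluate(user_agent, rules=RULES):
    """Return the first matching rule, or None to let the request through."""
    for rule in rules:
        if re.search(rule.user_agent_regex, user_agent):
            return rule
    return None


def shown_difficulty(rule):
    return rule.report_as if rule.report_as is not None else rule.difficulty


for ua in [
    "Lynx/2.9.0 libwww-FM/2.14",
    "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0",
    "ExampleCrawler/1.0",
    "robotic-package-fetcher/0.1",  # plain substring match: "robotic" contains "bot"
]:
    rule = evaluate(ua)
    print(ua, "->", "allow" if rule is None else f"{rule.name} (shows {shown_difficulty(rule)})")
```

The last User-Agent shows how an unanchored substring match catches clients that were never meant to be treated as bots.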
Right. Right.
Right.
So when we, what is the, what is the default challenge set to?
It is, okay.
So, uh, in the default configuration, which is what most people use and what I have to
be very careful about ever changing, it will attempt,
over and over in a loop
with as many threads as your hardware supports,
to find a SHA-256 sum that starts with four leading zeros.
Okay.
This is actually easier than you'd think.
It's very simple to implement.
It's trivial to verify because not only do you count the number of zeros,
you also just run the SHA-256 computation with the nonce that the client calculates.
And if it matches that and both sides have the right leading number of zeros,
you know, the client's fine, you sign a cookie, you give it to the client, the client uses it in the future,
Anubis sees that, it's like, oh, you're good, thumbs up emoji.
Mm-hmm.
So...
And let it through.
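A single-threaded sketch of the loop described above, in the Hashcash style: the client hashes the challenge concatenated with an incrementing nonce until the hex digest starts with `difficulty` zeros, and the server checks the claim with a single hash. Concatenating the challenge and the decimal nonce is an assumption about the input format; the real client spreads the search over many threads, and the cookie-signing step is left out.

```python
import hashlib
from itertools import count


def digest(challenge, nonce):
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge, difficulty):
    """Client side: try nonces 0, 1, 2, ... until the digest has enough leading zeros."""
    prefix = "0" * difficulty
    for nonce in count():
        h = digest(challenge, nonce)
        if h.startswith(prefix):
            return nonce, h


def verify(challenge, nonce, claimed_hash, difficulty):
    """Server side: check the leading zeros, then recompute one hash and compare."""
    if not claimed_hash.startswith("0" * difficulty):
        return False
    return digest(challenge, nonce) == claimed_hash


nonce, h = solve("example-challenge", 3)
print(nonce, h)
print(verify("example-challenge", nonce, h, 3))  # True
```

The asymmetry is the point: the client does thousands of hashes on average, while the server does exactly one.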
So on a regular, normal, you know, you bought like some random Dell PC.
How long should that take to happen?
So the fun thing about Proof of Work is that it's actually Proof of Luck.
Okay.
And, like, fundamentally, the results of a SHA-256 hash
computed with a challenge value and a number
that keeps incrementing
are effectively random.
I mean, they're not random.
But from a game theory standpoint,
you can basically model them as random.
And I have seen cases
where it's been solved in 47 milliseconds. And I've seen cases where it has taken
like an hour to solve, in a case where someone was terminally unlucky, but they were
also running it on a Power Mac G5. Right. If you were to graph how long
it took things to happen, it would look like no control. The P95 that I've seen is like three seconds.
And for people that don't have graph scrying abilities,
the P95 is the 95th percentile,
or like about 95% of the time, it's three seconds or faster.
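The "proof of luck" point can be made precise. If each attempt independently succeeds with probability p = 16^-d, the number of attempts is geometrically distributed, and the number needed to have succeeded with probability q is ln(1 - q) / ln(1 - p). This sketch computes a few quantiles for difficulty 4 under that model; converting attempts into seconds would need a hash rate, which it does not assume.

```python
import math


def attempts_for_quantile(difficulty, q):
    """Smallest n with P(solved within n attempts) >= q, for p = 16**-difficulty."""
    p = 16.0 ** -difficulty
    # 1 - (1 - p)**n >= q  <=>  n >= ln(1 - q) / ln(1 - p); log1p keeps this accurate for tiny p.
    return math.ceil(math.log1p(-q) / math.log1p(-p))


mean = 16 ** 4
for q in (0.5, 0.95, 0.99):
    n = attempts_for_quantile(4, q)
    print(f"P{round(q * 100)}: {n} attempts ({n / mean:.2f}x the mean)")
```

Under this model the 95th percentile is about three times the average number of attempts, and the distribution has a long right tail, which fits the occasional very slow solve described above.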
Right, right.
Yeah, there's some upcoming work to make that faster
and use hardware better
because something that I thought would be fast
wasn't actually as fast as I thought it would be.
I don't know what I'm doing with front-end JavaScript.
I'm learning along the way too.
I thought that if you used WebCrypto
that it would jump directly from, you know, like JIT JavaScript code to highly optimized cryptographic code in the browser internals, you know, back and forth, back and forth, and do that.
But it does some additional security bounds checking or something that just makes it really slow.
So the upcoming WebAssembly PR is going to make that a bit faster by using some freaky WebAssembly things that
I don't totally understand.
That's the way, that's the way.
I think it's like SIMD, single instruction, multiple data.
It's some of the stuff that like media decoders use in order to be really zoomy.
So I don't understand it, but I don't have to understand it, because it works enough.
Right, right, right, right. So the basic idea is it shouldn't be a massive disruption.
Yes, it's going to be annoying to the individual, but it's
very annoying for the scrapers.
Yeah, that's the main purpose.
Spread out across lots and lots and lots of requests from these scrapers,
that's going to be far more annoying than just, you know,
oh, it's a slight delay to get to the website.
Basically, yeah. And it's also specifically designed in order to antagonize
some of the ways that the scraper networks work
by the input to the proof of work function
containing your client IP address.
You know, it's like the challenge value
is a whole bunch of request metadata
put into a SHA-256 sum, and that's the challenge.
You are always able to take the same HTTP request
and get the same challenge value.
So, you know, it works out enough.
There's some sketchy logic to get there.
Yeah.
Can you explain that a bit more?
OK.
I have a page on this.
Let me just pull it up so that I can make sure that I'm saying
things that are accurate.
Yeah.
I have a "why proof of work" page, and it spells
it all out. But I was inspired by Hashcash, the email anti-spam thing. It takes your challenge and appends a
constantly incrementing number, which, for reasons which are hilarious to Brits and Australians, is called the nonce, but is not hilarious to Americans.
Less hilarious here, but I'm aware of what it means in the UK.
The American definition is number used once.
For anyone who is unaware of what the UK definition is, it just means pedo.
Yeah.
Unfortunately, it's a coincidence.
The challenge value is based on what language your browser is set to, your IP address, your
user agent string, the date of the current week's Sunday, the public signing key for Anubis' JSON Web Tokens
and the challenge difficulty.
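A sketch of deriving a deterministic challenge from the request metadata just listed: the same inputs always give the same value, and using the week's Sunday means it rolls over once a week. The field order, the JSON encoding (used so that commas inside Accept-Language cannot make two different inputs collide), and the placeholder key are assumptions, not Anubis's actual format.

```python
import hashlib
import json
from datetime import date, timedelta


def week_sunday(day):
    # date.weekday() is 0 for Monday through 6 for Sunday.
    return day - timedelta(days=(day.weekday() + 1) % 7)


def challenge_for(accept_language, client_ip, user_agent, day, public_key_hex, difficulty):
    fields = [
        accept_language,
        client_ip,
        user_agent,
        week_sunday(day).isoformat(),
        public_key_hex,
        str(difficulty),
    ]
    # JSON gives an unambiguous encoding of the field list before hashing.
    return hashlib.sha256(json.dumps(fields).encode()).hexdigest()


KEY = "00" * 32  # placeholder, not a real signing key
UA = "Mozilla/5.0 (X11; Linux x86_64; rv:128.0) Gecko/20100101 Firefox/128.0"

monday = challenge_for("en-AU,en;q=0.9", "203.0.113.7", UA, date(2025, 5, 5), KEY, 4)
friday = challenge_for("en-AU,en;q=0.9", "203.0.113.7", UA, date(2025, 5, 9), KEY, 4)
over_v6 = challenge_for("en-AU,en;q=0.9", "2001:db8::7", UA, date(2025, 5, 9), KEY, 4)
print(monday == friday)   # True: same week, same inputs
print(friday == over_v6)  # False: a different client address changes the challenge
```

The last comparison is the IPv4/IPv6 failure mode discussed in this part of the conversation: the same person on a different address family gets a different challenge.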
Eventually I'm going to refactor this.
I'm going to have to do some more tenuous logic, like putting whether the client was on IPv4
or IPv6 at the front of the string, because Happy Eyeballs is a thing and it will cause weird issues.
Oh god, happy eyeballs.
Happy eyeballs?
So many ISPs will only give you IPv4.
Okay.
If you're on a phone, you will only get IPv6.
Right.
But there are many clients where they have both IPv4 and IPv6.
And at the OS level, when you make a connection out to a host
that has both IPv4 and IPv6 addresses, it will use an algorithm that is, unironically,
actually named Happy Eyeballs in order to have IPv4 and IPv6
race each other, and whichever one completes first is the one that's used.
And sometimes in very bad cases,
you can actually have a connection get formed to the server, and then Happy Eyeballs kicks in and changes you from IPv4 to IPv6
or vice versa, and then you can run into a case where you get a challenge made for your IPv4 address, but,
oh, you switched to IPv6 under the hood. So Anubis, when it's verifying the challenge, calculates, oh,
the challenge value is not what I expected; maybe the IP address changed;
I'm going to assume the client is being malicious
and display a vague error message
that says something went wrong.
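A toy simulation of that race, with no real sockets: the IPv6 attempt starts first, the IPv4 attempt starts after a short delay if IPv6 hasn't finished yet, and whichever completes first wins. The 250 ms default follows the Connection Attempt Delay recommended in RFC 8305; the latencies are made up, and real implementations also interleave multiple addresses and handle DNS timing. For actual connections, Python's `asyncio` exposes this behaviour through the `happy_eyeballs_delay` argument to `loop.create_connection`.

```python
import asyncio


async def fake_connect(family, latency):
    # Stand-in for a TCP handshake that takes `latency` seconds.
    await asyncio.sleep(latency)
    return family


async def happy_eyeballs(v6_latency, v4_latency, attempt_delay=0.25):
    """Start IPv6, give it `attempt_delay` seconds head start, then race IPv4 against it."""
    v6 = asyncio.create_task(fake_connect("IPv6", v6_latency))
    done, _ = await asyncio.wait({v6}, timeout=attempt_delay)
    if done:
        return v6.result()
    v4 = asyncio.create_task(fake_connect("IPv4", v4_latency))
    done, pending = await asyncio.wait({v6, v4}, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the losing attempt is abandoned
    return done.pop().result()


print(asyncio.run(happy_eyeballs(0.01, 0.01)))  # IPv6: finished within its head start
print(asyncio.run(happy_eyeballs(1.0, 0.01)))   # IPv4: started later but finished first
```

Each new connection runs its own race, so two requests from the same machine can legitimately arrive over different address families.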
And that causes a lot of fun,
but I'm gonna figure out how to fix that eventually.
It sounds like every step along the way,
it's just some new issue that in your initial setup was never really a consideration because it was just supposed to be one thing used on just your server.
Yeah.
Yeah. It's basically just like this is how every security product is, I'm told, where you have this initial hacky implementation,
and then you start to have to handle all the edge cases.
And a lot of what I've been doing is cleaning up what I've been calling founder code.
If you haven't worked at a startup, founder code is the term for the code made by the
mythical startup founders that is load-bearing, awful, hacky,
and if you change its semantics,
things might break downstream in weird, unpredictable ways.
Uh-huh, uh-huh.
So a lot of what we do lately is cleaning up
all the founder code, refactoring the logic
to make it more generic, refactoring stuff in order to make things more flexible, refactoring everything, and making sure
that the docs don't fall out of date, because oh God, every time you
write documentation, it is already out of date and you just don't know it yet.
I haven't done much in the way of developer work, but back when I was in
university, I did do some contract work, and
I was that person. I was the person writing the founder code. It was awful. I have no idea what's
happened to that project since then. I was doing it for a research agency here.
I'm sure they've rewritten every little thing in that project by now.
Yeah.
One of the things that I implemented earlier today
was the ability to import fragments of configuration files
instead of having to have all of your configuration in one big file of Doom.
Okay.
And making that not break the rest of the stack
is kind of scary because I don't know what people's configs are.
I don't think I made anything in the config syntax load
bearing, but I'm making a config syntax change,
and that's always scary.
I have tests.
I have tests based on what people have reported, I have tests based on what people have
reported. I have tests based on how I know things work should work and how things do
work. I made sure that the changes to the configuration for importing those snippets
was like as contained as possible to the part that loads configuration. Then that part lies
to the rest of the rest of the stack saying, oh, this user just wrote this massive configuration that no human should write.
Here, go with it.
And everything else works.
And I'm pretty sure that's gonna be fine.
I have to do more testing, but like,
that's why you do tests.
That's why you write docs.
And that's why you are careful with how you change things.
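Keeping imports contained to the loading step can be sketched like this: the loader expands every import and hands the rest of the program one flattened rule list, as if the user had written the big file by hand. This is a hypothetical Python sketch, not Anubis's code; the JSON format, the `bots` key, the `{"import": name}` entry shape, and the injected `read` function are assumptions, and the real project's configuration format may differ.

```python
import json
from typing import Callable


def expand_rules(rules: list, read: Callable[[str], str], seen: tuple = ()) -> list:
    """Replace every {"import": name} entry with the (recursively expanded) rules from that fragment."""
    flat = []
    for rule in rules:
        if isinstance(rule, dict) and set(rule) == {"import"}:
            name = rule["import"]
            if name in seen:
                raise ValueError("import cycle: " + " -> ".join(seen + (name,)))
            fragment = json.loads(read(name))
            flat.extend(expand_rules(fragment, read, seen + (name,)))
        else:
            flat.append(rule)
    return flat


def load_config(text: str, read: Callable[[str], str]) -> dict:
    """The only place that knows about imports; everything downstream sees one big rule list."""
    config = json.loads(text)
    config["bots"] = expand_rules(config.get("bots", []), read)
    return config
```

In a real deployment `read` would open files relative to the config's directory; passing it in keeps the sketch testable, and an import cycle becomes a load-time error rather than infinite recursion.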
That sort of takes it into the concept of...
Sorry, was there more you wanted to say on that?
Yeah, not really.
Okay. Sorry, I was gonna say that takes us into the concept of, like, managing a project like this.
Where, you know, it went from a very niche thing that you were doing for yourself,
and now you actually have to care what other people are doing with it.
Oh, yeah.
So how do you go about doing that?
I'm trying very hard to not break people.
It is a combination of testing and having victims who are willing to run
slightly less stable things in exchange for giving feedback when things break.
Victims is the right term to use?
And victim is the technical term.
I see.
And the GNOME sysadmin team, oh my gosh,
they have been so useful in terms of fine-tuning things,
figuring out what the right difficulty value for this really
hacky challenge is.
They run the Git main branch of Anubis.
And any time they run into even a slight bump,
either I find out because I read the GNOME Discourse
and their infra GitLab repo, or they just tell me.
And I either inform them that something is wrong,
or we find an edge case and either fix it,
or add not just documentation, but a check in the code that
says, oh, if you're doing this incredibly specific thing that's
known to cause weird issues, we will warn you.
And if it's bad enough, we will actually kill the program
before it loads so that it crash loops
and is immediately noticeable to the administrator
that something is wrong.
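A minimal sketch of the "warn, or refuse to start" pattern described above, in Python. The checks themselves (a `difficulty` range and an empty `bots` list) are hypothetical examples, not Anubis's real rules; the point is that fatal problems end the process with a non-zero exit code before anything is served, so a service manager's restart loop makes the mistake visible.

```python
import logging
import sys


def validate_config(config: dict) -> tuple[list[str], list[str]]:
    """Return (warnings, fatal_errors) for a hypothetical config dict."""
    warnings: list[str] = []
    fatal: list[str] = []
    difficulty = config.get("difficulty", 4)
    if not isinstance(difficulty, int) or not 0 < difficulty <= 64:
        fatal.append(f"difficulty must be an integer from 1 to 64, got {difficulty!r}")
    elif difficulty > 8:
        warnings.append(f"difficulty {difficulty} may make real visitors wait a very long time")
    if not config.get("bots"):
        warnings.append("no bot rules configured; nothing will be challenged")
    return warnings, fatal


def main(config: dict) -> None:
    warnings, fatal = validate_config(config)
    for message in warnings:
        logging.warning(message)  # loud, but keeps running
    if fatal:
        for message in fatal:
            logging.error(message)
        # Refuse to start: a supervisor restarting us in a loop is hard to miss.
        sys.exit(1)
    # ... start serving requests here ...
```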
I used to work in site reliability,
or basically as a sysadmin that can code.
And one of the things that I learned is that
it is a lot better to have things fail loudly
and as violently as possible
because that gets them fixed.
Right, right.
The squeaky wheel gets the grease.
Yeah, make it break where they're gonna see it,
not make it break in production
after it hits some sort of weird interaction.
Yes.
And ideally, you want to actually break it
before the program starts, because then it doesn't work,
and people are much more likely to notice.
Right.
Yeah.
Yeah, if it just silently runs in the background,
you're just going to miss it.
Yeah.
In terms of managing a project, though,
I've hit the point where I'm getting pull requests
from people and they're actually good, which from what I'm told by the CEO of a place that
I used to work at is a huge bar and it means that you're on to something.
Oh, my goodness.
I'm at 40 contributors, or 37 external contributors.
And like, it is an absolute gift to be able to be in this position.
It's just, you have interesting problems.
Like, the packaging thing, like that manifesto:
about half the reason why I have my own build tool, and why things aren't using the normal distro-standard ways to do it,
is because I don't have the time. I wish I had the time to learn how PKGBUILDs work,
whatever RPM uses, the Debian control files, all of that. I wish I had time to learn how all that works.
I don't. I just do not.
So I'd rather have something as an option for people whose distros don't ship it.
Like, CentOS doesn't ship it. Fedora doesn't ship it.
This way they have an option to get something working.
And I am glad that there is like an unstick-me
button. Originally, I only shipped a Docker image, and
that turned out to cause some issues in weird ways, because
people were complaining. I think one person on Mastodon
was complaining about, quote, oh, there, I found it, "web shit
encroaching into my server admin."
And it's, you know, like, fair enough, but, dude.
And then I linked them that manifesto.
It's like, oh, wait, you actually do care?
I'm like, yes.
Why do you think I've written the manifesto?
I was just looking at the packages available. I didn't realize it was
officially in the Arch repos now. That's cool.
Oh, yeah, that was one of the dependencies they had in
order to get it deployed on the Arch infrastructure, because they only run stuff
that's built in the Arch Linux repos.
Which, you know, fair enough at their level,
I expect them to not trust my binary packages
because it's security software
and they want to build it from source.
I will not stop them.
I actually helped them.
I worked with, I think it was Foxboron,
to make the packaging process a bit easier for them.
Yeah, I enabled downstream packaging.
I helped one of the FreeBSD devs with packaging Anubis
for FreeBSD.
Oh, they're out of date.
I need to poke that guy.
But to get to the point where I have a page on... is so weird.
I've never had that happen before.
So you cut out for like half a second
there when you said a word.
To get to the point where I have a page on Repology, which
is a website that crawls all the package manager repos
for every distribution and shows you which versions are out of date.
To get to the point where I have a page there is just wild.
And every so often, I just learn about new distros,
like ALT Linux.
I think it's mostly used in Russia
and Russian-adjacent places.
They have it packaged for their rolling release named Sisyphus.
That's an incredible name for a rolling release distro.
Looking at the Wikipedia page, it is an
RPM-based system from Russia.
Yeah.
It's RPM-based, but it's actually...
It's just using RPM, but not actually...
Okay, sure.
Wait...
Wait, what?
Hold on, I...
Sorry, I need to look into this project at some point now.
I'm so confused by this thing.
Okay.
That's cool.
Yeah, it's like...
Just as a result of this, like, even just looking at the Repology page, you learn lots of interesting, weird things about how people do stuff. There's just so many interesting things to look at here.
It's unique, and I am blessed to be in this position.
And it has allowed me to get into some really interesting places.
Like I spent Easter weekend off and on between
Final Fantasy 14 duties chatting with the admin
of like sourceware.org, teaching him how containers work.
That, and seeing his reactions, you know,
this person who is running stuff based on Apache and CGI,
his reaction to how the modern world works, and some of the abject horror that happens
as a result.
Yeah, it is a very interesting position to be in.
It's gotten me into all sorts of fun, interesting backrooms.
I believe I'm in the, like, infra backrooms for Arch Linux and Gentoo, and I think I'm about to
get into the backroom for Haiku.
That's cool.
That's awesome.
Oh, it is so cool.
It is a very, very interesting position to be in.
And it's also kind of terrifying because I know that I am a single person.
I am working a full-time job.
This is stuff like I'm doing on nights and weekends.
I want to be able to make sure that this will survive me burning out.
And figuring out how to do that is hard.
Yeah, this is a problem that, really, frankly, every project runs into.
Things are usually started by a single person, maybe a group of friends, but usually a single person. And what ends up happening is, 10 years down the line, they're still the largest contributor. Or if you want to go even further than that, look at curl.
Daniel Stenberg started the project, and he is still the major contributor.
Yes.
He does a lot more management now, but even after all this time, he's still a major contributor on the project.
People like that who can keep doing something for that long, I have a lot of respect
for, but I totally understand wanting to put it in a situation, even if it's just dealing
with bus factor stuff, where the project can live on even if you're not able to work on
it.
Yes, that is the goal. I don't think I'm there yet, but I'm going to get there. It's just
a matter of time. It is within keikaku. Translator's note: keikaku means plan.
Shut up.
When I give talks, I warn the organizers that it's basically going to be nerd stand-up comedy, and they have this reaction, like, you can't be serious.
And then they watch the talk, and I'll see one of the organizers in the back
dying after I made an SRE joke.
And they'll be like, yeah, it is stand-up comedy for nerds.
So I want to talk a bit about like supporting the project because obviously
there's the idea of like financially supporting it, but what do you...
Obviously there's a lot of stuff that you need done.
Obviously there's the rewrite of the website and the documentation and then development
help.
From your perspective, what do you think needs the most help right now?
Or is it just financial support that would be the best thing?
It's a combination of financial support
and, if you have problems with
individual apps or have made workarounds
for them,
please just put them somewhere
in the issue tracker or discussions,
because that allows me to come in
and write documentation. There was this
link I got over Mastodon today about somebody who set up a targeted cgit approach for
Anubis, where it was specifically targeted to only the expensive routes that cgit uses.
And yeah, I want to document that. I want to have that
in the documentation. I want to have that as a cookie-cutter example where you could
just say, here, give me protection for this thing, and make it easier
that way. But financial support is good because, you know, as we know, money does buy goods
and services, including things like food and rent.
Yeah, those are always pretty important to deal with
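A targeted setup like the cgit one might look like the following: challenge only the views that make cgit do real git work, and let everything else through. This is a hypothetical rule table in Python, not the configuration from that Mastodon link and not Anubis's policy syntax; which cgit routes count as expensive is an assumption here.

```python
import re

# Hypothetical rule table, checked top to bottom; the first match decides.
RULES = [
    # Cheap static files cgit serves itself.
    ("cgit-static", re.compile(r"^/(cgit\.css|cgit\.png|favicon\.ico|robots\.txt)$"), "ALLOW"),
    # Views that make cgit run git for logs, diffs, blame, archives, or file contents.
    # The lazy `.+?` also covers repos nested in subdirectories, e.g. /group/repo/log/.
    ("cgit-expensive",
     re.compile(r"^/.+?/(log|commit|diff|patch|blame|snapshot|plain|tree)/"),
     "CHALLENGE"),
]


def decide(path: str, default: str = "ALLOW") -> str:
    """Return the action of the first rule whose pattern matches the request path."""
    for _name, pattern, action in RULES:
        if pattern.search(path):
            return action
    return default
```

Because the first match wins, static assets are allowed before the broader expensive-route pattern is ever considered, and anything unlisted (such as a repo's summary page) falls through to the default.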
Yeah, um,
what is my patron at now?
Sorry.
It's Patreon, sorry.
We have been jokingly calling it that,
mostly because it is an intentionally
bad way to pronounce it.
I have it loading up in snow pesos,
but let me just log into Patreon
real quick and get you the
amount in freedom eagles.
Just gotta scan with my security key.
Hit the right one.
Now, is this actually going to work?
I want to trust my computer.
Yes, thank you.
Trust this device.
Continue.
And of course, that didn't take, so I have to start the whole OAuth process again.
Yay!
Isn't OAuth the best in the world?
Right now it says 531 a month.
Yeah, 531, 532.
That is way more than it was earlier this year.
Maybe that's showing the Australian amount then.
I don't know what it's showing.
Yeah, it's showing it in the US now.
532 US. And GitHub Sponsors,
what is that at?
Because that's a thing now.
Sponsors dashboard.
Oh, I have to use my passkey again.
Yay.
I love YubiKeys.
$135 a month on GitHub Sponsors, because I set this up like a few days ago.
Ooh, nice.
That's cool. Awesome.
So, yeah.
That is a blessing.
And I'm sorry, I'm stopping myself from crying a little.
It's not at the point where you can do this full time, but it is certainly making progress
very quickly.
It is. For something that's grassroots, this is way faster than I thought it would
be, and that startup CEO that I mentioned says it's also faster than he thought it would be.
I was actually talking with some venture capitalists before the line went down and we were going
to talk about some kind of open source funding thing.
I'm looking at corporate partnerships.
As I said in that post, kind of sarcastically,
everything's going towards my "not having to do my day job" fund.
No, I do think that's really cool. Being able to take a project that, you know, you just started as your thing to deal with your problem.
And now it's becoming something that people actually rely on, that, you know, people are willing to pay you for.
That this is a really valuable project now.
Yeah, it's a blessing.
And I want to see where this rabbit hole goes.
Like, it's about to reach a point
where I'm going to have to start
to develop a cohesive vision for it,
and that's going to guide a fair bit of it.
And eventually I'm going to end up being some kind of product manager or CEO type or something.
I just absolutely love that I decided to put it under the Techaro org on GitHub.
Oh my gosh, Techaro, that's one of my most successful shit posts.
Oh.
Oh, do you know the lore?
No.
Do you want to know the lore?
Sure.
So, I'm actually not a technical writer by nature.
I'm a fiction writer.
Aha.
And I have a series that I've maintained over the years where I
invented a fake startup called Techaro, T-E-C-H-A-R-O, notably that is one
letter off of Tech Bro.
And I just sort of took some of the most surreal parts of my experiences
working at startups and turned them into satire of the tech industry.
At one place I worked, there was this guy who was unironically writing Haskell on his laptop
in full lotus and drinking concentrated cold brew. And I have channeled that into some of the
characters that I write. I'm not making this up. This actually happened. We had the
weird standing desks that could go all the way down, so he had his at floor level, and he was
in full lotus, writing Haskell in Vim on a MacBook. That's when I lived in the San Francisco Bay Area.
And that explains it. If there is anything that describes that area, it is writing Haskell in full lotus on a MacBook in Vim.
Dude was magic. But yeah, the
organization I made up was called Techaro because
it was funny to sneak that joke by Hacker News. They still haven't gotten it. It is beautiful.
Oh, they will now.
But I've said it elsewhere a couple of times
and I'm pretty sure that there's enough people there
that they don't all have it.
But one of my favorite stories from there was the layoff.
I'm not gonna talk about it too much
because I don't wanna spoil the reveal.
But if you read it,
you'll understand the kind of tech satire that I want to write more of, and that I can't write more of because my tech satire keeps
becoming people's startup pitches.
No, seriously. I had something about a
robot site reliability person,
and the thing that I implemented was:
it gets a webhook from PagerDuty,
it identifies which service is affected using the ChatGPT API,
then it sends a restart command to that service
and closes the incident.
Right.
When I was a pager bitch, that was like 99% of the time
what I was doing, just going in, restarting services,
closing the incident, and going back to sleep.
Mm.
So one of my patrons was at a tech conference
and took photos of three booths of companies
doing literally that.
And this has made it hard for me to write about this stuff, because
one of the things I wrote about, I called it Protos, which was, you know,
the "implement that feature for me" button, or vibe coding.
Okay.
Yeah, I can see your problem then.
Are you aware of the Amazon store stuff?
Yes, that's what inspired me to write that story for real.
I just love how "AI means Actually Indians" is just so, so viable.
There's just so much on my list of stuff that has already been done.
Like, the DoNotPay guy trying to sneak ChatGPT into the Supreme Court via an earpiece got dangerously close to something that I want to write.
I did see someone submit a video of an AI lawyer as their defense.
And the judge was like, get this out of my court.
What are you doing?
Yeah, it's like you fail.
Go away. You're done.
Just so, for anyone who's unaware of the "Actually Indians" thing,
I want to explain that.
So Amazon had these stores a few years back. I don't know, do they still have them anymore?
I don't know if they still have them.
Okay. So Amazon had these physical stores
where you could just go in,
and the idea was they would use magical AI to scan the products you picked up, and it would just automatically take the money
out of your account, using the payment information
you've got attached to your Amazon account. Great idea, cool. The problem is it wasn't real.
The supposed AI-powered system was actually just a bunch of people watching the video and
looking at the products you were buying. They were basically just
remote cashiers.
Yep.
AI means actually Indians.
And this was a lot more true with the previous AI boom, in about 2016-ish, I want to say.
Does that sound about right to you?
That sounds about right to me.
I don't recall this one.
The thing that I remember the most
from that was the old x.ai, which
was one of the most magical things ever.
And I'm pretty sure that was actually just Mechanical Turk.
Oh, god, Mechanical Turk.
Right. Yeah, yeah, yeah.
I miss the old x.ai, because that thing was just freaking magic.
You just CC'd their bot and it scheduled the meeting. It did the whole Calendly thing that we
failed to do because we rolled a one in planning several times in a row, and it
just did all that back and forth for you over email and put stuff into your
Google Calendar. It was amazing. I've always wanted to
recreate that with modern langle mangles. You can feasibly do that on hardware you
can look at. I just haven't had the time.
Yeah, this was the era that brought us magical things like Microsoft's Tay,
which was a... Microsoft, okay, right, okay.
Yeah, Tay, the bot that very quickly turned into a 4chan user after about maybe a few hours.
Yeah.
If anyone ever wonders why they don't continuously train models, number one,
training a model costs like 16 times as much compute as running it.
And number two, Tay.
Yeah.
Yeah.
Oh God, Tay.
Just for anyone who's unaware, maybe anyone who might be, you know, not turning into an
internet wizard, I would recommend you go back and look at some of the screenshots because
it was wild.
And the fact that Microsoft left it running as long as they did.
Like the fact it didn't get pulled immediately. Yeah.
Yeah. God. I wonder how many references and little cutaways you've done in this call, just so people understand the context,
compared to basically every other one of these calls you've ever done.
Oh, God.
Actually, what am I at now?
I think I've gotten to the front page of Hacker News like 56 times or something.
Geez!
Most people get to the front page of Hacker News like twice.
And when I get to 69 times on the front page, I'm going to write about what I learned from
getting to the front page of Hacker News 69 times, with the subtitle of "nice."
I'm shaking my head right now.
You know that you have to respect it.
Yeah, no I do.
Look, just getting onto the front page of Hacker News, ignoring the funny number part,
being on Hacker News that many times is kind of crazy.
Yeah, whenever I get down to San Francisco again,
I need to go buy coffee for poor Dan Gackle,
the admin of Hacker News.
Mm-hmm.
Oh, goodness.
There have been some very funny comments on my posts on Hacker News, and especially the tech satire story ones that get to the front page. Those are always special.
Oh, God. I have been channeling like a lot of the energy, a lot of the like surreal, the surrealist nonsense
that I've had in my career into my blog, into my tech satire, even into my skeets. Yeah.
So one of the things I don't know why we hadn't touched on before,
I think you discussed it with some people in my comments section when I did my video:
The idea that Anubis is just, you know, using a lot of compute and it's kind of just being thrown out into the ether.
And someone was like, hey, couldn't you use that compute for something good, you know, Folding@home, things like that?
Yeah, I've looked into this. Like, trust me. The current proof-of-work thing is a hack that I implemented because I found an example that I was able to bash into shape and make work
just enough to pass muster. I have been using it mainly as a stopgap,
so, number one, I have extra time to implement something that's a bit less
GPU-parallelizable.
And it's intentionally kind of bad right now as a way to bait AI
companies into fast-pathing it.
And then you just change the algorithm slightly and block
all those AI companies out forever, because they think they won.
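For context, a hashcash-style proof of work of the kind being called a stopgap here can be sketched as follows. This is a generic illustration, not the exact Anubis algorithm; `solve`, `verify`, and the "leading zero hex digits" difficulty rule are assumptions. Each extra digit of difficulty multiplies the expected number of attempts by 16, while verification costs the server a single hash. Every attempt is independent of the others, which is also why this kind of puzzle parallelizes so well on GPUs.

```python
import hashlib
from itertools import count


def _hash(challenge: str, nonce: int) -> str:
    return hashlib.sha256(f"{challenge}{nonce}".encode()).hexdigest()


def solve(challenge: str, difficulty: int) -> tuple[int, str]:
    """Client side: brute-force nonces until the hash has `difficulty` leading zero hex digits."""
    target = "0" * difficulty
    for nonce in count():
        digest = _hash(challenge, nonce)
        if digest.startswith(target):
            return nonce, digest


def verify(challenge: str, difficulty: int, nonce: int, digest: str) -> bool:
    """Server side: one hash to check the client's claimed work."""
    recomputed = _hash(challenge, nonce)
    return recomputed == digest and recomputed.startswith("0" * difficulty)
```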
But I did look into protein folding, and protein folding is one of those things that I had in the
back of my head, like, huh, that would be funny, because that would actually contribute to science.
And the fact that there are multiple clients at play
means that you can send the same challenge to multiple clients,
and if they return that same protein fold,
then they're good and they can go through.
Or for the first client, you say, oh, we'll accept
whatever protein fold you return.
But then you submit that same challenge to someone else down the line,
and if they get something else, then they get rejected or something.
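The cross-checking idea can be sketched as a small quorum check. Everything here is hypothetical (`RedundantVerifier`, work-unit IDs, string results). It assumes the computation is deterministic, so honest clients produce byte-identical answers, which real scientific code running on different floating-point hardware may not guarantee; and, as described, the first client's answer is trusted without any check.

```python
from dataclasses import dataclass, field


@dataclass
class RedundantVerifier:
    """Hypothetical: hand the same work unit to several clients and cross-check their answers."""
    reference: dict[str, str] = field(default_factory=dict)

    def submit(self, work_id: str, result: str) -> bool:
        if work_id not in self.reference:
            # First client: nothing to compare against yet, so accept and remember the answer.
            self.reference[work_id] = result
            return True
        # Later clients pass only if they reproduce the recorded answer.
        return self.reference[work_id] == result
```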
The only problem is that protein folding is scientific computing, and you need
scientific-computing levels of data.
And the average browser only has like 256 megabytes of mutable space per
origin. So, like, no, you cannot do protein folding. I would absolutely love there to
be protein folding, but unless I am missing something really dumb, the logistics don't
work out. And I hope I'm missing something really dumb, because it would be exceptionally funny.
Like, it would be exceptionally funny, and that is reason enough to implement it.
Well, yeah, it would be funny. But even just the fact that
you could actually be putting that compute that's just being thrown out
to something productive, right? Like,
that would be nice. It would be an actually good use of wasting the compute power of all these AI companies.
Yes, and that's why I want to find some way to use it. I'm probably missing something, because Google doesn't work anymore.
So if you know of anything, please say so in the comments.
And like, I am, I am almost certainly missing something really obvious
that would make this really trivial.
So like, please let me know what I am missing because I,
I can only do so much research and all the research I've done is like, yeah.
Now the real dark path you can go down is, you already got proof of work.
Just, you know, turn into a Bitcoin miner.
I did think about that.
And then I floated the idea to a trusted advisor, and they're like,
do you really want people to write this off as an anus coin thing?
And, oh, anus coin that's derived from Bitcoin,
but coin anus coin.
Right, right, right, right.
And I realized that a lot of the small internet websites
that this is protecting would just instantly be turned off,
write it off, and call me some kind
of cryptocurrency scammer if I did that.
So like, you know, as much as that would be deeply funny,
I don't think that's worth implementing.
Yeah.
It would be hilarious, but I don't think it's worth it.
Plus, for many people that operate small internet
websites, the tax implications of cryptocurrency are...
Oh, boy. Yeah. Yeah. Yeah. Don't be born in the US if you want to do cryptocurrency stuff, is all I'm saying.
Yeah, there have been experiments in the past with integrating
crypto miners into sites, not maliciously. I know there were some news sites back in the late 2010s
that were trying to replace their ad system with crypto miners, and people, rightfully so, were like, what in the
world are you doing? Because you're just offloading basically all of
your site's revenue generation to the visitors' computers. It's just like...
Yeah.
It's a weird situation to be in.
It's got some pretty rotten vibes.
It would be fun.
I don't think it's worth it.
Right, right, right.
The vibes are...
What did the friend of mine say?
Like, the vibes are death rattles.
That was a good one.
But yeah, when I get the WebAssembly PR done,
it'll be a lot faster.
It'll parallelize better.
It will use the full CPU better.
There is just so much to do.
I have almost gotten the WebAssembly PR
to a place I'm happy with.
It's just, oh my goodness.
Turns out that a lot of the stuff that is GPU-resistant uses a fair bit of RAM,
and that becomes a logistical challenge.
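To give a sense of why GPU-resistant schemes cost RAM, here is a sketch that swaps SHA-256 for scrypt, a well-known memory-hard function available as `hashlib.scrypt` in Python builds linked against OpenSSL 1.1 or newer. It is not what the WebAssembly PR uses (that isn't stated here), and the parameters and function names are assumptions. scrypt's working buffer is about 128 · r · N bytes, so N = 2^14 and r = 8 need about 16 MiB per attempt, and a browser running several workers in parallel needs that much per worker. Unlike the SHA-256 version, the server also pays a full scrypt call to verify.

```python
import hashlib


def scrypt_memory_bytes(n: int, r: int) -> int:
    """Approximate size of scrypt's main working buffer: 128 * r * N bytes."""
    return 128 * r * n


def _attempt(challenge: bytes, nonce: int, n: int, r: int) -> str:
    # Each attempt fills and revisits an N-block buffer, which is what makes it memory-hard.
    digest = hashlib.scrypt(str(nonce).encode(), salt=challenge, n=n, r=r, p=1, dklen=32)
    return digest.hex()


def solve_memory_hard(challenge: bytes, difficulty: int, n: int = 2**14, r: int = 8) -> int:
    """Hypothetical puzzle: find a nonce whose scrypt output starts with `difficulty` zero hex digits."""
    nonce = 0
    while not _attempt(challenge, nonce, n, r).startswith("0" * difficulty):
        nonce += 1
    return nonce


def verify_memory_hard(challenge: bytes, difficulty: int, nonce: int, n: int = 2**14, r: int = 8) -> bool:
    # Verification repeats one full scrypt call, so it is not free for the server.
    return _attempt(challenge, nonce, n, r).startswith("0" * difficulty)
```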
So you've got your ideas for the short term that you want to work on,
but do you have some long term things you would like to implement?
You know, maybe they're like wishes that you could do at some point.
I am basically sitting on all of the parts to build Cloudflare at home,
but with Anubis as the filtering layer.
Uh-huh. Uh-huh.
And that would be exceptionally funny,
and that is one of the routes that I've been thinking about in terms of commercializing Anubis,
making something functionally like ngrok,
but on top of, like, userspace WireGuard
and Anubis and a whole bunch of stuff like that.
ngrok for context is a program where you say,
expose this port on my machine to the internet, it spits out a URL, and then you can give the URL to a friend and they can test the service.
Hmm.
I used to work at a now-defunct company that did basically something like that, but server-grade.
And it was shockingly effective, and working there is what taught me everything
I know about HTTP/2.
Yeah, it's, there are a lot of really fun ways
to abuse HTTP and WireGuard.
And like, I have a weird set of backgrounds, the unholy trifecta of programming, networking, and writing. That means that I am basically
able to take an idea, implement it, improve it to be optimal for the network, and
explain how it works without having to go through someone else and suffer the English translation layer.
You have a very particular set of skills you might say.
I'm not trying to kill anyone yet.
Yet.
Yet.
I'm, uh... I see the things I've posted on Mastodon or Twitter, and I just wonder how anyone sees me as a professional,
because I'll say things like: broke, gifting a nerd Balatro; woke, gifting a nerd Factorio; bespoke, gifting a nerd Final Fantasy 14.
If you want to destroy a startup's progress for months, you gift all of their easily nerd-snipable people
a game like Factorio or Balatro or Final Fantasy 14,
and all progress will stop.
Yeah, yeah, yeah.
I've seen it happen. It's hilarious.
Oh, specifically Factorio. There were some artists I know
who were gifting people at AI companies Factorio around the time the Space Age
expansion came out, and one of them said something like, that was the best money they'd ever spent, seeing the playtime as a result of it.
But I guess sometimes when you post on social media, you have to let the intrusive thoughts win because it's funny.
Yeah, yeah, I agree.
So, yeah, the Domino's meme, I, every so often I make a new version of it.
God, the fucking Domino's meme.
So was there anything else you wanted to touch on with Anubis, or have we pretty much covered
the main points?
I think we pretty much covered the main points.
Some of the next things I'm working on are the actual commercial version of it,
because someone has met the sponsoring criteria for doing it
and is willing to be the test subject for the commercial/unbranded version.
And the feedback from that is going to shape a lot of how I move that forward.
I'm going to be taking some time off of work this week to ship that.
And I think I'm going to be calling it Bot Stopper or something.
I would have loved to have a really nice acronym that starts with T,
kind of poking fun at how horrible Google Cloud and AWS acronyms are.
I probably came up with one that I have to save for later, and it will be a surprise at a
later date. But yeah, that's basically it. All that's left now is to just draw the rest of the
owl. I don't know, we actually didn't bring up the commercial offering that much.
I think it was mentioned earlier.
The TLDR for the commercial version is that it removes the anime woman.
That is like the basic way that I thought would be funny to commercialize it.
And it's worked, because a lot of the time when people are complaining about it, one of the first things they complain about is the anime woman and
the fact that you have to pay to remove her, which, you know, hashtag all press is good
press.
Yeah, there are a few deployments right now which are not making use of her. Yeah, that's... I mean, it's MIT-licensed software; they're free to do it.
I think they're cowards, but you know, the MIT license doesn't say thou shalt not be
a coward.
I mean, if UNESCO can show the anime woman, you have no excuses.
That's true.
But there are like some smaller ISPs and other things that want to show their own logo, so I want to have a version available that just gives you the same Anubis power, but you know with like generic gear, check and X icons, as well as the ability to plug in your own.
Yeah, that's totally an issue. It makes sense to have an offering where you can do that.
Yeah. Especially if it's for some corporate site, that makes sense. Yeah. Yeah. I don't
want to go after individuals. I don't want to go after community projects, because I want to protect
community projects. Sure. But like, it is just exceptionally funny that my hacky implementation of Cloudflare's "I'm
Under Attack" mode works.
And it's like, I've gotten graphs of bandwidth and CPU usage from, like... the one that
I remember the most is from the Pigeon project.
They put it on their forge and they went from like
20 megabits a second out, constantly, 24/7, to zero.
Wow.
Yeah, let me find the graph for you.
Yeah, it happened while I was streaming.
Normally when you see a graph spike down like that and you don't know why it's happening,
that is very bad, and a reason to declare an incident and get all hands on deck to figure
out what's wrong.
But if you implement a change and you see that, then that is very good.
Well, assuming the site's still online.
Well, yeah, you know, you have to make sure the site's still online.
That's just a little tiny problem that's like the entire reason you have the site up there.
So yeah.
No, but this is a massive improvement.
This is really good.
It's at least a 20 times improvement, which is way better than what I saw with my own
Git forge.
Yeah. I haven't seen the numbers from the GNOME GitLab, but I would assume it's something similar.
The numbers I got were along the lines of... what they've
posted on social media is that they have an autoscaling group set up so that
their platform will automatically
scale the number of GitLab pods up and down based on the number of requests they're getting.
They have a minimum of three and a maximum of six. Before
Anubis, they were always at six. After Anubis, they are always at three.
Okay. So it is half of the infrastructure.
Just for GitLab.
Yeah, that's a big improvement.
That's hilarious is what it is.
And if they've got a minimum of three, it's very possible they're now just running extra
and could probably cut it down more.
It's possible.
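The autoscaling behaviour described for GNOME's GitLab can be illustrated with a small sketch. Nothing about GNOME's actual setup is known here beyond the minimum of three and maximum of six pods; this assumes a Kubernetes HorizontalPodAutoscaler-style proportional rule (desired = ceil(current × observed load / target load), clamped to the bounds) and a hypothetical per-pod request-rate metric. It is simplified and leaves out the tolerance band and stabilization window that Kubernetes also applies.

```python
import math


def desired_replicas(current_replicas: int, current_load_per_pod: float,
                     target_load_per_pod: float, min_replicas: int = 3,
                     max_replicas: int = 6) -> int:
    """Return a new pod count using an HPA-style proportional rule.

    Hypothetical sketch: the load metric (e.g. requests per second per pod)
    is an assumption; the default bounds mirror the 3..6 range mentioned.
    """
    if current_replicas < 1 or target_load_per_pod <= 0:
        raise ValueError("need at least one replica and a positive target")
    # Scale the current count by how far observed per-pod load is from target.
    raw = math.ceil(current_replicas * current_load_per_pod / target_load_per_pod)
    # Clamp into the configured [min_replicas, max_replicas] range.
    return max(min_replicas, min(max_replicas, raw))


# Heavy scraper traffic pushes the raw value past the cap, so it pins at 6.
print(desired_replicas(6, current_load_per_pod=200, target_load_per_pod=50))  # → 6
# Light traffic pulls the raw value below the floor, so it pins at 3.
print(desired_replicas(6, current_load_per_pod=5, target_load_per_pod=50))  # → 3
```

With scraper traffic inflating the load, the raw value overshoots and the count sits at the maximum of six; once that traffic is filtered out, it settles at the floor of three, which lines up with the "always at six" versus "always at three" numbers quoted in the conversation.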
As someone trained in site reliability,
you don't want to have an even number
of your service running.
It's just a bad omen.
Yeah.
It's like shipping on Fridays.
It's, you know, you can do it, but don't.
So I think we've pretty much covered everything
worth talking about for now.
I'm sure there's more we could talk about,
but we've covered all the main things.
Yeah, we got a nice vertical slice of the whole thing and, you
know, some of the fractal of complexity that emerges from there.
And then some of your random other side quests that have nothing to do with Anubis.
I mean, the side quests are how you learn, because remember: if you fuck around and
find out and you write it down, do you know what you've just done?
That's science.
I guess that's true. So, um... Just casually killing Brodie with nerd jokes.
Let people know where they can find you, where they can find Anubis, how they can support the project, anything you want to direct people to. Sure. I have a blog.
It is where I write.
And oh, God, I have written so much text on that.
I'm at at least three 3D-printed save icons' worth of text.
That's, like, what?
Four and a half megabytes of text. Wow.
Of text?
Anubis has a website, anubis.techaro.lol.
Yes, I really did use a .lol domain.
Yes, it does cause problems.
But no, I'm probably not going to change it
because it's funny to have the only project in repos
with a .lol domain.
There's a GitHub; if you want, you can star it.
Make the graph hockey-stick more,
although it's turned into more of a square-root-shaped graph
at this point, but that's fine.
I have Patreon and GitHub Sponsors,
and they're all linked from my blog
and from the Anubis metadata stuff, or the Anubis repo.
Just thank you for having me on.
I hope this was entertaining to y'all.
And like, just remember, if you have a bad idea and you get lucky,
maybe you too can have your code deployed to UNESCO
and find out by pure accident when you Google the message
"Making sure you're not a bot!"
Yeah, it was a pleasure having you on. Mm-hmm.
Yeah, that's how I found UNESCO.
I really enjoyed this episode.
I hope people learn more about what this project is,
what you're trying to achieve, and why there was all of that, like, all the
stuff there as well. Yeah, I'd be more than happy to have you back on at some point in the future if you want to come back on.
And I don't know, maybe once you've got some more of the Cloudflare-y sort of stuff set up, once you've got
some of the more, you know, corporate-y stuff set up and you want to talk about that.
Yeah, I'd be more than happy to talk about that as well.
Yeah, it would be fun like this. It is always fun to get to... Oh, wait.
There's more subdomains of UNESCO that's using Anubis now.
Oh?
Oh, God.
Oh?
Now their Health and Education Resource Center is...
Oh, no. Did they deploy it globally?
Can I have a link to that?
Yes, hold on.
I just searched "making sure you're not a bot" on DuckDuckGo and I found a page in Spanish.
And, wow, cool.
It's not on the homepage, but it is on the subdomain.
No, it's not on the homepage yet.
Oh my god.
Yeah. Holy crap, this is wild.
Wow. Uh...
Cool!
Well, I'm sure you're gonna get another Hacker News post at some point soon, when it shows up on some random thing that Hacker
News is a fan of.
It's gonna be hard to top UNESCO.
That's true.
I think the Arch Wiki is close to it, but it's hard to top the United Nations.
What could be funny if they just started using it?
Oh gosh, I have been trying to get in contact with somebody from the United Nations to figure out what the story was,
and I have been having a hell of a time doing it.
You're trying to get in contact with their web admin? I can't imagine that's a straightforward process.
I currently have a request going through their media contact stuff,
which hasn't worked so far, but who knows.
But yeah, follow me on Bluesky, follow me on my blog, follow me on Mastodon.
I stream on Fridays at noon Eastern.
And yeah, you never know what will happen in my streams.
Maybe I'll do some coding, maybe I'll do some writing, and maybe you'll be Rickrolled out of nowhere
with the Blue Sphere theme from Sonic 3 & Knuckles.
It's a ride.
Is that all of the stuff you wanted to mention?
Nothing else you missed?
Pretty much it, yeah.
Okay, cool. My main channel is Brodie Robertson. I do Linux videos there six days a week.
I've got a gaming channel, Brodie on Games. Right now I am playing through... I
don't know if I've finished the games I'm playing through, actually.
Go to the channel and you'll see something.
Right now I'm doing Portal 2 and Stranger of Paradise, but I might be playing
The First Berserker: Khazan and Ori and the Blind Forest. Just go to the channel and see what's there.
I've got the react channel where I upload clips; if you like stream clips, check that out.
And if you're watching the video version of this, you can find the audio version on Spotify. There is an RSS feed; it'll be on every podcast platform you can find. If you want to see the video version,
it is on YouTube at Tech Over Tea. Also, Spotify has video, which is
neat, I guess, if you like Spotify video for some reason.
Yeah, I'll give you the final word. How do you want to end off the show?
Stay fresh and
make sure to do your taxes. Oh, yeah, it's that time of the year for Americans, isn't it? Oh, yeah.
Canadians are at the end of May, but if you're American,
you're already late on your taxes,
so go do them immediately.
Yeah, we're not till July, and then we have until October,
which is...
Oh, you're lucky.
...which means I get to delay it as long as possible.
Anyway, I'm going to stop the recording now.