The Changelog: Software Development, Open Source - The "confident idiot" problem (News)
Episode Date: December 8, 2025
Why AI needs hard rules (not vibe checks), what Anthropic's acquisition of Bun's creators tells us about the AI takeover, Jonah Glover couldn't get Claude to recreate Space Jam's 1996 website, Google ...finally unkills something, and Bazzite is a distro for the next generation of Linux gaming.
Transcript
What up, nerds?
I'm Jerod, and this is Changelog News for the week of Monday, December 8th, 2025.
We are quickly approaching last call for State of the 'log voicemails.
We record the show in a week, and we have to give BMC time to make the remixes,
so if you're thinking about sending one in, and you should, now is the best time.
Submit yours today at changelog.fm slash sotl.
Okay, let's get into this week's news.
The Confident Idiot Problem
Or why AI needs hard rules, not vibe checks.
If you've been following the "How do we actually use AI in production?"
conversation stream, you've probably heard people propose a strategy
where one LLM checks another LLM's results.
But will that work?
Quote, we are told to ask GPT-4o to grade
GPT-3.5. We are told to fix the vibes, but this creates a dangerous circular dependency.
If the underlying models suffer from sycophancy, which is agreeing with the user, or hallucination,
a judge model often hallucinates a passing grade. We are trying to fix probability with more
probability. That is a losing game. End quote. One possible way of dealing with these
confident idiots we've introduced into our software stacks over the last few years is to stop treating agents
like magic boxes and start treating them like software, which is why the Steer SDK was created.
Quote, Steer is an open source Python library that intercepts agent failures,
such as hallucinations, bad JSON, PII leaks, etc.,
and allows you to inject fixes via a local dashboard without changing your code.
End quote.
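To make "hard rules, not vibe checks" concrete, here is a minimal sketch of a deterministic check on an agent's output. It is not Steer's API; the function name `check_agent_output`, the required-keys idea, and the two regex patterns are all assumptions made up for illustration. The point is only that these checks either pass or fail, with no second model grading the first.

```python
import json
import re

# Hypothetical rules for illustration only; this is NOT Steer's API.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
SSN_RE = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def check_agent_output(raw: str, required_keys: set[str]) -> list[str]:
    """Return a list of rule violations; an empty list means the output passes."""
    # Rule 1: the output must parse as a JSON object.
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"bad JSON: {exc.msg}"]
    if not isinstance(data, dict):
        return ["bad JSON: expected an object at the top level"]

    violations = []

    # Rule 2: every key the caller requires must be present.
    missing = required_keys - set(data)
    if missing:
        violations.append(f"missing keys: {sorted(missing)}")

    # Rule 3: no obvious PII patterns anywhere in the raw text.
    if EMAIL_RE.search(raw):
        violations.append("PII leak: email address")
    if SSN_RE.search(raw):
        violations.append("PII leak: SSN-like number")

    return violations
```

A caller would run this on every agent response and block, retry, or log when the list is non-empty. The regexes are deliberately crude; real PII detection needs more than two patterns.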
Another way of dealing with these confident idiots in our software stacks is to remove them,
but that might not be possible anymore.
Bun is joining Anthropic.
The company behind Bun, which is the open source runtime for Claude Code, is joining Anthropic.
We discussed the big acquisition slash acqui-hire on last week's Friends episode,
but at the time I hadn't quite considered this move and how contrary it is to Anthropic's
party line that AI agents are replacing software engineers.
From Anthropic's announcement, quote, we've been a close partner of Bun for many months.
Our collaboration has been central to the rapid execution of
the Claude Code team, and it directly drove the recent launch of Claude Code's native installer.
We know the Bun team is building from the same vantage point that we do at Anthropic,
with a focus on rethinking the developer experience and building innovative, useful products.
End quote.
Bun is open source.
Why not just fork it and have a Claude Code-powered engineer make all the necessary changes
and upgrades to the runtime that Anthropic needs?
Perhaps because there's no getting there from here.
At least not yet.
Jarred Sumner and the Bun team's expertise is what's so valuable, still, even to Anthropic.
Claude can't recreate the classic Space Jam site.
Jonah Glover tried to recreate everyone's favorite 1996 website by giving Claude Code,
which is running Opus 4.1, a screenshot of the site and all the associated assets.
It failed repeatedly in all the ways I would expect from my own front-end and design attempts with the tool.
Jonah's finding, which is quite relatable, quote,
Once Claude's version existed, every grid overlay, every comparison step, every precise adjustment was anchored to his layout, not the real one.
At the end of all of this, I'm left with the irritating fact that, like many engineers, he's wrong and he thinks he's right.
What this teaches me is that Claude is actually kind of a liar, or at least, Claude is confused.
However, for the drama, I'll assume Claude is a liar, end quote.
I've been giving Claude Code a lot of props lately, but I've also been giving it a lot of tasks that it can't
quite accomplish. This process starts off as fun and interesting, but each time it ends in
failure, I am perplexed by all the possible failure paths. Was it me? Was it my prompting? Was it the
agent? Was it the model? Or perhaps I'm asking for things that aren't easily accomplished with
today's tech. I can be quite demanding. This all makes me yearn for the days when the only one to
blame for my failures was me.
It's now time for sponsored news.
Depot's Advent of Code 2025.
Depot is running a community leaderboard for Advent of Code 2025, and they're putting real
money behind it. The top five finishers each direct $1,000 to a registered charity of their choice.
If you pick a charity supporting STEM education or the developer ecosystem, Depot adds a 50% bonus.
They've already generated $7,500 in donations. Here's the format. 12 days of puzzles, unlocking daily at
midnight Eastern, starting December 1st. Solve at your own
pace. There's no time limit. Any language, any skill level. Each day brings a two-part programming
challenge from Eric Wastl's Advent of Code. To join Depot's private leaderboard, request access
on their events page and they'll send you a code. Whether you're competing for the top five
or just want to sharpen your skills alongside other devs, it's a good excuse to write some code this
month. Check it out at depot.dev slash events slash advent dash of dash code dash 2025, or just
follow the link in the newsletter. It's also in your chapter data. Thank you to Depot for sponsoring
Changelog News.
Google unkills JPEG XL.
Quote, in a dramatic turn of events, the Chromium team has
reversed its obsolete tag and has decided to support the format in Blink, which is the engine
behind Chrome, Chromium, and Edge. Given Chrome's position in the browser market share, I predict
the format will become a de facto standard for images in the near future. End quote. We're used to
things being killed by Google, but unkilled? This is a trend I can get behind. Here are my
unkill requests. It's time to bring back Zeitgeist, Dodgeball, and of course, Google Reader.
The next generation of Linux gaming.
If the mythical year of the Linux desktop is ever to materialize,
it will first be preceded by a sea change in gaming options for the venerable open source OS.
The gaming sea change appears to be in full swing, with Steam on Linux hitting an all-time high
of over 3% usage last month.
Enter Bazzite, a Fedora-based Linux distro
that's hyper-focused on making gaming awesome.
Quote, Bazzite is designed for Linux newcomers and enthusiasts
alike with Steam pre-installed,
HDR and VRR support,
improved CPU schedulers for responsive gameplay,
and numerous community-developed tools and tweaks
to streamline your gaming and streaming experience, end quote.
The project began back in 2023,
but it appears to be maturing and aiming at sustainability by setting up ways to donate with its latest update.
Quote, as Bazzite matures, we begin to tackle more ambitious projects such as proper secure boot,
support for more handheld devices, and conference attendance, which means more costs for us.
And we would gladly appreciate the help in covering them.
That is the news for now, but go and subscribe to the Changelog newsletter for the full scoop of links worth clicking on, such as
Why I ignore the spotlight as a staff engineer,
Vanilla CSS is all that you need,
and what happens when you take an XKCD joke too literally.
Get in on the newsletter at changelog.news.
Have yourself a great week.
Like, subscribe, and five-star review us
if you dig the show, and I'll talk to you again real soon.
