The Changelog: Software Development, Open Source - The "confident idiot" problem (News)

Episode Date: December 8, 2025

Why AI needs hard rules (not vibe checks), what Anthropic's acquisition of Bun's creators tells us about the AI takeover, Jonah Glover couldn't get Claude to recreate Space Jam's 1996 website, Google ...finally unkills something, and Bazzite is a distro for the next generation of Linux gaming.

Transcript
Starting point is 00:00:00 What up, nerds? I'm Jared, and this is Changelog News for the week of Monday, December 8th, 2025. We are quickly approaching last call for State of the 'Log voicemails. We record the show in a week, and we have to give BMC time to make the remixes, so if you're thinking about sending one in, and you should, now is the best time. Submit yours today at changelog.fm slash sotel. Okay, let's get into this week's news. The Confident Idiot Problem
Starting point is 00:00:36 Or why AI needs hard rules, not vibe checks. If you've been following the "How do we actually use AI in production?" conversation stream, you've probably heard people propose a strategy where one LLM checks another LLM's results. But will that work? Quote, we are told to ask GPT-4o to grade GPT-3.5. We are told to fix the vibes, but this creates a dangerous circular dependency.
Starting point is 00:01:03 If the underlying models suffer from sycophancy, which is agreeing with the user, or hallucination, a judge model often hallucinates a passing grade. We are trying to fix probability with more probability. That is a losing game. End quote. One possible way of dealing with these confident idiots we've introduced into our software stacks over the last few years is to stop treating agents like magic boxes and start treating them like software. Hence, the Steer SDK was created. Quote, Steer is an open source Python library that intercepts agent failures, such as hallucinations, bad JSON, PII leaks, etc., and allows you to inject fixes via a local dashboard without changing your code.
Starting point is 00:01:46 End quote. Another way of dealing with these confident idiots in our software stacks is to remove them, but that might not be possible anymore. Bun is joining Anthropic. The company behind Bun, which is the open source runtime for Claude Code, is joining Anthropic. We discussed the big acquisition slash acqui-hire on last week's Friends episode, but at the time I hadn't quite considered this move and how contrary it is to Anthropic's party line that AI agents are replacing software engineers.
Starting point is 00:02:17 From Anthropic's announcement, quote, we've been a close partner of Bun for many months. Our collaboration has been central to the rapid execution of the Claude Code team, and it directly drove the recent launch of Claude's native installer. We know the Bun team is building from the same vantage point that we do at Anthropic, with a focus on rethinking the developer experience and building innovative, useful products. End quote. Bun is open source. Why not just fork it and have a Claude Code-powered engineer make all the necessary changes
Starting point is 00:02:47 and upgrades to the runtime that Anthropic needs? Perhaps because there's no getting there from here. At least not yet. Jarred Sumner and the Bun team's expertise is what's so valuable, still, even to Anthropic. Claude can't recreate classic Space Jam site. Jonah Glover tried to recreate everyone's favorite 1996 website by giving Claude Code, which is running Opus 4.1, a screenshot of the site and all the associated assets. It failed repeatedly in all the ways I would expect from my own front end and design attempts with the tool.
Starting point is 00:03:21 Jonah's finding, which is quite relatable, quote, once Claude's version existed, every grid overlay, every comparison step, every precise adjustment was anchored to his layout, not the real one. At the end of all of this, I'm left with the irritating fact that, like many engineers, he's wrong and he thinks he's right. What this teaches me is that Claude is actually kind of a liar, or at least, Claude is confused. However, for the drama, I'll assume Claude is a liar. End quote. I've been giving Claude Code a lot of props lately, but I've also been giving it a lot of tasks that it can't quite accomplish. This process starts off as fun and interesting, but each time it ends in failure, I am perplexed by all the possible failure paths. Was it me? Was it my prompting? Was it the
Starting point is 00:04:04 agent? Was it the model? Or perhaps I'm asking for things that aren't easily accomplished with today's tech. I can be quite demanding. This all makes me yearn for the days when the only one to blame for my failures was me. It's now time for sponsored news. Depot's Advent of Code 2025. Depot is running a community leaderboard for Advent of Code 2025, and they're putting real money behind it. The top five finishers each direct $1,000 to a registered charity of their choice. If you pick a charity supporting STEM education or the developer ecosystem, Depot adds a 50% bonus. They've already generated $7,500 in donations. Here's the format. 12 days of puzzles, unlocking daily at midnight Eastern, starting December 1st. Solve at your own
Starting point is 00:04:52 pace, there's no time limit. Any language, any skill level. Each day brings a two-part programming challenge from Eric Wastl's Advent of Code. To join Depot's private leaderboard, request access on their events page and they'll send you a code. Whether you're competing for the top five or just want to sharpen your skills alongside other devs, it's a good excuse to write some code this month. Check it out at depot.dev slash events slash advent dash of dash code dash 2025, or just follow the link in the newsletter. It's also in your chapter data. Thank you to Depot for sponsoring Changelog News. Google unkills JPEG XL. Quote, in a dramatic turn of events, the Chromium team has reversed its obsolete tag and has decided to support the format in Blink, which is the engine
Starting point is 00:05:37 behind Chrome and Chromium and Edge. Given Chrome's position in the browser market share, I predict the format will become a de facto standard for images in the near future. End quote. We're used to things being killed by Google, but unkilled? This is a trend I can get behind. Here are my unkill requests. It's time to bring back Zeitgeist, Dodgeball, and of course, Google Reader. The next generation of Linux gaming. If the mythical year of the Linux desktop is ever to materialize, it will first be preceded by a sea change in gaming options for the venerable open source OS. The gaming sea change appears to be in full swing, with Steam on Linux hitting an all-time high of over 3% usage last month.
Starting point is 00:06:21 Enter Bazzite, a Fedora-based Linux distro that's hyper-focused on making gaming awesome. Quote, Bazzite is designed for Linux newcomers and enthusiasts alike with Steam pre-installed, HDR and VRR support, improved CPU schedulers for responsive gameplay, and numerous community-developed tools and tweaks to streamline your gaming and streaming experience. End quote.
Starting point is 00:06:46 The project began back in 2023, but it appears to be maturing and aiming at sustainability by setting up ways to donate with its latest update. Quote, as Bazzite matures, we begin to tackle more ambitious projects such as proper secure boot, support for more handheld devices, and conference attendance, which means more costs for us. And we would gladly appreciate the help in covering them. End quote. That is the news for now, but go and subscribe to the Changelog newsletter for the full scoop of links worth clicking on, such as why I ignore the spotlight as a staff engineer, Vanilla CSS is all that you need,
Starting point is 00:07:23 and what happens when you take an XKCD joke too literally. Get in on the newsletter at changelog.news. Have yourself a great week, like, subscribe, and five-star review us if you dig the show, and I'll talk to you again real soon.
