The Changelog: Software Development, Open Source - Never. Let. AI. Write. Your. Tests. (News)
Episode Date: June 9, 2025
Diwank explains why you should never let AI write your tests, Apple redesigns all of their software platforms, AI has brought about the rise of judgement over technical skills, Peter Steinberger says... Claude Code is now his computer, and the curious case of Memvid.
Transcript
What's up my nerds?
I'm Jared and this is Changelog News for the week of Monday, June 9th, 2025.
Just days before their much-anticipated WWDC keynote, Apple Research published a paper on
the strengths and limitations of large reasoning models,
which I can't help but interpret as,
seriously guys, there's good reasons
why our Apple Intelligence rollout has been a dumpster fire.
You'll see.
Okay, let's get into the news.
Never let AI write your tests.
Developer Diwank's field guide
to a new way of building software starts off as a pretty typical
"here's how to be productive coding with AI," but then he says something near the end, and emphatically
so, that I haven't heard anybody say. Quote, "Now we come to the most important principle in
AI-assisted development. It's so important that I'm going to repeat it in multiple ways until it's burned into your memory.
Never let AI write your tests.
Tests are not just code that verifies other code works.
Tests are executable specifications.
They encode your actual intentions, your edge cases, your understanding of the problem domain.
High performers excel at both speed and stability.
There's no trade-offs. Tests
are how you achieve both."
Diwank says AI can help with test planning, suggest test scenarios, debug, and analyze
test failures, but that it should never touch test files, write test code, or modify test
expectations.
"Your tests are your specification.
They're your safety net.
They're the encoded wisdom of every bug you fixed
and every edge case you've discovered.
Guard them zealously."
End quote.
I'm not sure if I agree or not.
I don't think I have enough experience yet
to weigh in with more than a hunch.
What do you think?
Does this ring true to you?
Or does it sound overly cautious?
Apple redesigns it all.
The headliner announcement from Apple's WWDC keynote
was a complete redesign of all major software platforms.
Quote, "Announced simultaneously for iOS,
iPadOS, macOS, watchOS, tvOS, visionOS, and CarPlay,
Liquid Glass forms a new, universal design language
for the first time.
At its WWDC keynote address, Apple's software chief Craig Federighi said Apple Silicon
has become dramatically more powerful, enabling software, materials, and experiences we once
could only dream of.
Inspired by visionOS, Liquid Glass is layered throughout the system and features rounded
corners that have been matched to the curved screens of the devices.
It behaves just like glass in the real world and morphs when you need more options or move
between views."
I'm not gonna lie, it's giving me Windows Aero vibes.
It'll probably grow on me, but I can't say I'm super excited about this change.
The "return of texture, depth, and expressiveness in UI" trend I featured last week, coming on
the heels of Airbnb's redesign, is much more interesting to this guy.
The rise of judgment over technical skill.
Ever since ChatGPT launched our current AI
madness, developers have been asking ourselves, and each other, what it all means in the long
term. We still don't have that answer yet,
but I can confidently say that at least in the medium term,
it means we must move up the value chain
because the once cherished technical skills we've acquired
are being commoditized at a blistering pace.
There's nothing new under the sun.
Quote, in 1995, musician and producer Brian Eno made a profound observation about computer
sequencers that has become increasingly relevant in our
AI-powered world. Quote, the great benefit of computer sequencers
is that they remove the issue of skill and replace it with the
issue of judgment. With Cubase or Photoshop, anybody can
actually do anything. And you can make stuff that sounds very much
like stuff you'd hear on the radio
or looks very much like anything you see in magazines.
So the question becomes not whether you can do it or not
because any drudge can do it if they're prepared
to sit in front of the computer for a few days.
The question then is, of all things you can do now,
which do you choose to do?
End quote.
You know what?
Adam and I had a similar conversation about digital photography while on a photo walk
in New York City years ago.
It was my contention then that the skills required to take great pictures were trending
towards zero and when we get to that point, which we're pretty close to now, the only
thing that would matter is taste, which is just another form of judgment.
In other words, it's a way of answering the question: of all the perspectives you can
now capture,
which do you choose to capture?
In one sense, Changelog News is me trying to climb my way up the value chain.
Sure, I write some prose too, but not notably well.
And I read it aloud to you, but not all that well.
What I really do is repeatedly answer the question: of all the things you can feature,
which do you choose to feature?
It's now time for sponsored news.
Our best customers are now robots.
Kurt Mackey and our friends at Fly
have had quite the experience, quote,
but a funny thing has happened
over the last six months or so.
If you look at the numbers, DX,
developer experience, might not matter that much.
That's because the users driving the most growth
on the platform aren't people at all,
they're robots.
End quote.
We've talked about LLM SEO a few times on the pod,
and this is why.
Because you don't have to attract humans
when coding agents make tool selections at massive scale.
Kurt and his team are now focusing on the latter.
Quote, if you try to think like a robot,
you can predict other things they might want.
Since robot money spends just the same as people money,
I guess we ought to start doing that.
For instance, it should be easy to MCP our API.
The robots can then make their own infrastructure decisions.
End quote.
Lots to glean from this post. Thanks to Fly.io for sharing so candidly
and for sponsoring Changelog News.
Claude Code is my computer. Here's Peter Steinberger.
"I run Claude Code in no-prompt mode. It saves me an hour a day and hasn't broken my Mac in two
months. The $200 per month Max plan pays for itself."
This echoes the sentiment that Steve Yegge impressed upon us on last week's show.
After recording that, I took Steve's advice and gave Claude Code the ol' college try
at writing a few scripts that I'd procrastinated because they were just too much work for their
perceived ROI.
Color me impressed.
The first script Claude wrote delivered so well on my specs that I decided to vibe
code the second one and didn't even look at the code itself.
Worked great.
Peter says this about Claude Code.
Quote, Claude Code shines because it was built command line first, not bolted onto an IDE
as an afterthought.
The agent has full access to my file system, if you're bold enough, can execute commands,
read output, and iterate based on results. End quote.
I think that's right.
I like Claude Code more than I like Claude inside of Zed.
It's even more natural in my terminal
than it is in my editor for some reason.
More to come on this front, I'm just getting started.
But yeah, up the value chain we go.
The curious case of Memvid.
Okay, I'm feeling way too AI bullish in this episode,
so here's a nice balancing story.
A graduate student created a software project
that got a lot of attention online, like a lot of attention.
Its pitch, quote,
Memvid revolutionizes AI memory management
by encoding text data into videos,
enabling lightning fast semantic search
across millions of text chunks
with sub-second retrieval times.
Unlike traditional vector databases
that consume massive amounts of RAM and storage,
Memvid compresses your knowledge base
into compact video files
while maintaining instant access
to any piece of information."
End quote.
Now on its face, that sounds amazing,
but it also sounds kinda weird.
Why would encoding text into video use less disk space
or make anything faster?
Well, turns out it doesn't.
Quote, testing shows this library's performance
is the opposite of what the readme claims.
Your text will take 100x more disk space.
Searches will be 5x slower.
Setup will take hours, not minutes.
This library will cause serious problems
at production scale.
The readme's performance claims are backwards."
That was posted as an issue on the repo.
On the heels of this discovery came a new contribution,
a proposal for Memvid 1.0, the universal, streamable,
self-contained AI memory format.
Does that sound ambitious?
Does it sound sloppy?
One commenter sure thinks so.
Quote, GitHub is now infested with AI slop.
AI generated repo with obvious overhead
and no practical usages.
People that has AI replaced brains,
giving stars to this,
and AI generated issues.
Perfect. End quote.
I guess the AI slopping will continue until morale improves.
That's the news for now, but go and subscribe to the Changelog newsletter for the full scoop
of links worth clicking on.
Such as the new HTTP QUERY method,
containerized environments for coding agents,
and Markdown with superpowers.
Get in on the newsletter at changelog.news.
In case you missed it, last week Steve Yegge shared with us his adventures in babysitting
coding agents and Amanda Silver, CVP of the Developer Division of Microsoft, explained
why we're all builders now.
And coming up this week, on Wednesday, Richard Feldman tells me all about his Roc programming
language and on Friday, Justin Searls is back to help us digest all the WWDC announcements.
Have a great week, like, subscribe and leave us a 5-star review if you dig the show and
I'll talk to you again real soon.