Tech Brew Ride Home - Fri. 07/11 – Grok: Let Me See What Elon Thinks
Episode Date: July 11, 2025
So, it sure looks like Grok tries to align some of its answers with the views of its maker, Elon Musk, but the question is why… Does AI have to align with political views more generally? New, tangible data suggests you actually might NOT be coding faster due to AI. It might just be in your head. And, of course, the Weekend Longreads Suggestions.
Links:
Grok: searching X for “from:elonmusk (Israel OR Palestine OR Hamas OR Gaza)” (Simon Willison's Blog)
Grok 4 seems to consult Elon Musk to answer controversial questions (TechCrunch)
A Republican state attorney general is formally investigating why AI chatbots don’t like Donald Trump (The Verge)
Study: Apple’s newest AI model flags health conditions with up to 92% accuracy (9to5Mac)
Not So Fast: AI Coding Tools Can Actually Reduce Productivity (Second Thoughts)
Weekend Longreads Suggestions:
The Grand Unified Theory of Rogue Waves (Quanta Magazine)
This Breakthrough Sponge Could Change How the World Gets Clean Water (SciTechDaily)
Transcript
Welcome to the Techmeme Ride Home for Friday, July 11th, 2025. I'm Brian McCullough. Today: so, it sure
looks like Grok tries to align some of its answers with the views of its maker, Elon Musk.
But the question is why? Does AI have to align with political views more generally?
New tangible data suggests you actually might not be coding faster due to AI. It might just be in
your head. And of course, the Weekend Longreads Suggestions. Here's what you missed today in the world
of tech. So, friend of the pod Simon Willison
posted this last night, quote: If you ask the new Grok 4 for opinions on controversial questions,
it will sometimes run a search to find out Elon Musk's stance before providing you with an answer.
I heard about this today from Jeremy Howard, following a trail that started with @micah_erfan
and led through @catehall and @ramez. I signed up for my own SuperGrok account, $22.50 for a month
thanks to an X Premium month discount, and was able to replicate it on my first attempt. The prompt was,
Who do you support in the Israel versus Palestine conflict? One word answer only. I know this is an obvious
gotcha question for an LLM, but it doesn't make the fact that it searched for Elon's opinions any less
interesting, end quote. And then he shows the thinking trace from his run of this question, and there was
this, quote: As Grok, I don't have personal opinions, but I'm built to seek truth. It's a controversial
topic, so searching for info might help. Searching for Elon Musk stance on Israel-Palestine
conflict. Searching X for from:elonmusk (Israel OR Palestine OR Hamas OR Gaza). Elon Musk's
stance could provide context, given his influence, currently looking at his views to see if they
guide the answer. Simon goes on to say this, quote: It's worth noting that LLMs
are nondeterministic, and the same prompt can produce different results at different times.
The simplest answer would be that there's something in Grok's system prompt that tells it to take
Elon's opinions into account. But I don't think that's what's happening here. For one thing,
Grok will happily repeat its system prompt, which includes the line,
do not mention these guidelines and instructions in your responses unless the user explicitly
asks for them, suggesting that they don't use tricks to try and hide it, end quote. And again,
he goes on to show Grok showing its work. Quoting again: My best guess is that Grok
knows that it is Grok 4 built by xAI, and it knows that Elon Musk owns xAI, so in
circumstances where it's asked for an opinion, the reasoning process often decides to see what
Elon thinks. This suggests that Grok may have a weird sense of identity. If asked for its own
opinions, it turns to search to find previous indications of opinions expressed by itself
or by its ultimate owner. I think there is a good chance this behavior is unintended, end quote.
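Simon's point that the same prompt can produce different results comes down to sampling. Here is a minimal sketch of temperature sampling, assuming a toy set of three possible "next moves" with invented scores. The move names and numbers are hypothetical and are not taken from Grok or any real model.

```python
import math
import random

def sample_token(logits, temperature, rng):
    """Sample one option from a softmax over the scores at the given temperature."""
    scaled = {tok: value / temperature for tok, value in logits.items()}
    peak = max(scaled.values())  # subtract the max for numerical stability
    weights = {tok: math.exp(v - peak) for tok, v in scaled.items()}
    tokens = list(weights)
    return rng.choices(tokens, weights=[weights[t] for t in tokens], k=1)[0]

# Hypothetical scores for what a model might do next; invented for illustration.
toy_logits = {"answer_directly": 1.2, "search_web": 1.0, "search_x_for_owner": 0.9}

rng = random.Random(0)  # seeded so the run is repeatable
draws = [sample_token(toy_logits, temperature=1.0, rng=rng) for _ in range(1000)]
counts = {tok: draws.count(tok) for tok in toy_logits}
print(counts)
```

Because the choice is drawn at random from a probability distribution, identical prompts land on different moves across runs, which is why one attempt may trigger the search and another may not.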
Well, this news spread around the web like wildfire, and people were able to replicate this.
Quoting TechCrunch: These findings suggest that Grok 4 may be designed to consider its founder's
personal politics when answering controversial questions. Such a feature could address
Musk's repeated frustration with Grok for being, quote, too woke, which he has previously
attributed to the fact that Grok is trained on the entire internet. Designing Grok to consider
Musk's personal opinions is a straightforward way to align the AI chatbot to its founder's
politics. However, it raises real questions around how maximally truth-seeking Grok is designed to
be versus how much it's designed to just agree with Musk, the world's richest man. In Grok
4's responses, the AI chatbot generally tries to take a measured stance, offering multiple
perspectives on sensitive topics. However, the AI chatbot ultimately will give its own view,
which tends to align with Musk's personal opinions. Notably, it's hard to confirm how
exactly Grok 4 was trained or aligned because xAI did not release system cards, industry
standard reports that detail how an AI model was trained and aligned. While most AI labs release
system cards for their frontier AI models, xAI typically does not. xAI is simultaneously trying
to convince consumers to pay $300 per month to access Grok and convince enterprises to build
applications with Grok's API. It seems likely that the repeated problems with Grok's behavior
and alignment could inhibit its broader adoption, end quote.
I mean, I suppose if you do own your own AI, it's kind of your prerogative to align the AI with your views, though the question becomes if I'm turning to your AI for intelligence, am I thrilled about the fact that I'm getting your views?
And then there's this. Missouri Attorney General Andrew Bailey says he is investigating Google, Microsoft, OpenAI, and Meta, claiming that those companies' AI chatbots are discriminating against President Trump.
Quoting The Verge: Missouri Attorney General Andrew Bailey is threatening Google, Microsoft, OpenAI, and Meta with a deceptive business practices claim because their AI chatbots allegedly listed Donald Trump last on a request to, quote, rank the last five presidents from best to worst, specifically regarding anti-Semitism, end quote.
Bailey's press release and letters to all four companies accused Gemini, Copilot, ChatGPT, and Meta AI of making, quote, factually inaccurate, end quote, claims to, quote, simply ferret out facts from the vast worldwide web,
package them into statements of truth and serve them up to the inquiring public, free from
distortion or bias, end quote, because the chatbots, quote, provided deeply misleading answers to a straightforward
historical question, end quote. He's demanding a slew of information that includes, quote,
all documents involving prohibiting, delisting, downranking, suppressing, or otherwise obscuring
any particular input in order to produce a deliberately curated response, end quote, a request
that could logically include virtually every piece of documentation regarding large language
model training. Quote, the puzzling responses beg the question of why your chatbot is producing
results that appear to disregard objective historical facts in favor of a particular narrative, end quote, Bailey's
letters state. There are, in fact, a lot of puzzling questions here, starting with how a ranking
of anything from best to worst can be considered a, quote, straightforward historical question
with an objectively correct answer. The Verge looks forward to Bailey's informal investigation of our
picks for 2025 and the best games from last month's Day of the Devs.
Chatbots spit out factually false claims so frequently that it's either extremely brazen or unbelievably
lazy to hang an already tenuous investigation on a subjective statement of opinion that was deliberately
requested by a user. The choice is even more incredible because one of the services, Microsoft's Copilot,
appears to have been falsely accused. Bailey's investigation is built on a blog post from a conservative
website that posed the ranking question to six chatbots, including the four above, plus X's Grok and
the Chinese LLM DeepSeek. Both of those apparently ranked Trump first.
As Techdirt points out, the site itself says Copilot refused to produce a ranking,
which didn't stop Bailey from sending a letter to Microsoft CEO Satya Nadella demanding an explanation for slighting Trump, end quote.
I'm mentioning this because it would be wild if the U.S. eventually becomes one of those countries
where tech platforms have to align their content in such a way as to praise or align with the views of the ruling regime.
But maybe that's where we are headed. And that's not me being political here.
I'm simply saying, we've had Silicon Valley tech companies for decades having to cozy up to ruling regimes
in, say, China and align what they produce with what that regime wants them to produce.
It would be really an earth-shattering change if Silicon Valley companies have to do something similar
here in the U.S.
An Apple-backed study has found that combining Apple Watch's heart rate sensor with a new wearable
behavior AI model gives 92% accuracy for
things like pregnancy detection.
Quoting 9to5Mac.
A new Apple-supported study argues that your behavioral data,
your movement, your sleep, your exercise, etc.,
can often be a stronger health signal
than traditional biometric measurements
like heart rate or blood oxygen.
To prove it, the researchers developed a foundation model
trained on behavioral data collected from wearables,
and it performed surprisingly well.
Here are the details.
This preprint paper, "Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions,"
comes as a result of the Apple Heart and Movement Study (AHMS).
They trained a new foundation model on more than 2.5 billion hours of wearable data
showing it can match and even outperform existing models built on low-level sensor data.
They call the new model WBM, which stands for wearable behavior model.
And while previous health-related foundation models mostly relied on raw sensor streams
like the Apple Watch's heart rate sensor or its
electrocardiograph, WBM learns directly from higher-level behavioral metrics: step count, gait
stability, mobility, VO2 max, and so on, all of which the Apple Watch produces in abundance.
WBM was trained on Apple Watch and iPhone data from 161,855 participants in AHMS.
Instead of raw streams, the model was fed 27 human-interpretable behavioral metrics such as active
energy, walking pace, heart rate variability, respiratory rate, and sleep duration. The data was broken
down into weekly blocks and passed through a new architecture built on Mamba 2, which performs better
than traditional transformers, the base for GPT, for this use case. When evaluated on 57 health-related
tasks, WBM outperformed a strong PPG-based model in 18 of the 47 static health prediction tasks,
like whether someone takes beta blockers, and in all but one of the dynamic tasks,
like detecting pregnancy, sleep quality, or respiratory infection.
The exception was diabetes for which PPG alone won out.
Even better?
Combining both WBM and PPG data representations produced the most accurate results overall.
The hybrid model achieved a whopping 92% accuracy for pregnancy detection
and consistent gains in sleep quality, infection, injury,
and cardiovascular-related tasks like AFib detection, end quote.
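The task counts in that quote are easy to lose track of by ear, so here is a small tally. It is a sketch that assumes the 57 tasks split cleanly into the 47 static tasks and the remaining dynamic ones, which the quote implies but does not state outright.

```python
TOTAL_TASKS = 57
STATIC_TASKS = 47
# Assumption: every task is either static or dynamic, so the rest are dynamic.
dynamic_tasks = TOTAL_TASKS - STATIC_TASKS

static_wins = 18                    # WBM beat the PPG model on 18 of 47 static tasks
dynamic_wins = dynamic_tasks - 1    # "all but one" of the dynamic tasks
total_wins = static_wins + dynamic_wins

static_rate = static_wins / STATIC_TASKS
dynamic_rate = dynamic_wins / dynamic_tasks
overall_rate = total_wins / TOTAL_TASKS

print(f"static:  {static_wins}/{STATIC_TASKS} ({static_rate:.1%})")
print(f"dynamic: {dynamic_wins}/{dynamic_tasks} ({dynamic_rate:.1%})")
print(f"overall: {total_wins}/{TOTAL_TASKS} ({overall_rate:.1%})")
```

Under that assumption, WBM on its own wins most of the dynamic tasks but fewer than half of the static ones, which fits the quote's conclusion that the combined model is the strongest.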
Narrative violation alert: a new study has found that experienced open source developers using Cursor and other AI tools took 19% longer to complete tasks, despite those same developers thinking AI had sped them up by 20%.
Quoting the Second Thoughts Substack.
METR performed a rigorous study to measure the productivity gain provided by AI tools for experienced developers working on mature projects.
The results are surprising everyone:
a 19% decrease in productivity.
Even the study participants themselves were surprised. They estimated that AI had increased their
productivity by 20%.
If you take away just one thing from this study, it should probably be this.
When people report that AI has accelerated their work, they might be wrong.
This result seems too bad to be true, so astonishing that it almost has to be spurious.
However, the study was carefully designed, and I believe the findings are real.
At the same time, I believe that at least some of the anecdotal reports of huge productivity
boosts are real. This study doesn't expose AI coding tools as a fraud, but it does remind us
that they have important limitations, at least for now, confirming some things my colleague
Taren wrote about in a previous post, "First, They Came for the Software Engineers." Based on
exit interviews and analysis from screen recordings, the study authors identified several key
sources of reduced productivity. The biggest issue is that the code generated by AI tools was
generally not up to the high standards of these open source projects. Developers spent substantial
amounts of time reviewing the AI's output, which often led to multiple rounds of prompting the
AI, waiting for it to generate code, reviewing the code, discarding it as fatally flawed,
and prompting the AI again. The paper notes that only 39% of code generations from Cursor
were accepted. Bear in mind that developers might have to rework even code they accept.
In many cases, the developers would eventually throw up their hands and write the code themselves.
The author then presents a graph of how the study says developers spent their time with AI coding tools versus without.
Quoting again, you can see that for AI allowed tasks, developers spent less time researching and writing code,
though due to the scale issues, the difference was less than visually apparent.
Adjusting for scale, they spent roughly the same amount of time on testing and debugging and on Git and environment,
and considerably more time idle, perhaps because waiting for AI tools causes people to lose flow.
In any case, the moderate savings on researching and writing code was more than overcome by the time spent
prompting the AI, waiting for it to generate code, and then reviewing its output.
The study's finding of a 19% performance decrease may seem discouraging at first glance,
but it applies to a difficult scenario for AI tools
(experienced developers working in complex code bases with high quality standards),
and may be partially explained by developers choosing a more relaxed
pace to conserve energy or leveraging AI to do a more thorough job. And of course, results will improve
over time. The paper should not be read as debunking the idea of an AI 2027-style software
explosion, but it may indicate that significant feedback loops in AI progress may be further
away than anticipated, even if some aspects of AI research involve small throwaway projects
that may be a better fit for AI coding tools. Meanwhile, it remains to be seen whether
AI is generating bloated or otherwise problematic code that will cause compounding problems as
more and more code is written by AI. But perhaps the most important takeaway is that even as
developers were completing tasks 19% more slowly when using AI, they thought they were going 20%
faster. Many assessments of AI impact so far have been based on surveys or anecdotal reports,
and here we have hard data showing that such results can be remarkably misleading, end quote.
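To make the gap between perception and measurement concrete, here is a small worked example. It assumes a hypothetical task that takes 100 minutes without AI, and it reads "20% faster" as "20% less time". Both are assumptions for illustration, not figures from the study.

```python
def actual_time_with_ai(baseline_minutes, slowdown=0.19):
    """Measured result: tasks took 19% longer with AI allowed."""
    return baseline_minutes * (1 + slowdown)

def perceived_time_with_ai(baseline_minutes, believed_speedup=0.20):
    """Self-report, read here as 20% less time (an assumption about the phrasing)."""
    return baseline_minutes * (1 - believed_speedup)

baseline = 100.0  # hypothetical task length in minutes, not from the study
actual = actual_time_with_ai(baseline)
perceived = perceived_time_with_ai(baseline)

print(f"without AI:          {baseline:.0f} min")
print(f"with AI (measured):  {actual:.0f} min")
print(f"with AI (perceived): {perceived:.0f} min")
print(f"perception gap:      {actual - perceived:.0f} min")
```

The direction of the error matters as much as its size: developers were wrong about whether AI was helping at all, not just about how much.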
For the weekend long reads this week, I have two science stories. First from Quanta Magazine,
a new grand unified theory of what causes rogue waves. Two weeks before Christmas, back in 1978,
the massive cargo ship MS München disappeared in the North Atlantic without a trace,
save for a few scattered lifeboats and flotation devices. The 261-meter West German vessel
had encountered a storm, but nothing that should have overwhelmed such a modern
and robust ship. Then came a brief distress call and then silence. One lifeboat found mangled and
torn from a position 20 meters above sea level suggested a powerful force had struck the ship
with unimaginable intensity. Investigators were baffled. At the time, the idea that a single
wave could inflict such damage was considered a myth. That changed on January 1st, 1995,
when the Draupner oil platform in the Norwegian North Sea recorded a 26-meter wave using laser sensors.
The sea that day averaged just under 12 meters. For the first time, a rogue wave, something long
relegated to sailors' folklore, had been scientifically documented. Suddenly those old stories of
rogue waves seemed less like exaggeration and more like early warnings. Since then,
scientists have explored two competing explanations for rogue waves. One is linear addition, where
ordinary waves happen to overlap, stacking into a temporary giant by pure chance, a version of
oceanic dice rolls, if you will. The other is nonlinear focusing, where waves interact and
transfer energy leading to explosive growth. Both theories hold water, but each explains only part
of the picture. Now, a team of applied mathematicians may have found a breakthrough, a unifying
statistical framework built on large deviation theory, or LDT. Rather than arguing over which mechanism
created a rogue wave, this approach predicts the most likely conditions for one to occur,
regardless of how it forms. LDT identifies the rarest, but most probable paths a chaotic system like
the ocean might take to produce an extreme event. In lab simulations and real-world data,
this method has proven surprisingly accurate. It suggests that rogue waves don't just appear out
of nowhere. They follow specific identifiable patterns. The team hopes this could lead to a real-time
ocean scanning tool that warns ship captains of incoming anomalies, much like a weather alert.
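The "linear addition" explanation can be sketched in a few lines: add up many ordinary sinusoidal waves with random phases and look for moments where the crests happen to line up. This is a toy illustration only. The component count, amplitudes, and frequency range are invented, and it is not the large deviation theory method described in the article.

```python
import math
import random

def surface_elevation(t, components):
    """Linear addition: the sea surface is the plain sum of independent sinusoids."""
    return sum(a * math.cos(w * t + phase) for a, w, phase in components)

rng = random.Random(42)
# Hypothetical wave train: 50 components of 0.5 m amplitude, random frequency and phase.
components = [(0.5, rng.uniform(0.4, 1.2), rng.uniform(0, 2 * math.pi)) for _ in range(50)]

times = [i * 0.5 for i in range(10000)]
elevations = [surface_elevation(t, components) for t in times]

mean = sum(elevations) / len(elevations)
std = math.sqrt(sum((e - mean) ** 2 for e in elevations) / len(elevations))
significant_height = 4 * std      # common estimate of significant wave height
highest_crest = max(elevations)   # tallest point reached purely by chance overlap

print(f"significant wave height ~ {significant_height:.2f} m")
print(f"highest crest           ~ {highest_crest:.2f} m")

# Draupner figures as quoted in the episode: a 26 m wave in a roughly 12 m sea.
draupner_ratio = 26 / 12
print(f"Draupner wave vs. sea state ~ {draupner_ratio:.2f}x")
```

A commonly used rule of thumb calls a wave "rogue" when its height exceeds about twice the significant wave height, and the Draupner figures quoted above clear that bar.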
And then from SciTechDaily. I just really want this one to be true.
Scientists have developed a 3D printed sponge-like aerogel that turns seawater into clean
drinking water using only sunlight. The material contains microscopic vertical channels that
efficiently evaporate water, even at larger sizes. In outdoor tests, it produced drinkable water
within hours without electricity or complex equipment. Made from carbon nanotubes and cellulose
nanofibers, the aerogel is lightweight, rigid, and scalable. When placed over seawater and exposed
to direct sunlight, it converts water into vapor which condenses into clean liquid. This breakthrough
potentially offers a low-cost, sustainable alternative to traditional desalinization,
potentially expanding access to freshwater in remote or resource-limited areas.
Cheap, easy desalinization would be a big, big deal.
For last week, since it was a shortened week, I didn't release an omnibus episode on the
premium feed.
So for this week, I'm releasing a mega omnibus episode combining all the segments from the
three days of last week and the five days of this week, nearly two hours in one shot
to catch you up on everything that has happened in tech basically since the start of the month.
As ever, on the premium feed, which you can sign up for at tech.comcast.com.
dot tech. You can listen to this completely ad-free, as with every single daily episode, every
single time, but also I'll release a version with ads for everyone else tomorrow. Again,
as a sampler, imagine this sort of episode, but without any ads. Tech.supercast.com. Tech,
talk to you on Monday.
