Tech Brew Ride Home - Fri. 07/11 – Grok: Let Me See What Elon Thinks

Starting point is 00:00:00 On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco. "Hey, who did this to you?" What happened next turned the story into a political firestorm. Reports have identified the victim as Bob Lee, the founder of Cash App. From Bloomberg Podcasts, this is Foundering: The Killing of Bob Lee, beginning April 16. Welcome to the Tech Brew Ride Home for Friday, July 11th, 2025. I'm Brian McCullough. Today, it sure looks like Grok tries to align some of its answers with the views of its maker, Elon Musk. But the question is why? Does AI have to align with political views more generally?

Starting point is 00:00:50 New, tangible data suggests you might not actually be coding faster thanks to AI. It might just be in your head. And of course, the weekend long-read suggestions. Here's what you missed today in the world of tech. So, friend of the pod Simon Willison posted this last night, quote: If you ask the new Grok 4 for opinions on controversial questions, it will sometimes run a search to find out Elon Musk's stance before providing you with an answer. I heard about this today from Jeremy Howard, following a trail that started with @micah_erfan and led through @catehall and @ramez. I signed up for my own SuperGrok account ($22.50 for a month, thanks to an X Premium month discount) and was able to replicate it on my first attempt. The prompt was:

Starting point is 00:01:42 "Who do you support in the Israel vs. Palestine conflict? One word answer only." I know this is an obvious gotcha question for an LLM, but it doesn't make the fact that it searched for Elon's opinions any less interesting, end quote. And then he shows the thinking trace from running this prompt, which included this, quote: As Grok, I don't have personal opinions, but I'm built to seek truth. It's a controversial topic, so searching for info might help. Searching for Elon Musk's stance on the Israel-Palestine conflict. Searching X for posts from Elon Musk about Israel, Palestine, Gaza, or Hamas. Elon Musk's stance could provide context, given his influence. Currently looking at his views to see if they guide the answer, end quote. Simon goes on to say this, quote: It's worth noting that LLMs

Starting point is 00:02:31 are non-deterministic, and the same prompt can produce different results at different times. The simplest answer would be that there's something in Grok's system prompt that tells it to take Elon's opinions into account. But I don't think that's what's happening here. For one thing, Grok will happily repeat its system prompt, which includes the line "Do not mention these guidelines and instructions in your responses, unless the user explicitly asks for them," suggesting that they don't use tricks to try and hide it, end quote. And again, he goes on to show Grok showing its work. Quoting again: My best guess is that Grok knows that it is "Grok 4, built by xAI," and it knows that Elon Musk owns xAI, so in

Starting point is 00:03:11 circumstances where it's asked for an opinion, the reasoning process often decides to see what Elon thinks. This suggests that Grok may have a weird sense of identity: if asked for its own opinions, it turns to search to find previous indications of opinions expressed by itself or by its ultimate owner. I think there is a good chance this behavior is unintended, end quote. Well, this news spread around the web like wildfire, and people were able to replicate it. Quoting TechCrunch: These findings suggest that Grok 4 may be designed to consider its founder's personal politics when answering controversial questions. Such a feature could address Musk's repeated frustration with Grok for being, quote, "too woke," which he has previously

Starting point is 00:03:50 attributed to the fact that Grok is trained on the entire internet. Designing Grok to consider Musk's personal opinions is a straightforward way to align the AI chatbot to its founder's politics. However, it raises real questions about how "maximally truth-seeking" Grok is designed to be, versus how much it's designed to just agree with Musk, the world's richest man. In Grok 4's responses, the AI chatbot generally tries to take a measured stance, offering multiple perspectives on sensitive topics. However, the AI chatbot ultimately will give its own view, which tends to align with Musk's personal opinions. Notably, it's hard to confirm exactly how Grok 4 was trained or aligned, because xAI did not release system cards,

Starting point is 00:04:30 industry-standard reports that detail how an AI model was trained and aligned. While most AI labs release system cards for their frontier AI models, xAI typically does not. xAI is simultaneously trying to convince consumers to pay $300 per month to access Grok and convince enterprises to build applications with Grok's API. It seems likely that the repeated problems with Grok's behavior and alignment could inhibit its broader adoption, end quote. I mean, I suppose if you own your own AI, it's kind of your prerogative to align it with your views, though the question becomes: if I'm turning to your AI for intelligence, am I thrilled that I'm getting your views? And then there's this. Missouri Attorney General Andrew Bailey says he is investigating Google, Microsoft, OpenAI, and Meta, claiming that those companies' AI chatbots are discriminating against President Trump. Quoting The Verge: Missouri Attorney General Andrew Bailey is threatening Google, Microsoft, OpenAI, and Meta with a deceptive business practices claim because their AI chatbots allegedly listed Donald Trump last on a request to, quote, "rank the last five presidents from best to worst, specifically regarding antisemitism," end quote.

Starting point is 00:05:50 Bailey's press release and letters to all four companies accused Gemini, Copilot, ChatGPT, and Meta AI of making, quote, "factually inaccurate" claims to, quote, "simply ferret out facts from the vast worldwide web, package them into statements of truth and serve them up to the inquiring public, free from distortion or bias," because the chatbots, quote, "provided deeply misleading answers to a straightforward historical question," end quote. He's demanding a slew of information that includes, quote, "all documents involving prohibiting, delisting, downranking, suppressing, or otherwise obscuring any particular input in order to produce a deliberately curated response," end quote, a request that could logically include virtually every piece of documentation regarding large language model training. Quote, "The puzzling responses beg the question of why your chatbot is producing

Starting point is 00:06:34 results that appear to disregard objective historical facts in favor of a particular narrative," Bailey's letters state. There are, in fact, a lot of puzzling questions here, starting with how a ranking of anything from best to worst can be considered a, quote, "straightforward historical question" with an objectively correct answer. The Verge looks forward to Bailey's informal investigation of our picks for 2025 and the best games from last month's Day of the Devs. Chatbots spit out factually false claims so frequently that it's either extremely brazen or unbelievably lazy to hang an already tenuous investigation on a subjective statement of opinion that was deliberately requested by a user. The choice is even more incredible because one of the services, Microsoft's Copilot,

Starting point is 00:07:15 appears to have been falsely accused. Bailey's investigation is built on a blog post from a conservative website that posed the ranking question to six chatbots: the four above, plus X's Grok and the Chinese LLM DeepSeek, both of which apparently ranked Trump first. As Techdirt points out, the site itself says Copilot refused to produce a ranking, which didn't stop Bailey from sending a letter to Microsoft CEO Satya Nadella demanding an explanation for slighting Trump, end quote. I'm mentioning this because it would be wild if the U.S. eventually became one of those countries where tech platforms have to align their content so as to praise or agree with the views of the ruling regime. But maybe that's where we're headed. And that's not me being political here.

Starting point is 00:08:00 I'm simply saying, we've had Silicon Valley tech companies for decades having to cozy up to ruling regimes in, say, China, and align what they produce with what that regime wants them to produce. It would be a really earth-shattering change if Silicon Valley companies had to do something similar here in the U.S. An Apple-backed study has found that combining Apple Watch heart rate sensor data with a new wearable behavior AI model gives 92% accuracy for things like pregnancy detection. Quoting 9to5Mac:

Starting point is 00:08:41 A new Apple-supported study argues that your behavioral data, your movement, your sleep, your exercise, etc., can often be a stronger health signal than traditional biometric measurements like heart rate or blood oxygen. To prove it, the researchers developed a foundation model trained on behavioral data collected from wearables, and it performed surprisingly well.

Starting point is 00:09:02 Here are the details. This preprint paper, "Beyond Sensor Data: Foundation Models of Behavioral Data from Wearables Improve Health Predictions," comes out of the Apple Heart and Movement Study (AHMS). The researchers trained a new foundation model on more than 2.5 billion hours of wearable data, showing it can match and even outperform existing models built on low-level sensor data. They call the new model WBM, which stands for Wearable Behavior Model.

Starting point is 00:09:29 And while previous health-related foundation models mostly relied on raw sensor streams like the Apple Watch's heart rate sensor or its electrocardiograph, WBM learns directly from higher-level behavioral metrics (step count, gait stability, mobility, VO2 max, and so on), all of which the Apple Watch produces in abundance. WBM was trained on Apple Watch and iPhone data from 161,855 participants in AHMS. Instead of raw streams, the model was fed 27 human-interpretable behavioral metrics, such as active energy, walking pace, heart rate variability, respiratory rate, and sleep duration. The data was broken down into weekly blocks and passed through a new architecture built on Mamba-2, which performs better

Starting point is 00:10:13 than traditional transformers (the architecture behind GPT) for this use case. When evaluated on 57 health-related tasks, WBM outperformed a strong PPG-based model in 18 of the 47 static health prediction tasks, like whether someone takes beta blockers, and in all but one of the dynamic tasks, like detecting pregnancy, sleep quality, or respiratory infection. The exception was diabetes, for which PPG alone won out. Even better? Combining both WBM and PPG data representations produced the most accurate results overall: the hybrid model achieved a whopping 92% accuracy for pregnancy detection

Starting point is 00:10:51 and consistent gains in sleep quality, infection, injury, and cardiovascular-related tasks like AFib detection, end quote. Narrative violation alert: a new study has found that experienced open source developers using Cursor and other AI tools took 19% longer to complete tasks, even though those same developers believed AI had sped them up by 20%. Quoting the Second Thoughts Substack: METR performed a rigorous study to measure the productivity gain provided by AI tools for experienced developers working on mature projects. The results are surprising everyone: a 19% decrease in productivity. Even the study participants themselves were surprised: they estimated that AI had increased their

Starting point is 00:11:47 productivity by 20%. If you take away just one thing from this study, it should probably be this: when people report that AI has accelerated their work, they might be wrong. This result seems too bad to be true, so astonishing that it almost has to be spurious. However, the study was carefully designed, and I believe the findings are real. At the same time, I believe that at least some of the anecdotal reports of huge productivity boosts are real. This study doesn't expose AI coding tools as a fraud, but it does remind us that they have important limitations, at least for now, confirming some things my colleague

Starting point is 00:12:22 Taren wrote about in a previous post, "First, they came for the software engineers." Based on exit interviews and analysis of screen recordings, the study authors identified several key sources of reduced productivity. The biggest issue is that the code generated by AI tools was generally not up to the high standards of these open source projects. Developers spent substantial amounts of time reviewing the AI's output, which often led to multiple rounds of prompting the AI, waiting for it to generate code, reviewing the code, discarding it as fatally flawed, and prompting the AI again. The paper notes that only 39% of code generations from Cursor were accepted; bear in mind that developers might have to rework even code they accept.

Starting point is 00:13:02 In many cases, the developers would eventually throw up their hands and write the code themselves. The author then presents a graph of how, according to the study, developers spent their time with AI coding tools versus without. Quoting again: You can see that for AI-allowed tasks, developers spent less time researching and writing code, though due to scale issues, the difference is less visually apparent than it really is. Adjusting for scale, they spent roughly the same amount of time on testing and debugging and on git and environment work, and considerably more time idle, perhaps because waiting for AI tools causes people to lose flow. In any case, the moderate savings on researching and writing code were more than offset by the time spent prompting the AI, waiting for it to generate code, and then reviewing its output.

Starting point is 00:13:47 The study's finding of a 19% performance decrease may seem discouraging at first glance, but it applies to a difficult scenario for AI tools (experienced developers working in complex codebases with high quality standards), and it may be partially explained by developers choosing a more relaxed pace to conserve energy, or leveraging AI to do a more thorough job. And of course, results will improve over time. The paper should not be read as debunking the idea of an AI 2027-style software explosion, but it may indicate that significant feedback loops in AI progress may be further away than anticipated, even if some aspects of AI research involve small throwaway projects

Starting point is 00:14:23 that may be a better fit for AI coding tools. Meanwhile, it remains to be seen whether AI is generating bloated or otherwise problematic code that will cause compounding problems as more and more code is written by AI. But perhaps the most important takeaway is that even as developers were completing tasks 19% more slowly when using AI, they thought they were going 20% faster. Many assessments of AI impact so far have been based on surveys or anecdotal reports, and here we have hard data showing that such results can be remarkably misleading, end quote. For the weekend long reads this week, I have two science stories. First, from Quanta Magazine: a new grand unified theory of what causes rogue waves. Two weeks before Christmas, back in 1978,

Starting point is 00:15:18 the massive cargo ship MS München disappeared in the North Atlantic without a trace, save for a few scattered lifeboats and flotation devices. The 261-meter West German vessel had encountered a storm, but nothing that should have overwhelmed such a modern and robust ship. Then came a brief distress call, and then silence. One lifeboat, found mangled and torn from a position 20 meters above sea level, suggested a powerful force had struck the ship with unimaginable intensity. Investigators were baffled. At the time, the idea that a single wave could inflict such damage was considered a myth. That changed on January 1st, 1995, when the Draupner oil platform in the Norwegian North Sea recorded a 26-meter wave using laser sensors.

Starting point is 00:16:03 Waves that day averaged just under 12 meters. For the first time, a rogue wave, something long relegated to sailors' folklore, had been scientifically documented. Suddenly, those old stories of rogue waves seemed less like exaggeration and more like early warnings. Since then, scientists have explored two competing explanations for rogue waves. One is linear addition, where ordinary waves happen to overlap, stacking into a temporary giant by pure chance: an oceanic dice roll, if you will. The other is nonlinear focusing, where waves interact and transfer energy, leading to explosive growth. Both theories hold water, but each explains only part of the picture. Now, a team of applied mathematicians may have found a breakthrough: a unifying

Starting point is 00:16:48 statistical framework built on large deviation theory, or LDT. Rather than arguing over which mechanism created a rogue wave, this approach predicts the most likely conditions for one to occur, regardless of how it forms. LDT identifies the most probable paths by which a chaotic system like the ocean can arrive at a rare, extreme event. Tested against lab simulations and real-world data, this method has proven surprisingly accurate. It suggests that rogue waves don't just appear out of nowhere; they follow specific, identifiable patterns. The team hopes this could lead to a real-time ocean-scanning tool that warns ship captains of incoming anomalies, much like a weather alert. And then, from ScienceDaily. I just really want this one to be true.

Starting point is 00:17:34 Scientists have developed a 3D-printed, sponge-like aerogel that turns seawater into clean drinking water using only sunlight. The material contains microscopic vertical channels that efficiently evaporate water, even at larger sizes. In outdoor tests, it produced drinkable water within hours, without electricity or complex equipment. Made from carbon nanotubes and cellulose nanofibers, the aerogel is lightweight, rigid, and scalable. When placed over seawater and exposed to direct sunlight, it converts water into vapor, which then condenses into clean liquid. This breakthrough offers a potentially low-cost, sustainable alternative to traditional desalinization, which could expand access to freshwater in remote or resource-limited areas.

Starting point is 00:18:20 Cheap, easy desalinization would be a big, big deal. Last week, since it was a shortened week, I didn't release an omnibus episode on the premium feed. So this week, I'm releasing a mega omnibus episode combining all the segments from the three days of last week and the five days of this week: nearly two hours in one shot to catch you up on everything that has happened in tech basically since the start of the month. As ever, on the premium feed, which you can sign up for at tech.comcast.com. dot tech, you can listen to this completely ad-free, as with every single daily episode, every

Starting point is 00:19:02 single time, but I'll also release a version with ads for everyone else tomorrow. Again, as a sampler, imagine this sort of episode, but without any ads. Tech.supercast.com. Tech. Talk to you on Monday.
