Tech Brew Ride Home - Thu. 12/12 – The “Killer App” For Vision Pro?

Episode Date: December 12, 2024

More fallout from the whole Cruise wind-down. What it’s like to use some of the new Gemini 2.0 features. Has Apple, quite belatedly, finally done a feature update that provides the Vision Pro with a... “killer app?” An Instagram-like app from China I had never heard of. And one singular, eye-popping datapoint from the CHIPS Act.

Links:
The end of Cruise is the beginning of a risky new phase for autonomous vehicles (The Verge)
Gemini 2.0 Flash: An outstanding multi-modal LLM with a sci-fi streaming model (Simon Willison's Blog)
FCC Opens Entire 6-GHz Band to Very-Low-Power Device Operations (TV Tech)
The Vision Pro’s ultrawide Mac display is very close to being a killer app (The Verge)
China’s Instagram-Style Xiaohongshu Crosses $1 Billion in Profit (Bloomberg)
Harvard Is Releasing a Massive Free AI Training Dataset Funded by OpenAI and Microsoft (Wired)
US chipmaking boom in doubt after Biden’s defeat (Financial Times)

Transcript
Starting point is 00:00:00 On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco. Hey, who did this to you? What happened next turned the story into a political firestorm. Reports have identified the victim as Bob Lee, the founder of Cash App. From Bloomberg Podcasts, this is Foundering, the Killing of Bob Lee, beginning April 16. Welcome to the Techmeme Ride Home for Thursday, December 12th, 2024. I'm Brian McCullough. Today: more fallout from the whole Cruise wind-down; what it's like to use some of the new Gemini 2.0 features; has Apple, quite belatedly, finally done a feature update that provides the Vision Pro with a killer app; an Instagram-like app from China I'd never heard of; and one singular, eye-popping data point about the CHIPS Act. Here's what you missed today in the world of tech. Today is a day of follow-up stories, kind of, and also maybe stories that have bounced out of other stories we've been talking about recently. Honda plans to dissolve a self-driving vehicle partnership with GM after GM announced plans to exit the robotaxi business
Starting point is 00:01:17 completely. Honda had invested $852 million in Cruise for its part, while Microsoft expects a charge of around $800 million in Q2 of 2025 because it had acquired a minority stake in Cruise back in January 2021. Now, as I said when all this news broke, I wouldn't be surprised to learn eventually that there were simple dollars-and-cents decisions that led to all of this happening, the economics inside of GM, I mean. But Andrew J. Hawkins at The Verge takes a stab at a more macro theory. In short, he posits that maybe robotaxis will never provide the scale of profits to make it worthwhile for automobile producers to actually be in this business.
Starting point is 00:01:59 So why not go back to what they do best, which is just selling cars to individuals, just cars that are maybe mostly autonomous. Quote, Cruise was continuing to rack up huge losses. The robotaxi subsidiary lost a staggering $3.48 billion in 2023. Kyle Vogt, Cruise's co-founder and Ammann's successor as CEO, was under mounting pressure to expand the service and bring in more money to help cover the losses. Plus, he was directly competing with Alphabet's Waymo, which had more vehicles and seemingly better technology. And Google's parent company was more willing to spend billions of dollars without any near-term
Starting point is 00:02:33 profits to win the robotaxi business. With the screws tightening, Vogt publicly drew a line in the sand. Cruise would bring in over a billion dollars in revenue by 2025. Instead, Cruise itself never made it to the end of 2024. With Cruise out of the picture, Waymo is one of the only players left aiming to prove that robotaxis can work in the real world. Amazon's Zoox and Hyundai's Motional are still in the game, albeit far behind Waymo. Tesla is also pursuing its own robotaxi project, which it claims will launch in 2026. Meanwhile, GM will tackle a new risky experiment: personally owned autonomous vehicles. GM knows how to sell cars to people, and the company already has a hands-free highway driving feature called Super Cruise. Why not just leverage Cruise's fully autonomous technology to make Super Cruise even
Starting point is 00:03:20 better? GM may have scrapped its Ultra Cruise branding to develop a partially autonomous system that covers 95% of driving scenarios, but it still thinks that people want a fully autonomous car of their own on their own terms. I think the application of what the customer wants in a privately owned vehicle is very different, CEO Mary Barra said on Tuesday, but I also think there's a lot of commonality with Cruise's technology. How it seamlessly moves back and forth, I think, is something different in a personal autonomous vehicle. Driver assistance technologies, especially so-called Level 3 systems, carry their own risks, however. There have been studies that show that the handoff between a partially automated system and a human driver can be especially fraught. When people have
Starting point is 00:04:01 been disconnected from driving for a longer period of time, they may overreact when suddenly taking control in an emergency situation. They may overcorrect steering, brake too hard, or be unable to respond correctly because they haven't been paying attention. And those actions can create a domino effect that has the potential to be dangerous, perhaps even fatal. The safety implications are enormous, as are the liability concerns. GM may eventually decide that robotaxis aren't such a bad bet after all, end quote. Turning once again to Simon Willison to give us early impressions of Gemini 2.0 Flash. Simon says the spatial reasoning performance is impressive and its new streaming API is maybe the killer app. Quote, the really cool thing about Gemini 2.0
Starting point is 00:04:48 is the brand new streaming API. This lets you open up a two-way stream to the model, sending audio and video to it and getting text and audio back in real time. I urge you to try this out right now. It works for me in Chrome on my laptop and mobile Safari on my iPhone. It didn't quite work in Firefox. It's pretty similar to the previous live demo but has additional tools, so you can tell it to render a chart or run Python code and it will show you the output. This stuff is straight out of science fiction. Being able to have an audio conversation with a capable LLM about things that it can see through your camera is one of those "we live in the future" moments. Worth noting that OpenAI released their own WebSocket streaming API at DevDay a few months ago, but that one only
Starting point is 00:05:35 handles audio and is currently very expensive to use. Google hasn't announced the pricing for Gemini 2.0 Flash yet. It's still a free preview, but if the Gemini 1.5 series is anything to go by, it's likely to be shockingly inexpensive. I usually don't get too excited about not-yet-released features, but this thing from the native image output video also caught my eye, end quote. And in the video, he shows an edit executed by a prompt which said, turn this car into a convertible, and it produces a perfect edit with no need to fiddle with anything. Quoting again, the dream of multimodal image output is that models can do much more finely grained image editing than has been possible using previous generations of diffusion-based image models.
Starting point is 00:06:15 OpenAI and Amazon have both promised models with these capabilities in the near future, so it looks like we're going to have a lot of fun with this stuff in 2025. Also, the native audio output demo video shows how good Gemini 2.0 Flash will be at audio output with different voices, intonations, languages, and accents. This looks similar to what's possible with OpenAI's Advanced Voice Mode today, end quote. With all my speculation about AI maybe empowering the rise of a new compute product category in the form of smart glasses, methinks this might enable that. The FCC has opened all 1,200 megahertz of the 6 gigahertz band for unlicensed use by very low power devices,
Starting point is 00:06:59 citing growth in wearables, AR/VR, and other cutting-edge tech. Quoting TV Tech, despite opposition from the NAB and other parties who have argued that opening up the spectrum would create problems for fixed microwave links, satellite uplinks, and broadcast auxiliary services that use this spectrum, the FCC has in recent years been opening up parts of the 6 gigahertz band. Prior to the December 11th vote to open up all 1,200 megahertz for very low power devices, the FCC expanded unlicensed use between 5.925 and 7.125 gigahertz, helping to usher in Wi-Fi 6E and set the stage for Wi-Fi 7 and support the growth of the Internet of Things. In October of 2024, FCC chair Jessica Rosenworcel called for further expansion, which the NAB opposed.
Starting point is 00:07:44 As broadcasters' extraordinary efforts to help the many communities impacted by Hurricane Helene demonstrate, it is critical for the commission to ensure that broadcasters have access to spectrum that will allow them to provide these essential services in times of crisis and without interference, the NAB said in October. In adopting the new rules, the FCC noted that they will bolster cutting-edge applications like wearable technologies and augmented and virtual reality, which will enhance learning opportunities, improve health care outcomes, and bring new entertainment experiences, end quote. Remember the Vision Pro? I mean, seriously, remember all the hype from almost exactly a year ago? Well, among all the OS updates yesterday, Wes Davis at The Verge says that the Vision Pro
Starting point is 00:08:30 got an update that is the closest thing to a killer app we've seen for the device, and maybe something they should have launched with. Quote, the Vision Pro has been able to mirror the screen of a Mac since day one, but I found the original Mac Virtual Display feature limiting. Text was sharp at low resolutions, but the screen was cramped. I could get more space at higher resolutions, but the text was too small and blurry to read. Yes, I can blow it up to the size of a bus to make things readable, except then I'm craning my head around too much to see everything. My normal three-monitor setup lets me see the most important stuff with slight movements,
Starting point is 00:09:05 but that just hasn't been possible before now. In visionOS 2.2, the standard Mac display is now curved, and it seems sharper. It's not Retina sharp at the highest resolutions, but I no longer have to make it gigantic to get legible text. The default virtual display becomes one of three options, standard, wide, and ultrawide, once your Mac is updated to macOS 15.2, which lets it take over foveated rendering from the Vision Pro. Those two extra modes instantly made the virtual display viable for me, giving me the space I'm accustomed to in my three-monitor life. You can crank the resolution in ultrawide all the way up to 10,240 by 2,880, if you'd like. But the sweet spot for me has been the wide display's maximum 6,720 by 2,880 resolution, which lets me see everything I need
Starting point is 00:09:51 without constantly rotating my Vision Pro-laden head. It ends up feeling more like a real monitor and not some fantasy display that evokes Weird Al Yankovic's song, Frank's 2000" TV. This has made it easier for me to relocate to another room in my house or even outside if I wanted. I wouldn't take it to a coffee shop for a number of reasons. Do I leave it behind when I go to the restroom or wear the Vision Pro in there like a maniac? But I'd absolutely bring it on a work trip. Apple has also made it so that the audio is sent through the headset instead of your computer speakers as it did before. The widescreen options came in handy recently when I strained my back in a way that made it painful to sit upright. I hate doing work on a laptop, but reclining
Starting point is 00:10:29 in bed with the Vision Pro on was suddenly a real option for me. There are quirks, though. Switching between the display modes can be sluggish, and your Mac doesn't always remember what resolution you set, so if you switch from wide to ultrawide and back, you might find all your windows piled up on top of each other. And the keyboard awareness feature, which shows your keyboard even if you have one of Apple's immersive environments fully turned on, works great with my Magic Keyboard, but doesn't reliably show the mechanical one I prefer. Still, those are minor issues. The expanded virtual display is a critical upgrade, and if it's not in killer app territory, it's at least right next door to it. It still doesn't help the Vision Pro with its biggest issues, like that our bodies are all different
Starting point is 00:11:09 and not everyone will find it comfortable to use for long stretches of time, and it doesn't make Apple's headset any less expensive, but it does help that my Vision Pro is now more than a personal movie theater. Now it's a gigantic high-res curved display with perfect viewing angles, too. That makes the price feel a little closer to right, end quote. Wanted to put this on your radar because it was news to me. Xiaohongshu is apparently an Instagram-like app that is extremely popular with younger Chinese users, so much so that it has reportedly told investors it will double its profits to more than $1 billion in 2024 ahead of a potential IPO.
Starting point is 00:11:51 So the rise of Chinese apps continues. Quote, the strong results are likely to revive speculation around the market debut of a startup valued at $20 billion in its last funding round in 2021. Xiaohongshu, which literally means Little Red Book, started out like its U.S. cousin as a repository of personal travel and dining photos, before branching out into reviews and live shopping. Started in 2013 by Charlwin Mao Wenchao and Miranda Qu Fang as a shopping guide for Chinese tourists,
Starting point is 00:12:19 Xiaohongshu now counts some 300 million monthly active users. Its rapid growth comes in part at the expense of incumbent e-commerce leaders Alibaba and JD.com. Like ByteDance's Douyin, the startup enjoyed explosive growth through getting influencers to sell products to millions of users. When shoppers scroll through videos and photos, they can buy tagged products with just a few clicks. Still, online commerce is slowing more broadly as Chinese consumers tighten their belts during a severe economic downturn. Apps in general have also experienced a decline in growth from COVID-era peaks, when millions under lockdown turned to their smartphones for entertainment and necessities.
Starting point is 00:12:55 Xiaohongshu, which is backed by HongShan and Alibaba, among other well-known names, is one of just a few Chinese internet leaders to remain privately held, end quote. Harvard has released a high-quality dataset of nearly one million public domain books, created with funding from Microsoft and OpenAI, that anyone can use to train AI tools. Quoting Wired, around five times the size of the notorious Books3 dataset that was used to train AI models like Meta's Llama, the Institutional Data Initiative's database spans genres, decades, and languages, with classics from Shakespeare, Charles Dickens, and Dante
Starting point is 00:13:35 included alongside obscure Czech math textbooks and Welsh pocket dictionaries. Greg Leppert, executive director of the Institutional Data Initiative, says the project is an attempt to level the playing field by giving the general public, including small players in the AI industry and individual researchers, access to the sort of highly refined and curated content repositories that normally only established tech giants have the resources to assemble. It's gone through rigorous review, he says. Leppert believes the new public domain database could be used in conjunction with other licensed materials to build artificial intelligence models. I think about it a bit like the way Linux has become a foundational operating system for so much of the world, he says, noting that companies would still need to use additional training data to differentiate their models from those of their competitors. In addition to the trove of books, the Institutional Data Initiative is also working with the Boston Public Library to scan millions of articles from different newspapers now in the public domain,
Starting point is 00:14:28 and it says it's open to forming similar collaborations down the line. The exact way the books dataset will be released is not settled. The Institutional Data Initiative has asked Google to work together on public distribution, and the company has pledged its support, end quote. And finally today, the whole CHIPS Act, as we've discussed many times, is part of a geopolitical movement to have access to silicon and compute supplies inside your own borders. But it was also an effort to reverse decades of offshoring of high-tech manufacturing. Take this single data point as an example of what that meant. Quoting the FT, a Peterson Institute study found that spending on the construction of U.S. computer and electronics
Starting point is 00:15:14 manufacturing facilities had skyrocketed in the last two years, with more electronics construction proceeding in 2024 than in the two previous decades. This has coincided with a wider stock market frenzy around companies that design chips powering artificial intelligence, such as U.S.-listed Nvidia and Arm, but rebuilding manufacturing in the U.S. requires participation from companies such as Taiwan Semiconductor Manufacturing Company, the world's biggest chipmaker, and its South Korean rival Samsung. You could say that the chip boom hasn't even started yet, one analyst said, adding the real benefits in growth that could come from it are not likely to pay off until a few years from now, end quote. Nothing more for you today. Talk to you tomorrow.
