Tech Brew Ride Home - Thu. 01/23 – “Humanity’s Last Exam”

Episode Date: January 23, 2025

Ok, that whole AI datacenter joint venture thing seems to have gotten messy. A ByteDance board member thinks TikTok might have a way out without selling. Netflix rakes in a bunch of Oscar nominations. And humanity’s final exam has been formulated to see when AI has actually graduated to true intelligence.

Sponsors: Qualialife.com/ride and code ride

Links:
OpenAI, SoftBank Each Commit $19 Billion to Stargate Data Center Venture (The Information)
OpenAI’s Stargate Deal Heralds Shift Away From Microsoft (WSJ)
TikTok’s parent company is in active discussions about a deal, board member says (CNN)
Co-founder of French Crypto Startup Freed After Kidnapping (Bloomberg)
Subaru Security Flaws Exposed Its System for Tracking Millions of Cars (Wired)
Oscar Nomination Scorecard: Netflix Leads Among Studios With 16, A24 Close Behind With 14 (Variety)
When A.I. Passes This Test, Look Out (NYTimes)

Learn more about your ad choices. Visit megaphone.fm/adchoices

Transcript
Starting point is 00:00:00 On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco. "Hey, who did this to you?" What happened next turned the story into a political firestorm. Reports have identified the victim as Bob Lee, the founder of Cash App. From Bloomberg Podcasts, this is Foundering: The Killing of Bob Lee, beginning April 16. Welcome to the Techmeme Ride Home for Thursday, January 23rd, 2025. I'm Brian McCullough. Today: okay, that whole AI data center joint venture thing seems to have gotten messy.
Starting point is 00:00:45 A ByteDance board member thinks TikTok might have a way out without selling. Netflix rakes in a bunch of Oscar nominations. And humanity's final exam has been formulated to see when AI has actually graduated to true intelligence. Here's what you missed today in the world of tech. So as I described yesterday, there were some doubts raised in some corners about that whole Stargate joint venture announcement, but that has blown up into a whole ball of speculation. Some say this was just a vaporware-like announcement to generate headlines and goodwill with the new administration. Sam Altman and Elon Musk got into it on socials after Musk tweeted that SoftBank didn't have the money secured. Altman replied: Wrong, as you surely know. Want to come visit the first site already underway? This is great for the country. I realize what is great for the country isn't always
Starting point is 00:01:38 what's optimal for your companies, but in your new role, I hope you'll mostly put [American flag emoji] first, end quote. The Information says that Sam Altman told some colleagues that OpenAI and SoftBank will each commit $19 billion to Stargate, and OpenAI would effectively hold a 40% interest in the joint venture. Quote, Altman's comments on Tuesday suggest the company will have to raise $19 billion through equity or debt, and company leaders previously told colleagues they were prepared to raise debt for data center projects. The company projected it would generate about $4 billion in revenue last year while still burning considerable sums of cash due to high computing costs. OpenAI raised $6.6 billion in equity funding last fall at a $157 billion
Starting point is 00:02:19 valuation and has raised a total of about $20 billion, mostly from Microsoft, which has rights to OpenAI's intellectual property but is largely unaffiliated with Stargate. The scale of the data center venture is unprecedented. On Wednesday, Altman likened Stargate to a venture fund, with OpenAI and SoftBank as two of the general partners. Two other general partners will be Oracle and MGX, an Abu Dhabi fund that also owns a piece of the new Delaware-registered venture. GPs would commit $45 billion in total to the project, he said, implying that $7 billion would come from Oracle and MGX. The rest of the money for Stargate would come from investors categorized as limited partners as well as several types of debt financing.
Starting point is 00:02:57 Such debt could eventually trade publicly, he said. Altman said OpenAI would be a customer of Stargate but would also have operational control of it. It's possible that Stargate would raise even more capital than the $500 billion that Son had stipulated, at which point OpenAI would lose some control, end quote. Now remember, the OpenAI-led Stargate announcement only mentioned Microsoft as a tech partner, and sources are telling the Journal that Microsoft and OpenAI were arguing over Microsoft's ability to fulfill OpenAI's computing needs in the months leading up to the Stargate announcement. So is this just the long-whispered-about Microsoft and OpenAI divorce starting to come out in public? Quoting the Journal: In the months leading up to the
Starting point is 00:03:37 announcement, the two sides had been haggling over what to do about OpenAI's seemingly insatiable appetite for computing power and its contention that Microsoft couldn't fulfill it, even though their agreement didn't allow OpenAI to easily switch to others, said people familiar with the discussions. OpenAI is almost certainly reliant on Microsoft to provide it with the data centers it needs to build and operate its sophisticated AI software. That has been a part of their agreement since Microsoft first invested in 2019. With the success of ChatGPT, OpenAI's need for computing power surged. Its executives have said ending the exclusive cloud contracts could be crucial to compete with rival AI developers that don't have the same constraints. The two have been arguing over capacity on and off for years, the people said, and talks have intensified in recent months.
Starting point is 00:04:19 Like other tech giants, Microsoft has rapidly increased its investment in AI infrastructure and recently said it would spend $80 billion on AI data centers in its current fiscal year, but it also has other customers and partners besides OpenAI. OpenAI has wanted Microsoft to allow it to get cloud computing elsewhere, such as from Google. Microsoft has insisted that would be a violation of the exclusivity agreement. Altman has complained that Microsoft was violating the agreement by not providing it with enough data center capacity, people familiar with the matter said. Last year, OpenAI and Microsoft discussed building their own supercomputer for training OpenAI models, but it didn't happen, people familiar with the matter said. Some of the details about the supercomputer were earlier reported by The Information. While Microsoft wasn't the center of
Starting point is 00:05:04 the announcement this week, Altman maintained that his relationship with his biggest backer remains good. When one X user said Stargate meant that the friendship between OpenAI and Microsoft was over, Altman responded, using a popular internet misspelling of the word more: Absolutely not. Very important and huge partnership for a long time to come, he posted. We just need moar compute. Details of Stargate's structure and financing are still unclear. If the venture builds even part of the announced infrastructure, it could give OpenAI a competitive advantage. For Microsoft, Stargate could mean it has less of a hold on its AI partner but is also exposed to less risk. In its own announcement coinciding with the Stargate rollout, Microsoft said much about its relationship with
Starting point is 00:05:45 OpenAI will remain the same. It will continue to host OpenAI software on its Azure cloud computing platform. That means when people use software like ChatGPT, OpenAI has to pay Azure. It will also continue to build some new data centers for OpenAI, end quote. ByteDance board member Bill Ford says ByteDance is exploring alternatives to selling TikTok U.S., including a change to local control, to comply with the U.S. ban law. We are optimistic. We will find a solution, Ford said, speaking at the World Economic Forum in Davos. There are a number of alternatives we can talk to President Trump and his team about that are short of selling the company, that allow the company to continue to operate, maybe with a change of control of some kind,
Starting point is 00:06:30 but short of having to sell. I'm optimistic about the dialogue that is emerging between President Trump and President Xi, Ford said. That might help create a much more constructive environment, a much higher level of engagement that could lead to a positive solution. We'll get on with it as soon as maybe the end of the week in terms of negotiating what might work. The Chinese government, the U.S. government, and the company and the board all have to be involved in this conversation, Ford said, adding that there could be solutions, quote, short of divestiture. The objective is for TikTok to continue operating, he said. Ford is also the chief executive officer of General Atlantic, which is an investor in ByteDance, TikTok's parent company.
Starting point is 00:07:12 Here's a weird one. France says David Balland, co-founder of French crypto wallet startup Ledger, was kidnapped from his home on January 21st and freed yesterday after a police operation. Quoting Bloomberg: After his kidnapping, Balland was taken in a car to another address where he was held captive. Emergency services are now treating him, the prosecutor's office said, end quote. Founded in 2014, Ledger has emerged as a key player in the crypto security landscape, known for its specialized hardware wallets that help investors safeguard their digital assets. In 2023, the company raised a 100 million euro funding round, pushing its valuation to 1.3 billion euros. In 2019, in a strategic move to control its manufacturing
Starting point is 00:07:57 process, Ledger established a production facility in Vierzon, a quaint French town in the heart of the country. Between that facility and the company's headquarters in Paris, the company has approximately 700 employees. As a member of Ledger's founding team in 2014, Balland played a pivotal role in the company's growth story, particularly during his tenure as site director of the Vierzon factory from 2019 to 2021, according to his LinkedIn profile. Obviously it's too early to speculate on details or motives here, but you would imagine plenty of nefarious parties might want a way to hack into Ledger's crypto wallets. Another one from the your-car-is-tracking-you-like-your-phone-tracks-you file. Researchers have detailed Subaru's now-fixed web vulnerabilities that let them track millions of
Starting point is 00:08:49 Subarus via its Starlink features in the U.S., Canada, and Japan. Quoting Wired: You can retrieve at least a year's worth of location history for the car, where it's pinged precisely, sometimes multiple times a day, researcher Sam Curry says. Whether somebody's cheating on their wife or getting an abortion or part of some political group, there are a million scenarios where you could weaponize this against someone, end quote. Curry and another researcher, Shubham Shah, today revealed in a blog post their method for hacking and tracking millions of Subarus, which they believe would have allowed hackers to target any of the company's vehicles equipped with its digital features known as Starlink in the U.S., Canada, or Japan. Vulnerabilities they found in a Subaru website intended for the company's staff allowed them to hijack an employee's account to both reassign control of cars' Starlink features and also access all the vehicle location data available to employees, including the car's location every time its engine started.
Starting point is 00:09:40 Shah reported their findings to Subaru in late November, and Subaru quickly patched its Starlink security, but the researchers warn that the Subaru web vulnerabilities are just the latest in a long line of similar web-based flaws. They and other security researchers working with them have found similar flaws that affected well over a dozen carmakers, including Acura, Genesis, Honda, Hyundai, Infiniti, Kia, Toyota, and many others. There is little doubt, they say, that similarly serious hackable bugs exist in other auto companies' web tools that have yet to be discovered. In Subaru's case, in particular, they also point out that their discovery hints at how pervasively those with access to Subaru's portal can track its customers' movements, a privacy issue that will last far longer
Starting point is 00:10:18 than the web vulnerabilities that exposed it. The thing is, even though this is patched, this functionality is still going to exist for Subaru employees, Curry said. It's just normal functionality that an employee can pull up a year's worth of your location history, end quote. Yeah, the deeper issue lies in Subaru's extensive data collection practices. The company maintains at least a year's worth of detailed location history for vehicles, accessible to employees through what Curry describes as essentially a button on Subaru's admin panel. While Subaru has patched these specific vulnerabilities and maintains they've never experienced unauthorized access, the incident spotlights a broader industry trend.
Starting point is 00:10:53 As the Mozilla Foundation recently reported, 92% of modern vehicles offer minimal data control to owners, with 84% of manufacturers reserving rights to share or sell collected information. Robert Herrell from the Consumer Federation of California puts it perfectly: People are being tracked in ways that they have no idea are happening, end quote. Netflix earned 16 Oscar nominations, it was announced this morning, including 13 for Emilia Pérez, which is the most nominated film of the year. A24 came in second with 14 nominations. Quoting Variety: Emilia Pérez landed 13 nods in total, including Best Picture, Best Actress for Karla Sofía Gascón,
Starting point is 00:11:35 Best Supporting Actress for Zoe Saldaña, and Best Director for Jacques Audiard. The streamer also received nominations for Wallace & Gromit: Vengeance Most Fowl in the Best Animated Feature category and for Pablo Larraín's Maria for cinematography, end quote. The studio A24 came in next with 14 nominations spanning its slate of films this year, including the ambitious historical piece The Brutalist and Sing Sing, starring Colman Domingo. The Brutalist matched Wicked with 10 nominations each, positioning them as front-runners. Universal, which produced Wicked, secured 13 nominations, while its specialty arm Focus Features garnered 12 nominations through its popular papal drama Conclave and Robert Eggers' Nosferatu. While Universal's combined 25 nominations outpaced Netflix's total, it's worth noting that the Academy's methodology counts awards by individual distributor rather than parent company.
Starting point is 00:12:27 Finally today, let me introduce you to Humanity's Last Exam, a new evaluation that those behind it claim is the hardest AI test ever, consisting of around 3,000 multiple-choice and short-answer questions. This is the Turing test, the Voight-Kampff test from Blade Runner, the test to see if the machines have become sylphil wheel. Quoting the Times: For years, AI systems were measured by giving new models a variety of standardized benchmark tests. Many of these tests consisted of challenging SAT-caliber problems in areas like math, science, and logic. Comparing the models' scores over time served as a rough measure of AI progress. But AI systems eventually got too good at those tests, so new, harder tests were created, often with the types of questions graduate students might encounter on their exams. Those tests aren't in good shape either.
Starting point is 00:13:20 New models from companies like OpenAI, Google, and Anthropic have been getting high scores on many Ph.D.-level challenges, limiting those tests' usefulness and leading to a chilling question: Are AI systems getting too smart for us to measure? This week, researchers at the Center for AI Safety and Scale AI are releasing a possible answer to that question: a new evaluation called Humanity's Last Exam that they claim is the hardest test ever administered to AI systems. Humanity's Last Exam is the brainchild of Dan Hendrycks, a well-known AI safety researcher and director of the Center for AI Safety. The test's original name, Humanity's Last Stand, was discarded for being overly dramatic. Mr. Hendrycks worked with Scale AI, an AI company,
Starting point is 00:14:01 where he is an advisor, to compile the test, which consists of roughly 3,000 multiple-choice and short-answer questions designed to test AI systems' abilities in areas ranging from analytic philosophy to rocket engineering. Questions were submitted by experts in these fields, including college professors and prize-winning mathematicians, who were asked to come up with extremely difficult questions they knew the answers to. Here, try your hand at a question about hummingbird anatomy from the test: Hummingbirds within Apodiformes uniquely have a bilaterally paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded, cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by
Starting point is 00:14:49 this sesamoid bone? Answer with a number. Just an aside here: obviously, I couldn't answer that, much less pronounce it. But back to the article. The questions on Humanity's Last Exam went through a two-step filtering process. First, submitted questions were given to AI models to solve. If the models couldn't answer them, or if, in the case of multiple-choice questions, the models did worse than random guessing, the questions were given to a set of human reviewers who refined them and verified the correct answers. Experts who wrote top-rated questions were paid between $500 and $5,000 per question, as well as receiving credit for contributing to the exam. There are other tests trying to measure advanced AI capabilities in certain domains, such as
Starting point is 00:15:30 FrontierMath, a test developed by Epoch AI, and ARC-AGI, a test developed by the AI researcher François Chollet. But Humanity's Last Exam is aimed at determining how good AI systems are at answering complex questions across a wide variety of academic subjects, giving us what might be thought of as a general intelligence score. We are trying to estimate the extent to which AI can automate a lot of really difficult intellectual labor, Mr. Hendrycks said. Once the list of questions had been compiled, the researchers gave Humanity's Last Exam to six leading AI models, including Google's Gemini 1.5 Pro and Anthropic's Claude 3.5 Sonnet. All of them failed miserably. OpenAI's o1 system scored the highest of the bunch, with a score of
Starting point is 00:16:11 8.3%. Mr. Hendrycks said he expected those scores to rise quickly and potentially surpass 50% by the end of the year. At that point, he said, AI systems might be considered world-class oracles, capable of answering questions on any topic more accurately than human experts, and we might have to look for other ways to measure AI's impacts, like looking at economic data or judging whether it can make novel discoveries in areas like math and science. You can imagine a better version of this where we can give questions that we don't know the answers to yet, and we're able to verify if the model is able to help solve it for us, said Summer Yue, Scale AI's director of research and an organizer of the exam, end quote. Nothing more for you today. Talk to you tomorrow.
