Tech Brew Ride Home - Thu. 01/23 – “Humanity’s Last Exam”
Episode Date: January 23, 2025
Ok, that whole AI datacenter joint venture thing seems to have gotten messy. A ByteDance board member thinks TikTok might have a way out without selling. Netflix rakes in a bunch of Oscar nominations. And humanity’s final exam has been formulated to see when AI has actually graduated to true intelligence.
Sponsors: Qualialife.com/ride and code ride
Links:
OpenAI, SoftBank Each Commit $19 Billion to Stargate Data Center Venture (The Information)
OpenAI’s Stargate Deal Heralds Shift Away From Microsoft (WSJ)
TikTok’s parent company is in active discussions about a deal, board member says (CNN)
Co-founder of French Crypto Startup Freed After Kidnapping (Bloomberg)
Subaru Security Flaws Exposed Its System for Tracking Millions of Cars (Wired)
Oscar Nomination Scorecard: Netflix Leads Among Studios With 16, A24 Close Behind With 14 (Variety)
When A.I. Passes This Test, Look Out (NYTimes)
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco.
Hey, who did this to you?
What happened next turned the story into a political firestorm.
Reports have identified the victim as Bob Lee, the founder of Cash App.
From Bloomberg Podcasts, this is Foundering: The Killing of Bob Lee, beginning April 16.
Welcome to the Techmeme Ride Home for Thursday, January 23rd, 2025.
I'm Brian McCullough. Today:
Okay, that whole AI data center joint venture thing seems to have gotten messy.
A ByteDance board member thinks TikTok might have a way out without selling.
Netflix rakes in a bunch of Oscar nominations, and humanity's final exam has been formulated to see when AI has actually graduated to true intelligence.
Here's what you missed today in the world of tech.
So as I described yesterday, there were some doubts raised in some corners about that whole Stargate joint venture announcement, but that has blown up into a whole ball of speculation.
Some say this was just a vaporware-like announcement to generate headlines and goodwill with the new administration.
Sam Altman and Elon Musk got into it on socials after Musk tweeted that SoftBank didn't have the money secured.
Altman replied, quote, Wrong, as you surely know. Want to come visit the first site already
underway? This is great for the country. I realize what is great for the country isn't always
what's optimal for your companies, but in your new role, I hope you'll mostly put 🇺🇸
first, end quote. The Information says that Sam Altman told some colleagues that OpenAI and
SoftBank will each commit $19 billion to Stargate, and OpenAI would effectively hold a 40%
interest in the joint venture. Quote, Altman's comments on Tuesday suggest the company will have to
raise $19 billion through equity or debt, and company leaders previously
told colleagues they were prepared to raise debt for data center projects. The company projected
it would generate about $4 billion in revenue last year while still burning considerable sums of cash
due to high computing costs. OpenAI raised $6.6 billion in equity funding last fall at a $157 billion
valuation and has raised a total of about $20 billion, mostly from Microsoft, which has rights
to OpenAI's intellectual property but is largely unaffiliated with Stargate. The scale of the
data center venture is unprecedented. On Wednesday, Altman likened Stargate to a venture
fund with OpenAI and SoftBank as two of the general partners. Two other general partners will be
Oracle and MGX, an Abu Dhabi fund that also owns a piece of the new Delaware registered venture.
GPs would commit $45 billion in total to the project, he said, implying that $7 billion
would come from Oracle and MGX. The rest of the money for Stargate would come from
investors categorized as limited partners as well as several types of debt financing.
Such debt could eventually trade publicly, he said. Altman said OpenAI would be a customer of
Stargate, but would also have operational control of it. It's possible that Stargate would
raise even more capital than the $500 billion that Son had stipulated, at which point OpenAI
would lose some control, end quote. Now remember, the OpenAI-led Stargate announcement mentioned
Microsoft only as a tech partner, and sources are telling the Journal that Microsoft and
OpenAI were arguing over Microsoft's ability to fulfill OpenAI's computing needs in the months
leading up to the Stargate announcement. So is this just the long-whispered-about Microsoft and
OpenAI divorce starting to come out in public? Quoting the Journal: In the months leading up to the
announcement, the two sides had been haggling over what to do about OpenAI's seemingly insatiable
appetite for computing power and its contention Microsoft couldn't fulfill it, even though their
agreement didn't allow OpenAI to easily switch to others, said people familiar with the discussions.
OpenAI is almost entirely reliant on Microsoft to provide it with the data centers it needs to
build and operate its sophisticated AI software. That has been a part of their agreement since Microsoft
first invested in 2019.
With the success of ChatGPT, OpenAI's need for computing power surged. Its executives have said ending the exclusive cloud contracts could be crucial to competing with rival AI developers that don't have the same constraints.
The two have been arguing over capacity on and off for years, the people said, and talks have intensified in recent months.
Like other tech giants, Microsoft has rapidly increased its investment in AI infrastructure and recently said it would spend $80 billion on AI data centers in its current fiscal year.
But it also has other customers and partners besides OpenAI. OpenAI has wanted Microsoft to allow it to
get cloud computing elsewhere, such as Google. Microsoft has insisted that would be a violation of
the exclusivity agreement. Altman has complained that Microsoft was violating the agreement by not
providing it with enough data center capacity, people familiar with the matter said. Last year,
OpenAI and Microsoft discussed building their own supercomputer for training OpenAI models,
but it didn't happen, people familiar with the matter said. Some of the details about
the supercomputer were earlier reported by The Information. While Microsoft wasn't the center of
the announcement this week, Altman maintained that his relationship with his biggest backer remains good.
When one X user said Stargate meant that the friendship between OpenAI and Microsoft was over,
Altman responded with a popular internet misspelling of the word "more." Absolutely not. Very important
and huge partnership for a long time to come, he posted. We just need moar compute.
Details of Stargate's structure and financing are still unclear. If the venture builds even part of the
announced infrastructure, it could give OpenAI a competitive advantage. For Microsoft, Stargate could
mean it has less of a hold on its AI partner but is also exposed to less risk. In its own
announcement coinciding with the Stargate rollout, Microsoft said much about its relationship with
OpenAI will remain the same. It will continue to host OpenAI software on its Azure cloud
computing platform. That means when people use software like ChatGPT, OpenAI has to pay Azure. It
will also continue to build some new data centers for OpenAI, end quote.
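To see how the reported commitments fit together, here is a back-of-the-envelope sketch in Python. The $19 billion, $45 billion, and $500 billion figures come from the reporting above. Two things are assumptions made for illustration: that $500 billion is the full funding target, and that everything beyond the general partners' $45 billion would come from limited partners and debt.

```python
# Back-of-the-envelope check of the Stargate figures quoted above (billions of USD).
# Assumption: $500B is treated as the total target; the real structure is still unclear.
OPENAI_COMMIT = 19
SOFTBANK_COMMIT = 19
GP_TOTAL = 45   # total general-partner commitment Altman described
TARGET = 500    # headline figure attributed to Son

# Oracle and MGX are the other two GPs, so they cover the rest of the $45B.
oracle_and_mgx = GP_TOTAL - OPENAI_COMMIT - SOFTBANK_COMMIT

# Assumption: everything above the GP commitment comes from LPs and debt.
lp_and_debt = TARGET - GP_TOTAL

print(f"Oracle + MGX: ${oracle_and_mgx}B")
print(f"LPs + debt:   ${lp_and_debt}B")
```

This only reproduces the arithmetic behind the "implying $7 billion" remark. It says nothing about how the limited-partner or debt portion would actually be raised.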
ByteDance board member Bill Ford says ByteDance is exploring alternatives to selling TikTok U.S.,
including a change to local control to comply with the U.S. ban law. We are optimistic we will find a
solution, Ford said, speaking at the World Economic Forum in Davos. There are a number of alternatives
we can talk to President Trump and his team about that are short of selling the company,
that allow the company to continue to operate, maybe with a change of control of some kind,
but short of having to sell. I'm optimistic about the dialogue that is emerging between President
Trump and President Xi, Ford said. That might help create a much more constructive environment,
a much higher level of engagement that could lead to a positive solution. We'll get on with it
as soon as maybe the end of the week in terms of negotiating what might work. The Chinese government,
the U.S. government, and the company and the board all have to be involved in this conversation,
Ford said, adding that there could be solutions, quote, short of divestiture. The objective is for
TikTok to continue operating, he said. Ford is also the chief executive officer of General Atlantic,
which is an investor in ByteDance, TikTok's parent company.
Here's a weird one. French prosecutors say David Balland, co-founder of French crypto wallet startup
Ledger, was kidnapped from his home on January 21st and freed yesterday after a police operation.
Quoting Bloomberg: After his kidnapping, Balland was taken in a car to another address where he was
held captive. Emergency services are now treating him, the prosecutor's office said, end quote.
Founded in 2014, Ledger has emerged
as a key player in the crypto security landscape known for its specialized hardware wallets that help investors
safeguard their digital assets. In 2023, the company raised a 100 million euro funding round,
pushing its valuation to 1.3 billion euros. In 2019, in a strategic move to control its manufacturing
process, Ledger established a production facility in Vierzon, a quaint French town in the heart
of the country. Between that facility and the company's headquarters in Paris, the company has approximately
700 employees. As one of Ledger's founding team in 2014, Balland played a pivotal role in the company's
growth story, particularly during his tenure as site director of the Vierzon factory from 2019 to
2021, according to his LinkedIn profile. Obviously too early to speculate on details or motives here,
but you would imagine plenty of nefarious parties might want a way to hack into Ledger's crypto wallets.
Another one from the "your car is tracking you like your phone tracks you" file. Researchers
have detailed Subaru's now-fixed web vulnerabilities that let them track millions of
Subarus via its Starlink features in the U.S., Canada, and Japan. Quoting Wired:
You can retrieve at least a year's worth of location history for the car, where it's
pinged precisely, sometimes multiple times a day, researcher Sam Curry says.
Whether somebody's cheating on their wife or getting an abortion or part of some political
group, there are a million scenarios where you could weaponize this against someone, end quote.
Curry and another researcher, Shubham Shah, today revealed in a blog post their method for hacking and tracking millions of Subarus,
which they believe would have allowed hackers to target any of the company's vehicles equipped with its digital features known as Starlink in the U.S., Canada, or Japan.
Vulnerabilities they found in a Subaru website intended for the company's staff allowed them to hijack an employee's account to both reassign control of cars' Starlink features and also access all the vehicle location data available to employees, including the car's location every time its engine started.
Shah reported their findings to Subaru in late November, and Subaru quickly patched its Starlink
security, but the researchers warn that the Subaru web vulnerabilities are just the latest in a long
line of similar web-based flaws. They and other security researchers working with them have
found such flaws affecting well over a dozen carmakers, including Acura, Genesis, Honda, Hyundai, Infiniti,
Kia, Toyota, and many others. There is little doubt, they say, that similarly serious hackable
bugs exist in other auto companies' web tools that have yet to be discovered. In Subaru's case,
in particular, they also point out that their discovery hints at how pervasively those with access
to Subaru's portal can track its customers' movements, a privacy issue that will last far longer
than the web vulnerabilities that exposed it. The thing is, even though this is patched,
this functionality is still going to exist for Subaru employees, Curry said. It's just normal
functionality that an employee can pull up a year's worth of your location history, end quote.
Yeah, the deeper issue lies in Subaru's extensive data collection practices. The company maintains
at least a year's worth of detailed location history for vehicles accessible to employees
through what Curry describes as essentially a button on Subaru's admin panel.
While Subaru has patched these specific vulnerabilities and maintains they've never experienced
unauthorized access, the incident spotlights a broader industry trend.
As the Mozilla Foundation has reported, 92% of the car brands it reviewed give owners little to no control over their data,
and 84% of manufacturers reserve the right to share or sell the information they collect.
Robert Herrell from the Consumer Federation of California puts it perfectly:
People are being tracked in ways that they have no idea are happening, end quote.
Netflix earned 16 Oscar nominations, it was announced this morning, including 13 for Emilia Pérez,
which is the most nominated film of the year.
A24 came in second with 14 nominations. Quoting Variety:
Emilia Pérez landed 13 nods in total, including Best Picture, Best Actress for Karla Sofía Gascón,
Best Supporting Actress for Zoe Saldaña,
and Best Director for Jacques Audiard. The streamer also received nominations for Wallace & Gromit:
Vengeance Most Fowl in the Best Animated Feature category and for Pablo Larraín's Maria for cinematography,
end quote. The studio A24 came in next with 14 nominations spanning its slate of films this year,
including the ambitious historical piece The Brutalist and Sing Sing, starring Colman Domingo.
The Brutalist matched Wicked with 10 nominations each, positioning them as front-runners.
Universal, which produced Wicked, secured 13 nominations, while its specialty arm Focus Features garnered 12 nominations through its popular papal drama Conclave and Robert Eggers' Nosferatu.
While Universal's combined 25 nominations outpaced Netflix's total, it's worth noting that the Academy's methodology counts nominations by individual distributor rather than parent company.
Finally today, let me introduce you to Humanity's Last Exam, a new evaluation that those behind it claim is the hardest AI
test ever, consisting of around 3,000 multiple-choice and short-answer questions. This is the Turing test, the Voight-Kampff test from
Blade Runner, the test to see if the machines have become sylphil wheel. Quoting the Times:
For years, AI systems were measured by giving new models a variety of standardized benchmark tests.
Many of these tests consisted of challenging SAT-calibre problems in areas like math, science, and logic.
Comparing the models' scores over time served as a rough measure of AI progress. But AI systems
eventually got too good at those tests. So new, harder tests were created, often with the types
of questions graduate students might encounter on their exams. Those tests aren't in good shape either.
New models from companies like OpenAI, Google, and Anthropic have been getting high scores on
many Ph.D.-level challenges, limiting those tests' usefulness and leading to a chilling question.
Are AI systems getting too smart for us to measure? This week, researchers at the Center for
AI Safety and Scale AI are releasing a possible answer to that question: a new
evaluation called Humanity's Last Exam that they claim is the hardest test ever administered to AI
systems. Humanity's Last Exam is the brainchild of Dan Hendrycks, a well-known AI safety researcher
and director for the Center for AI Safety. The test's original name, Humanity's Last Stand,
was discarded for being overly dramatic. Mr. Hendrycks worked with Scale AI, an AI company
where he is an advisor, to compile the test, which consists of roughly 3,000 multiple-choice
and short-answer questions designed to test AI systems' abilities in areas ranging from
analytic philosophy to rocket engineering. Questions were submitted by experts in these fields,
including college professors and prize-winning mathematicians who were asked to come up with
extremely difficult questions they knew the answers to. Here, try your hand at a question
about hummingbird anatomy from the test: Hummingbirds within Apodiformes uniquely have a bilaterally
paired oval bone, a sesamoid embedded in the caudolateral portion of the expanded,
cruciate aponeurosis of insertion of m. depressor caudae. How many paired tendons are supported by
this sesamoid bone? Answer with a number. Just an aside here: obviously, I couldn't
answer that, much less pronounce it. But back to the article: the questions on Humanity's Last
Exam went through a two-step filtering process. First, submitted questions were given to
AI models to solve. If the models couldn't answer them, or if, in the case of multiple-choice questions,
the models did worse than random guessing, the questions were given to a set of human reviewers
who refined them and verified the correct answers. Experts who wrote top-rated questions were
paid between $500 and $5,000 per question, as well as receiving credit for contributing to the exam.
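The two-step filter described in the quote above can be sketched in a few lines of Python. This is a hypothetical illustration, not the researchers' actual pipeline. The `Question` type, the `Model` callable, exact-match grading, and reading "worse than random guessing" as the models' combined hit rate falling below 1/k for k answer choices are all assumptions. The human-review step is not modeled.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Sequence


@dataclass
class Question:
    prompt: str
    answer: str
    # Hypothetical field: None (or empty) means a short-answer question.
    choices: Optional[List[str]] = None


# Hypothetical stand-in for an AI model: takes a question, returns an answer string.
Model = Callable[[Question], str]


def survives_model_stage(q: Question, models: Sequence[Model]) -> bool:
    """Stage 1 (assumed reading): keep a question only if the models fail it."""
    if not models:
        raise ValueError("need at least one model to filter against")
    # Assumption: an exact string match after trimming whitespace counts as correct.
    correct = sum(1 for m in models if m(q).strip() == q.answer)
    if not q.choices:
        # Short answer: keep only if no model got it right.
        return correct == 0
    # Multiple choice: keep if the models' hit rate is below random guessing (1/k).
    return correct / len(models) < 1 / len(q.choices)


def filter_questions(questions: Sequence[Question], models: Sequence[Model]) -> List[Question]:
    """Return the questions that would move on to stage 2 (human review, not modeled here)."""
    return [q for q in questions if survives_model_stage(q, models)]
```

The sketch only shows the gatekeeping logic. In practice the grading and the "random guessing" baseline would be defined by the exam's organizers, and each surviving question would still need expert review.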
There are other tests trying to measure advanced AI capabilities in certain domains such as
FrontierMath, a test developed by Epoch AI, and ARC-AGI, a test developed by the
AI researcher François Chollet. But Humanity's Last Exam is aimed at determining how good AI systems are
at answering complex questions across a wide variety of academic subjects, giving us what
might be thought of as a general intelligence score. We are trying to estimate the extent to which
AI can automate a lot of really difficult intellectual labor, Mr. Hendrycks said.
Once the list of questions had been compiled, the researchers gave Humanity's Last Exam to six
leading AI models, including Google's Gemini 1.5 Pro and Anthropic's Claude 3.5 Sonnet.
All of them failed miserably. OpenAI's o1 system scored the highest of the bunch with a score of
8.3%. Mr. Hendrycks said he expected those scores to rise quickly and potentially surpass 50% by the
end of the year. At that point, he said AI systems might be considered world-class oracles
capable of answering questions on any topic more accurately than human experts. And we might have to
look for other ways to measure AI's impacts, like looking at economic data or judging whether it can
make novel discoveries in areas like math and science. You can imagine a better version of this
where we can give questions that we don't know the answers to yet, and we're able to verify
if the model is able to help solve it for us, said Summer Yue, Scale AI's director of research
and an organizer of the exam, end quote. Nothing more for you today. Talk to you tomorrow.
