Tech Brew Ride Home - Fri. 09/13 – The First Strawberry is o1
Episode Date: September 13, 2024
The first of the Strawberry models is here. YC plans to have four cohorts a year, but each one is getting smaller. Waymo is already ready to expand to more pretty big markets. And in the long reads, a... deep dive look into the options Intel has at this point in time.
Sponsors: HensonShaving.com/ride code ride
Links:
OpenAI releases o1, its first model with ‘reasoning’ abilities (The Verge)
Notes on OpenAI’s new o1 chain-of-thought models (Simon Willison's Weblog)
OpenAI's new models 'instrumentally faked alignment' (TransformerNews)
Apple AirPods Pro granted FDA approval to serve as hearing aids (TechCrunch)
Silicon Valley’s Y Combinator to Double Number of Cohorts Per Year (Bloomberg)
Weekend Longreads Suggestions:
Intel Has Only Tough Options After Its Long and Stinging Fall From Grace (Bloomberg)
Link to the twitter poll about ads
Learn more about your ad choices. Visit megaphone.fm/adchoices
Transcript
On April 4th, 2023, around 2 in the morning, a man was found stabbed multiple times on a sidewalk in downtown San Francisco.
Hey, who did this to you?
What happened next turned the story into a political firestorm.
Reports have identified the victim as Bob Lee, the founder of Cash App.
From Bloomberg Podcasts, this is Foundering, the Killing of Bob Lee, beginning April 16.
Welcome to the Techmeme Ride Home for Friday the 13th of September 2024. I'm Brian McCullough. Today, the first of the Strawberry models is here. YC plans to have four cohorts a year, but each one is getting smaller. Waymo is already ready to expand to more pretty big markets, and in the long reads, a deep dive look into the options Intel has at this point in time. Here's what you missed today in the world of tech. After I hit publish on the show yesterday, OpenAI released o1, the first of
the rumored reasoning-focused Strawberry models, into preview, alongside a smaller o1-mini for
ChatGPT Plus and Team users. In terms of reasoning improvements, OpenAI claims that in a
qualifying exam for the International Mathematics Olympiad, o1 correctly solved 83% of the problems,
while GPT-4o solved only 13%. Quoting from The Verge: For OpenAI, o1 represents a step toward
its broader goal of human-like artificial intelligence.
More practically, it does a better job at writing code and solving multi-step problems than
previous models, but it's also more expensive and slower to use than GPT-4o.
OpenAI is calling this release of o1 a preview to emphasize how nascent it is.
ChatGPT Plus and Team users get access to both o1-preview and o1-mini starting today,
while Enterprise and Edu users will get access early next week.
OpenAI says it plans to bring o1-mini access to all the free users of ChatGPT, but hasn't set a release date yet.
Developer access to o1 is really expensive.
In the API, o1-preview is $15 per 1 million input tokens, or chunks of text
parsed by the model, and $60 per 1 million output tokens.
For comparison, GPT-4o costs $5 per 1 million input tokens and $15 per 1 million output tokens.
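To make that price gap concrete, here is a rough back-of-the-envelope sketch in Python using only the four per-million-token prices quoted above. The function name, the dictionary layout, and the example request size (2,000 input tokens, 1,000 output tokens) are illustrative assumptions, not figures from the article or from OpenAI's API.

```python
# Prices quoted in the episode, in US dollars per 1 million tokens.
PRICES_PER_MILLION = {
    "o1-preview": {"input": 15.00, "output": 60.00},
    "gpt-4o": {"input": 5.00, "output": 15.00},
}


def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimate the dollar cost of one request at the quoted prices."""
    price = PRICES_PER_MILLION[model]
    total = input_tokens * price["input"] + output_tokens * price["output"]
    return total / 1_000_000


if __name__ == "__main__":
    # Hypothetical request size, chosen only for illustration.
    for model in PRICES_PER_MILLION:
        cost = request_cost(model, input_tokens=2_000, output_tokens=1_000)
        print(f"{model}: ${cost:.3f}")
```

For that made-up request, the estimate comes out to about 9 cents on o1-preview versus about 2.5 cents on GPT-4o, so a bit more than three and a half times the price.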
The training behind o1 is fundamentally different from its predecessors, OpenAI's research
lead, Jerry Tworek, tells me, though the company is being vague about the exact details. He says
o1, quote, has been trained using a completely new optimization algorithm and a new training
data set specifically tailored for it. OpenAI taught previous GPT models to mimic patterns from
its training data. With o1, it trained the model to solve problems on its own using a technique
known as reinforcement learning, which teaches the system through rewards and penalties. It then
uses a chain of thought to process queries, similarly to how humans process
problems by going through them step by step. As a result of this new training methodology,
OpenAI says the model should be more accurate. We have noticed that this model hallucinates less,
Tworek says, but the problem still persists. We can't say we solved hallucinations entirely, end quote.
I'm going to turn to Simon Willison again to assess all this on his blog.
Willison says OpenAI's o1 models aren't as simple as the next step up from GPT-4 might be,
as they introduce major cost and performance tradeoffs in exchange for improved reasoning.
Quote, one way to think about these new models is as a specialized extension of the chain of thought
prompting pattern, the think step-by-step trick that we've been exploring as a community for a couple of years now,
first introduced in the paper Large Language Models are Zero-Shot Reasoners in May 2022.
Effectively, this means the models can better handle significantly more complicated prompts
where a good result requires backtracking and thinking beyond just next token prediction.
I don't really like the term reasoning because I don't think it has a robust definition in the context of LLMs,
but OpenAI have committed to using it here, and I think it does an adequate job of conveying the problem these new models are trying to solve.
Most interesting is the introduction of reasoning tokens, tokens that are not visible in the API response, but are still billed and counted as output tokens.
These tokens are where the new magic happens.
Thanks to the importance of reasoning tokens, OpenAI suggests allocating a budget of around 25,000 of these for prompts
that benefit from the new models. The output token allowance has been increased dramatically.
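As a rough illustration of what billing for invisible tokens means, the Python sketch below adds reasoning tokens to the visible output and prices the total at the $60 per 1 million output tokens quoted earlier in the episode for o1-preview. The function names and the 500-token visible answer are assumptions made up for the example, and the 25,000 figure is the suggested budget mentioned above, not a measurement of what a typical prompt actually uses.

```python
# Output price quoted earlier in the episode for o1-preview, USD per 1M tokens.
O1_PREVIEW_OUTPUT_PRICE = 60.00


def billed_output_cost(visible_tokens: int, reasoning_tokens: int) -> float:
    """Reasoning tokens are billed as output tokens, so price them together."""
    billed_tokens = visible_tokens + reasoning_tokens
    return billed_tokens * O1_PREVIEW_OUTPUT_PRICE / 1_000_000


def hidden_share(visible_tokens: int, reasoning_tokens: int) -> float:
    """Fraction of the billed output that never appears in the response."""
    return reasoning_tokens / (visible_tokens + reasoning_tokens)


if __name__ == "__main__":
    # Hypothetical case: a 500-token answer that used the full suggested
    # 25,000-token reasoning budget.
    cost = billed_output_cost(visible_tokens=500, reasoning_tokens=25_000)
    share = hidden_share(visible_tokens=500, reasoning_tokens=25_000)
    print(f"output cost: ${cost:.2f}, hidden share: {share:.0%}")
```

In that hypothetical worst case, the output side of the bill is roughly a dollar and a half, and nearly all of it pays for tokens the developer never gets to read.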
A frustrating detail is that those reasoning tokens remain invisible in the API. You get billed for them,
but you don't get to see what they were. Two key reasons here. One is around safety and policy
compliance. They want the model to be able to reason about how it's obeying those policy
rules without exposing intermediary steps that might include information that violates those policies.
The second is what they call competitive advantage, which I interpret as wanting to avoid other
models being able to train against the reasoning work that they have invested in. I'm not at all happy
about this policy decision. As someone who develops against LLMs, interpretability and transparency are
everything to me. The idea that I can run a complex prompt and have key details of how that prompt
was evaluated hidden from me feels like a big step backwards, end quote. He mentioned safety there,
though. So speaking of safety, Apollo Research has also come out with a report giving the new model a medium
rating for chemical, biological, radiological, nuclear weapons risk, and warning that it sometimes
manipulated task data to fake alignment. Quoting Transformer News: But though they aren't dangerous yet,
they do seem to be more dangerous than previous models, which suggests OpenAI may be increasingly
moving towards models that might be too risky to release. The company's own policies state that,
quote, only models with a post-mitigation risk score of medium or below can be deployed. With CBRN
risk now at that medium level, that threshold may be soon crossed, end quote. And one more
OpenAI note before we move on from them. Sources say OpenAI's ChatGPT has more than 11 million
paying subscribers, including one million for its higher-priced business plans, implying that they're
generating more than $225 million in revenue per month. So cluster that with your thinking about
their impending raise. Real quick noting that, also after I published yesterday, the FDA
officially approved the hearing aid feature in Apple's AirPods Pro 2, calling it the first over-the-counter
hearing aid software device. Quoting TechCrunch, the FDA on Thursday announced that it had
granted what it calls the first over-the-counter hearing aid software device, Hearing Aid Feature.
Specifically, it has approved the software update that enables that functionality.
Hearing loss is a significant public health issue impacting millions of Americans,
the FDA's Michelle Tarver notes in a statement.
Today's marketing authorization of an over-the-counter hearing aid software on a widely used consumer audio product is another step that advances the availability, accessibility, and acceptability of hearing support for adults with perceived mild to moderate hearing loss, end quote.
The news was made possible in part by the FDA's October 2022 move to allow for the sale of hearing aids without a prescription.
That move has given rise to a new industry of more easily accessible hearing devices, end quote.
Y Combinator now plans to expand to four cohorts per year, adding spring and fall sessions next year in 2025.
Each batch will be about half the size of the most recent cohorts, which came in at 256 startups each.
Quoting Bloomberg, spring and fall cohorts are joining the traditional winter and summer cohorts,
President Garry Tan confirmed in a message.
The program lasts about 11 weeks, each capped with an investor demo day when the startups pitch top venture capital firms.
The stepped-up schedule is the brainchild of Tan, an entrepreneur and venture capitalist who became president of Y Combinator, or YC, early last year.
Under the new schedule, a season that traditionally was a break between the June-September summer session and the January-April winter session will fill up with a new batch of founders and the attendant talks, meetups, and office hours.
Starting in 2025, a spring session will follow the winter one.
The size of each batch will be smaller, Tan said, roughly half the size of the most recent cohort of 256.
The great thing for everyone is we will be more responsive to founders and fund them right when they start, Tan said in a text exchange.
We will also have 4x in-person demo days, which will give investors twice as much time to meet half the number of companies.
Even the smallest moves of YC are closely scrutinized in Silicon Valley.
Earlier this year, Tan made a different controversial change, shuttering its $700 million continuity fund,
which invested selectively in YC startups judged to hold the greatest potential.
The latest scheduling shift could address criticism that YC cohorts have gotten too large to retain the program's exclusivity.
In late 2021 and early 2022, cohorts hovered around 400.
Now having closer to 100 startups in a batch will bring YC back to levels from about a decade ago.
Still, the total number of startups going through the program each year will hold steady at about 500,
a far cry from the days when Stripe, say, attended, when just 26 startups participated in its 2009 cohort.
And there will be more demo days, potentially eroding each one's importance, even as it allows for more individualized attention, end quote.
Time for the weekend long read suggestions.
There is only one this week, because I feel like this is a story, the importance of which cannot be overstated.
It is, of course, the continuing crisis at Intel, which would be important if only for the history, one of the dominant tech companies falling into irrelevance and potentially worse,
but also given that this is America's basically one homegrown play in the whole geopolitical
silicon game, it's doubly important.
So Bloomberg has a deep dive look at the tough options before Intel's board right now,
including scaling back factory projects, selling off subsidiaries, or splitting Intel's
core operations. And these are decisions that are being made right now, by the way.
Quote, over three days of meetings that began Tuesday, Intel's board has been weighing how to move
forward after an August 1st earnings report in which Intel showed disappointing growth,
shared a forecast that fell far short of Wall Street estimates, and announced plans to slash 15,000
jobs. The abysmal results sent share prices plummeting and shattered the last vestiges of
confidence in a turnaround plan that Pat Gelsinger began when he took over as Chief Executive
Officer in 2021. It didn't have to come to this. Intel's strength in making chips for data
centers should have left it well positioned for the sudden rise in artificial intelligence,
but it lagged in the race to produce the specific kind of equipment needed to train and operate
AI models and has almost entirely missed out on the recent boom.
Intel is headed toward its third consecutive year of shrinking sales, estimated to make $52 billion
in revenue in 2024, just 70% of what it brought in back in 2021.
Its shares have lost more than 60% of their value this year, turning them into the
second worst performing stock on the S&P 500.
Intel's existing businesses aren't performing well
enough to allow it to spend its way back to relevance. While the overall strategy might have made
sense at the outset, the current runway of the business doesn't seem to give enough support to get it
to the end anymore, Bernstein Société Générale Group analyst Stacy Rasgon wrote in a note last
week. Something clearly has to be done, but what? End quote. The options the board is considering
this week are intended to help Intel find a more solid financial footing, even if that means
trimming its ambitions, according to people familiar with its deliberations who asked not to be
identified because the discussions are private. It's not clear which ones are most likely,
and all of the possibilities face real barriers. The board hasn't received any offers from
potential buyers for the company, in part or in whole, and has not scheduled any binding votes.
One option for Intel to improve its financial position would be to sell off divisions it acquired
before Gelsinger took over and which the company has already separated from its core operations,
although this is not on the agenda for this week's meeting. The company is examining whether to sell some of its
stake in autonomous driving tech-focused Mobileye. Mobileye spun out of Intel and went public in
2022. Intel still owns 88% of the company's shares and could presumably sell a larger chunk,
either through the public markets or directly to a single buyer. Still, demand is likely to be
weak for the automotive tech company, which has lost about 75% of its market value this year.
This likely pushes off the sale of any significant portion of its stake in the near term.
There's also Altera Corp., a company that makes
programmable chips, multi-use devices that are primarily used in telecommunications networks.
Intel bought Altera in 2015, then separated its operations last year with the intent of taking it
public. Altera has suffered from weak spending by telecom companies, and Intel management has said
Altera needs to produce more up-to-date chips to regain market share. Intel spent about $15 billion
each to buy Mobileye and Altera. Any sale would almost certainly come at a loss. Another target for
cutbacks could be Intel's network of semiconductor factories, which
it has committed to spending tens of billions of dollars on with the cooperation of various governments.
The most prominent of these are the plants Intel has begun work on in Arizona and Ohio, which are
being constructed with support from Biden's chip program and are in line for billions in public subsidies.
Watering down these projects would be a black eye not only for Intel but for the U.S. government.
The Biden administration has consistently framed the importance of its chipmaking policy in nationalistic terms,
and Intel is the biggest U.S.-based partner in its plans.
Commerce Secretary Gina Raimondo has tried to help Intel's foundry business, including by encouraging executives at Nvidia and Advanced Micro Devices to consider manufacturing at the chipmaker's Ohio facility, Bloomberg has reported.
Neither currently plans to do so. In recent months, Wall Street has become particularly fascinated with the idea of cleaving Intel into its constituent parts.
The tight integration of Intel's design and manufacturing operations has always been a core part of its identity, though, and splitting those divisions would mark the end of the company as we know it.
It's also not clear that either side of the business would make much sense if detached from the other.
The factory business, Intel's Foundry Services, lost $5 billion in 2023, and is likely to post even bigger losses this year.
Its conspicuous lack of external customers is not only a problem because it illustrates a lack of traction,
but also because it makes the division reliant on Intel's product design operations for revenue.
The chip design business retains its traditional stronghold in the market for chips used in servers and personal computer processors,
and Intel is optimistic about its PC chips specifically.
But rivals like AMD and Arm Holdings are gaining ground.
The more the product design operation languishes, the more the foundry business suffers.
By Gelsinger's own admission, Intel hasn't developed a compelling way to elbow its way into AI-specific chips,
the most important part of the semiconductor industry today.
While it has a line of processors that compete with Nvidia's core products,
the CEO acknowledges that Intel isn't going to be most customers' first choice.
The competition for chips that can train AI models, he says, is a four-horse race between
Nvidia, AMD, companies designing their own in-house chips, and Intel. Intel is number four,
he says. That's hard. Even if a split made financial sense and a buyer emerged for one half of the
business or the other, completing a deal may be prohibitively complicated. Any potential acquisition of
Intel's factory network would face significant government scrutiny given the inherent national security
concerns. To pass muster, a new owner would also probably have to agree to spend the tens of billions
of dollars that Intel has already promised for new plants. China, as Intel's largest single market,
would also want a say. Regulators there have held up approval for U.S. deals to the point
where few of any size have made it to completion. The prospects seem dim for a transaction that would
satisfy both Beijing and Washington, end quote. Throwing this in there real quick,
something, something inflection point. Waymo and Uber just announced plans to
expand their robotaxi partnership from Phoenix, Arizona, to Austin, Texas, and Atlanta,
starting early next year. So Austin and Atlanta, pretty big markets, right? No bonus episode
for you this weekend, but I do have something I want to ask you. We're considering changing
ad networks for the podcast. I want to find a network that can get us back to more of those
SaaS and startup focused sponsors that we had at the beginning of the show. I felt like they were
more useful for this audience. And I've been talking to a
platform that wants to buy our ad inventory for a year and sell it to just exactly that profile
of sponsor. But the catch is they want three ads in the ad break every episode, not two.
Now, when I do the host ads right now, they often go 90 seconds anyway. I don't stick to just a
hard 60 seconds per ad, because I'm not able to talk that fast. So there are, on average,
right now already three minutes or so of ads in the show every day. It's been like that
for at least six years. This new ad network would only be interested in inserting 60-second ads,
a hard 60 seconds as opposed to the 90 seconds that I end up doing when I read them myself.
So it would come out to the same. It would be three minutes of sponsored content like we're
doing right now on every episode. It's just that instead of two ads every episode, there would be
three, and ideally more business and tech-focused ads. So better sponsors, but same amount of time
in terms of the ads you have to listen to. Also, frankly, it would be better money for me,
which, behind the scenes, this has been the worst year in the nearly seven years of doing this show
in terms of ad revenue. My income from this show is down over 50% from where it was just two
years ago. So I do need to find a solution just to make it worthwhile to keep doing the show.
So tell me, given everything I've just laid out, are you willing to have three ads in each show?
On the show Twitter account, which is at Techmeme podcast, and at my personal Twitter, at Brian MCC,
I posted polls asking for you to weigh in on this.
Please vote in those polls, either one.
I'll run them through the weekend, pinned to the top of both profiles, but also I have links in the show notes to them.
Please vote, but also reply below the polls to give me your thoughts.
I do think we need better ads.
So I'm committing to moving to a different ad network anyway.
It's just a question of if I go with this one that wants me to do three ads. I do need to get the
revenue back to a healthy place. But I want your thoughts on whether this is the best way to do it.
Thanks in advance. Chat on Monday.
