The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Agent Era Begins
Episode Date: October 13, 2024
A reading and discussion inspired by https://www.sequoiacap.com/article/generative-ais-act-o1/
Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, a primer on the agentic era of AI.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Hello, friends, happy weekend.
It being the weekend, of course, that means it is a long reads episode, and today is a really interesting one.
The topic of agents is growing ever more important, as more and more agentic software actually starts to come online.
Today we are reading a piece from Sequoia called "Generative AI's Act o1: The Agentic Reasoning Era Begins."
Now, what's great about this piece is that it actually provides a bit of background that'll help you have a technical understanding of how the agent space differs from the model space that we've been operating in for the last couple of years before getting into the specific business implications.
There is a lot of really good thinking in here, but it's a long piece.
So for the rest of this episode, I am going to turn it over to AI me, thanks to ElevenLabs, to read Sequoia's essay.
Agents are a topic that we are going to continuously come back to.
In fact, I'm thinking about at some point doing a full agent week.
For now, though, this serves as something of a primer, and I hope that you enjoy it.
Generative AI's Act o1: The Agentic Reasoning Era Begins
Two years into the generative AI revolution, research is progressing the field from
"thinking fast" (rapid-fire pre-trained responses) to "thinking slow" (reasoning at inference time).
This evolution is unlocking a new cohort of agentic applications.
On the second anniversary of our essay "Generative AI: A Creative New World," the AI ecosystem looks
very different, and we have some predictions for what's on the horizon. The foundation layer of
the generative AI market is stabilizing in an equilibrium with a key set of scaled players
and alliances, including Microsoft/OpenAI, AWS/Anthropic, Meta, and Google/DeepMind. Only
scaled players with economic engines and access to vast sums of capital remain in play. While the fight
is far from over and keeps escalating in a game-theoretic fashion, the market structure itself is solidifying,
and it's clear that we will have increasingly cheap and plentiful next-token predictions.
As the LLM market structure stabilizes, the next frontier is now emerging. The focus is shifting
to the development and scaling of the reasoning layer, where System 2 thinking takes precedence.
Inspired by models like AlphaGo, this layer aims to endow AI systems with deliberate reasoning,
problem-solving and cognitive operations at inference time that go beyond rapid pattern matching.
And new cognitive architectures and user interfaces are shaping how these reasoning capabilities
are delivered to and interact with users.
What does all of this mean for founders in the AI market?
What does this mean for incumbent software companies?
And where do we, as investors, see the most promising layer for returns in the generative
AI stack?
In our latest essay on the state of the generative AI market, we'll explore how the consolidation
of the foundational LLM layer has set the stage for the race to scale these higher
order reasoning and agentic capabilities, and discuss a new generation of killer apps with
novel cognitive architectures and user interfaces.
Strawberry Fields Forever
The most important model update of 2024 goes to OpenAI with o1, formerly known as Q*
and also known as Strawberry.
This is not just a reassertion of OpenAI's rightful place atop the model quality
leaderboards, but also a notable improvement on the status quo architecture.
More specifically, this is the first example of a model with true general reasoning
capabilities, which they've achieved with inference time compute. What does that mean? Pre-trained models are
doing next token prediction on an enormous amount of data. They rely on training time compute.
An emergent property of scale is basic reasoning, but this reasoning is very limited. What if you
could teach a model to reason more directly? This is essentially what's happening with Strawberry.
When we say inference time compute, what we mean is asking the model to stop and think before giving
you a response, which requires more compute at inference time. Hence, inference time compute.
The stop and think part is reasoning.
AlphaGo x LLMs
So what is the model doing when it stops and thinks?
Let's first take a quick detour to March 2016 in Seoul.
One of the most seminal moments in deep learning history took place here.
AlphaGo's match against legendary Go master Lee Sedol.
This wasn't just any AI-versus-human match.
It was the moment the world saw AI do more than just mimic patterns.
It was thinking.
What made AlphaGo different from previous gameplay AI systems, like Deep Blue?
Like LLMs, AlphaGo was first pre-trained to mimic human experts from a database of roughly 30 million moves from previous games and more from self-play.
But rather than provide a knee-jerk response that comes out of the pre-trained model,
AlphaGo takes the time to stop and think.
At inference time, the model runs a search or simulation across a wide range of potential future scenarios,
scores those scenarios, and then responds with the scenario or answer that has the highest expected value.
The more time AlphaGo is given, the better it performs.
With zero inference time compute, the model can't beat the best human players.
But as the inference time scales, AlphaGo gets better and better until it surpasses the very best humans.
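To make the "stop and think" loop concrete, here is a minimal Python sketch of the idea described above: spend more samples at inference time, score each candidate with a value function, and answer with the highest-scoring one. It is a simple best-of-N search, far cruder than the Monte Carlo tree search AlphaGo actually used, and the proposal function, the value function, and its assumed ideal answer of 0.7 are all hypothetical stand-ins.

```python
import random
from typing import List


def propose_candidates(rng: random.Random, n: int) -> List[float]:
    """Hypothetical stand-in for a pre-trained policy: propose n candidate answers."""
    return [rng.random() for _ in range(n)]


def value_function(candidate: float) -> float:
    """Hypothetical scorer. In Go this would estimate a move's win probability;
    here the assumed ideal answer is 0.7, and closer is better."""
    return -abs(candidate - 0.7)


def think_then_answer(budget: int, seed: int = 0) -> float:
    """Spend `budget` samples of inference-time compute, then return the
    candidate with the highest value."""
    rng = random.Random(seed)
    candidates = propose_candidates(rng, budget)
    return max(candidates, key=value_function)


if __name__ == "__main__":
    # Same seed, growing budget: each larger run sees every candidate the smaller
    # run saw, so the best score can only stay the same or improve.
    for budget in (1, 10, 100, 1000):
        answer = think_then_answer(budget)
        print(f"budget={budget:>4}  answer={answer:.3f}  score={value_function(answer):.3f}")
```

In this toy, the gain from a bigger budget comes purely from seeing more candidates, since the value function never changes; that is the sense in which more inference-time compute buys a better answer.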
Let's bring it back to the LLM world.
What's hard about replicating AlphaGo here is constructing the value function, or the function by which the responses are scored.
If you're playing Go, it's more straightforward.
You can simulate the game all the way to the end, see who wins, and then calculate an expected value of the next move.
If you're coding, it's somewhat straightforward.
You can test the code and see if it works.
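For code, the value function can be as simple as running each candidate against tests. The Python sketch below scores hypothetical model-generated drafts of an `add` function by the fraction of test cases they pass and keeps the best one. The candidates, tests, and function names are invented for illustration, and a real system would execute untrusted code in a sandbox rather than with `exec`.

```python
from typing import List, Tuple

# Hypothetical drafts, standing in for several samples from a code model.
CANDIDATES: List[str] = [
    "def add(a, b):\n    return a - b",
    "def add(a, b):\n    return a + b + 1",
    "def add(a, b):\n    return a + b",
]

# Hypothetical unit tests: ((inputs), expected output).
TESTS: List[Tuple[Tuple[int, int], int]] = [((1, 2), 3), ((0, 0), 0), ((-4, 4), 0)]


def score(source: str, func_name: str = "add") -> float:
    """Value function for code: the fraction of tests the candidate passes."""
    namespace: dict = {}
    try:
        exec(source, namespace)  # acceptable for this toy; never run untrusted code this way
        fn = namespace[func_name]
    except Exception:
        return 0.0  # code that fails to define the function scores zero
    passed = 0
    for args, expected in TESTS:
        try:
            if fn(*args) == expected:
                passed += 1
        except Exception:
            pass  # a crash on one test counts as a failure, not a fatal error
    return passed / len(TESTS)


def pick_best(candidates: List[str]) -> Tuple[str, float]:
    """Return the highest-scoring candidate and its score."""
    best = max(candidates, key=score)
    return best, score(best)


if __name__ == "__main__":
    for src in CANDIDATES:
        print(f"{score(src):.2f}  {src.splitlines()[1].strip()}")
    best, best_score = pick_best(CANDIDATES)
    print("chosen:", best.splitlines()[1].strip(), best_score)
```

The tests give an objective score, which is what makes coding "somewhat straightforward." Nothing comparable exists for scoring an essay draft or a travel itinerary.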
But how do you score the first draft of an essay, or a travel itinerary, or a summary of key terms
in a long document? This is what makes reasoning hard with current methods, and it's why Strawberry
is comparatively strong on domains proximate to logic (e.g., coding, math, the sciences)
and not as strong in domains that are more open-ended and unstructured (e.g., writing).
While the actual implementation of Strawberry is a closely guarded secret, the key ideas involve
reinforcement learning around the chains of thought generated by the model. Auditing the model's
chains of thought suggests that something fundamental and exciting is happening that actually
resembles how humans think and reason. For example, o1 is showing the ability to backtrack when it
gets stuck as an emergent property of scaling inference time. It is also showing the ability to
think about problems the way a human would (e.g., visualizing the points on a sphere to solve a
geometry problem), and to think about problems in new ways (e.g., solving problems in programming
competitions in a way that humans would not). And there is no shortage of new ideas to push
forward inference-time compute (e.g., new ways of calculating the reward function, new ways
of closing the generator-verifier gap) that research teams are working on as they try to improve
the model's reasoning capabilities. In other words, deep reinforcement learning is cool again,
and it's enabling an entire new reasoning layer.
System 1 vs. System 2 Thinking
This leap from pre-trained instinctual responses ("System 1") to deeper, deliberate reasoning ("System 2") is the next
frontier for AI. It's not enough for models to simply know things. They need to pause, evaluate,
and reason through decisions in real time. Think of pre-training as the System 1 layer. Whether a
model is pre-trained on millions of moves in Go or petabytes of internet-scale text (LLMs), its job
is to mimic patterns, whether that's human gameplay or language. But mimicry, as powerful as it is,
isn't true reasoning. It can't properly think its way through complex novel situations,
especially those out of sample. This is where System 2 thinking comes in,
and it's the focus of the latest wave of AI research.
When a model stops to think,
it isn't just generating learned patterns
or spitting out predictions based on past data.
It's generating a range of possibilities
considering potential outcomes
and making a decision based on reasoning.
For many tasks, system one is more than enough.
As Noam Brown pointed out on our latest episode of Training Data,
thinking for longer about what the capital of Bhutan is doesn't help.
You either know it or you don't.
Quick, pattern-based recall works perfectly here.
But when we look at more complex problems like breakthroughs in mathematics or biology,
quick instinctive responses don't cut it.
These advances require deep thinking, creative problem solving, and most importantly, time.
The same is true for AI.
To tackle the most challenging, meaningful problems, AI will need to evolve beyond quick
in-sample responses and take its time to come up with the kind of thoughtful reasoning
that defines human progress.
A New Scaling Law: The Inference Race Is On
The most important insight from the o1 paper is that there's a new
scaling law in town. Pre-training LLMs follows a well-understood scaling law. The more compute and
data you spend on pre-training the model, the better it performs. The o1 paper has opened up an
entire new plane for scaling compute. The more inference time or test time compute you give the model,
the better it reasons. What happens when the model can think for hours? Days? Decades? Will we
solve the Riemann hypothesis? Will we answer Asimov's last question? This shift will move us from a
world of massive pre-training clusters toward inference clouds, environments that can scale compute
dynamically based on the complexity of the task.
One Model to Rule Them All?
What happens as OpenAI,
Anthropic, Google, and Meta scale their reasoning layers and develop more and more powerful
reasoning machines? Will we have one model to rule them all? One hypothesis at the outset of the
generative AI market was that a single model company would become so powerful and all-encompassing
that it would subsume all other applications. This prediction has been wrong so far in two
ways. First, there is plenty of competition at the model layer with constant leapfrogging for SOTA
capabilities. It's possible that someone figures out continuous self-improvement with broad domain
self-play and achieves takeoff, but at the moment we have seen no evidence of this. Quite to the
contrary, the model layer is a knife fight, with price per token for GPT-4 coming down 98% since the last
DevDay. Second, the models have largely failed to make it into the application layer as breakout
products, with the notable exception of ChatGPT. The real world is messy. Great researchers
don't have the desire to understand the nitty-gritty end-to-end workflows of every possible
function in every possible vertical. It is both appealing and economically rational for them to stop
at the API and let the developer universe worry about the messiness of the real world.
Good news for the application layer.
Today's episode is brought to you by Venice. Venice is a
private, uncensored generative AI app. It accesses open-source models to enable text,
image, and code generation without the fear of being spied on or having your data exploited.
Discuss anything with Venice without concern about it being monitored, sold, or given to advertisers
and governments. Venice is different because your conversations and creations are kept securely
within the browser, never stored or accessible by Venice. Unlike other AI apps, Venice
won't tell you what's okay to say or not. Venice won't patronize you. It simply provides direct
access to machine intelligence. No topics are off limits, no ideas are taboo. With Venice,
you're in control of the AI as you should be. Pro subscriptions are available for $49 a year or $8 per month.
AI Daily Brief listeners receive a 20% discount on Venice Pro.
Visit venice.ai/nlw and enter the discount code NLWDAILYBRIEF; that's NLW Daily Brief, all one word.
Today's episode is brought to you by Superintelligent.
Every single business workflow and function is being remade and reimagined with artificial
intelligence.
There is a huge challenge, however, of going from the potential of AI to actually capturing
that value, and that gap is what Superintelligent is dedicated to filling.
Superintelligent accelerates AI adoption and engagement to help teams actually use AI to increase
productivity and drive business value. An interactive AI use case registry gives your company full
visibility into how people are using artificial intelligence right now. Pair that with
capability-building content in the form of tutorials, learning paths, and a use case library. Superintelligent
helps people inside your company show how they're getting value out of AI while providing
resources for people to put that inspiration into action. The next three teams
that sign up with 100 or more seats are going to get free embedded consulting.
That's a process by which our Superintelligent team sits with your organization,
figures out the specific use cases that matter most to you,
and helps actually ensure support for adoption of those use cases to drive real value.
Go to besuper.ai to learn more about this AI enablement network,
and now back to the show.
The Messy Real World: Custom Cognitive Architectures
The way you plan and prosecute actions to reach your goals as a scientist
is vastly different from how you would work as a software engineer. Moreover, it's even different
as a software engineer at different companies. As the research labs further push the boundaries on
horizontal general purpose reasoning, we still need application or domain-specific reasoning to deliver
useful AI agents. The messy real world requires significant domain and application-specific reasoning
that cannot efficiently be encoded in a general model. Enter cognitive architectures or how your
system thinks, the flow of code and model interactions that takes user input and performs actions or
generates a response. For example, in the case of Factory, each of their Droid products has a
custom cognitive architecture that mimics the way that a human thinks to solve a specific task,
like reviewing pull requests or writing and executing a migration plan to update a service from one
back end to another. The factory droid will break down all of the dependencies,
propose the relevant code changes, add unit tests, and pull in a human to review. Then after approval,
run the changes across all of the files in a dev environment and merge the code if all the tests pass,
just like how a human might do it, in a set of discrete tasks rather than one generalized black box answer.
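One way to picture a custom cognitive architecture is as ordinary control flow around model calls. The Python sketch below walks through the migration steps listed above as discrete stages with a human approval gate. It is not Factory's implementation: every function, field, and callback name is hypothetical, and the model calls are replaced by placeholder strings.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class MigrationRun:
    """Hypothetical record of one back-end migration and the steps it went through."""
    service: str
    log: List[str] = field(default_factory=list)
    merged: bool = False


def run_migration(
    service: str,
    dependencies: List[str],
    human_approves: Callable[[List[str]], bool],
    tests_pass: Callable[[str], bool],
) -> MigrationRun:
    run = MigrationRun(service)
    # Step 1: break down the dependencies on the old back end.
    run.log.append(f"found {len(dependencies)} dependencies")
    # Step 2: propose one code change per dependency (placeholder for a model call).
    changes = [f"update {dep} to new back end" for dep in dependencies]
    # Step 3: add unit tests alongside the proposed changes.
    changes += [f"add unit test for {dep}" for dep in dependencies]
    run.log.append(f"proposed {len(changes)} changes")
    # Step 4: pull in a human to review before anything is applied.
    if not human_approves(changes):
        run.log.append("rejected by reviewer")
        return run
    # Step 5: apply the changes in a dev environment; merge only if all tests pass.
    run.log.append("applied changes in dev environment")
    if tests_pass(service):
        run.merged = True
        run.log.append("tests passed; merged")
    else:
        run.log.append("tests failed; not merged")
    return run


if __name__ == "__main__":
    result = run_migration(
        "billing",
        ["invoices.py", "payments.py"],
        human_approves=lambda changes: True,  # stand-in for a real code review
        tests_pass=lambda svc: True,          # stand-in for a real test run
    )
    print(result.merged, result.log)
```

Because each stage is explicit, a failure surfaces at a specific step (a rejected review, a failing test) rather than inside one generalized black-box answer.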
What's Happening with Apps?
Imagine you want to start a business in AI. What layer of the stack do you
target? Do you want to compete on infra? Good luck beating Nvidia and the hyperscalers. Do you want to
compete on the model? Good luck beating OpenAI and Mark Zuckerberg. Do you want to compete on apps?
Good luck beating corporate IT and global systems integrators. Oh, wait, that actually sounds pretty doable.
Models are magic, but they're also messy. Mainstream enterprises can't deal with black boxes,
hallucinations, and clumsy workflows. Consumers stare at a blank prompt and don't know what to ask.
These are opportunities in the application layer. Two years ago, many application layer companies
were derided as "just a wrapper on top of GPT-3." Today, those wrappers turn out to be one of the
only sound methods to build enduring value. What began as "wrappers" have evolved into cognitive
architectures. Application layer AI companies are not just UIs on top of a foundation model.
Far from it. They have sophisticated cognitive architectures that typically include multiple
foundation models with some sort of routing mechanism on top, vector and/or graph databases
for RAG, guardrails to ensure compliance, and application logic that mimics the way a human
might think about reasoning through a workflow.
Service-as-a-Software
The cloud transition was
software-as-a-service. Software companies became cloud service providers. This was a $350 billion
opportunity. Thanks to agentic reasoning, the AI transition is service-as-a-software.
Software companies turn labor into software. That means the addressable market is not the
software market, but the services market measured in the trillions of dollars. What does it mean to
sell work? Sierra is a good example. B2C companies put Sierra on their website to talk with
customers. The job to be done is to resolve a customer issue. Sierra gets paid per resolution.
There is no such thing as a seat. You have a job to be done. Sierra does it. They get paid
accordingly. This is the true north for many AI companies. Sierra benefits from having a
graceful failure mode (escalation to a human agent). Not all companies are so lucky. An emerging pattern is to
deploy as a copilot first (human in the loop) and use those reps to earn the opportunity to deploy as
an autopilot (no human in the loop). GitHub Copilot is a good example of this.
A New Cohort of Agentic Applications
With generative AI's budding reasoning capabilities, a new class of agentic
applications is starting to emerge. What shape do these application layer companies take? Interestingly,
these companies look different than their cloud predecessors. Cloud companies targeted the software
profit pool. AI companies target the services profit pool. Cloud companies sold software ($/seat).
AI companies sell work ($/outcome). Cloud companies like to go bottom-up with frictionless
distribution. AI companies are increasingly going top-down with high-touch, high-trust delivery models.
We are seeing a new cohort of these agentic applications emerge across all sectors of the knowledge
economy. Here are some examples: Harvey (AI lawyer), Glean (AI work assistant), Factory (AI software
engineer), Abridge (AI medical scribe), XBOW (AI pentester), and Sierra (AI customer support agent). By bringing the
marginal cost of delivering these services down, in line with the plummeting cost of inference,
these agentic applications are expanding and creating new markets. Take XBOW, for example. XBOW is building an
AI pen tester. A pen test or penetration test is a simulated cyber attack on a computer system that
companies perform in order to evaluate their own security systems. Before generative AI, companies
hired pen testers only in limited circumstances (e.g., when required for compliance),
because human pen testing is expensive: it's a manual task performed by a highly skilled human.
However, XBOW is now demonstrating automated pen tests built on the latest reasoning LLMs
that match the performance of the most highly skilled human pen testers.
This multiplies the pen testing market and opens up the possibility of continuous pen testing
for companies of all shapes and sizes.
What does this mean for the SaaS universe?
Earlier this year, we met with our limited partners.
Their top question was,
will the AI transition destroy your existing cloud companies?
We began with a strong default of no.
The classic battle between startups and incumbents is a horse race between startups building
distribution and incumbents building product.
Can the young companies with cool products get to a bunch of customers before the incumbents who own the customers come up with cool products?
Given that so much of the magic in AI is coming from the foundation models, our default assumption has been no.
The incumbents will do just fine because those foundation models are just as accessible to them as they are to the startup universe,
and they have the pre-existing advantages of data and distribution.
The primary opportunity for startups is not to replace incumbent software companies.
It's to go after automatable pools of work.
That being said, we are no longer so sure.
See above re: cognitive architectures.
There's an enormous amount of engineering required to turn the raw capabilities of a model
into a compelling, reliable end-to-end business solution.
What if we're just dramatically underestimating what it means to be AI-native?
Twenty years ago, the on-prem software companies scoffed at the idea of SaaS.
What's the big deal?
We can run our own servers and deliver this stuff over the internet too.
Sure, conceptually it was simple, but what followed was a wholesale reinvention of the business.
EPD went from waterfalls and PRDs to agile development and A/B testing.
GTM went from top-down enterprise sales and steak dinners to bottoms-up PLG and product analytics.
Business models went from high ASPs and maintenance streams to high NDRs and usage-based pricing.
Very few on-prem companies made the transition.
What if AI is an analogous shift?
Could the opportunity for AI be both selling work and replacing software?
With Day.ai, we have seen a glimpse of the future.
Day is an AI-native CRM.
Systems integrators make billions of dollars
configuring Salesforce to meet your needs.
With nothing but access to your email and calendar
and answers to a one-page questionnaire,
Day automatically generates a CRM
that is perfectly tailored to your business.
It doesn't have all the bells and whistles yet,
but the magic of an auto-generated CRM
that remains fresh with zero human input
is already causing people to switch.
The Investment Universe
Where are we spending our cycles as investors?
Where is funding being deployed?
Here's our quick take.
Infrastructure.
This is the domain of hyperscalers. It's being driven by game-theoretic behavior, not microeconomics.
Terrible place for venture capitalists to be. Models. This is the domain of hyperscalers and financial
investors. Hyperscalers are trading balance sheets for income statements, investing money that's just
going to round-trip back to their cloud businesses in the form of compute revenue. Financial investors
are skewed by the "wowed by science" bias. These models are super cool and these teams are
incredibly impressive. Microeconomics be damned. Developer tools and infrastructure
software. Less interesting for strategics and more interesting for venture capitalists.
Approximately 15 companies with $1B+ of revenue were created at this layer
during the cloud transition, and we suspect the same could be true with AI.
Apps. The most interesting layer for venture capital. Approximately 20 application layer
companies with $1B+ in revenue were created during the cloud transition, another
approximately 20 were created during the mobile transition, and we suspect the same will be
true here.
Closing Thoughts
In generative AI's next act, we expect to see the impact of reasoning R&D ripple into the application layer.
These ripples are fast and deep.
Most of the cognitive architectures to date incorporate clever "unhobbling" techniques.
Now that these capabilities are becoming baked deeper into the models themselves,
we expect that agentic applications will become much more sophisticated and robust,
quickly.
Back in the research lab, reasoning and inference time compute will continue to be a strong theme
for the foreseeable future.
Now that we have a new scaling law, the next race is on.
But for any given domain, it is still hard to
gather real-world data and encode domain and application-specific cognitive architectures.
This is again where last-mile app providers may have the upper hand in solving the diverse
set of problems in the messy real world.
Thinking ahead, multi-agent systems like Factory's Droids may begin to proliferate as
ways of modeling reasoning and social learning processes.
Once we can do work, we can have teams of workers accomplishing so much more.
What we're all eagerly awaiting is generative AI's Move 37, that moment when, like in
AlphaGo's second game against Lee Sedol, a
general AI system surprises us with something superhuman, something that feels like independent thought.
This does not mean that the AI "wakes up" (AlphaGo did not), but that we have simulated processes
of perception, reasoning, and action that the AI can explore in truly novel and useful ways.
This may in fact be AGI, and if so, it will not be a singular occurrence; it will merely
be the next phase of technology.
