The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Agent Era Begins

Starting point is 00:00:00 Today on the AI Daily Brief, a primer on the agentic era of AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends, happy weekend. It being the weekend, of course, that means it is a long reads episode, and today is a really interesting one. The topic of agents is growing ever more important, as more and more agentic software actually starts to come online. Today we are reading a piece from Sequoia called Generative AI's Act o1, The Agentic Reasoning Era Begins. Now, what's great about this piece is that it actually provides a bit of background that'll help you have a technical understanding of how the agent space differs from the model space that we've been operating in for the last couple of years before getting into the specific business implications.

Starting point is 00:00:57 There is a lot of really good thinking in here, but it's a long piece. So for the rest of this episode, I am going to turn it over to AI me, thanks to ElevenLabs, to read Sequoia's essay. Agents are a topic that we are going to continuously come back to. In fact, I'm thinking about at some point doing a full agent week. For now, though, this serves as something of a primer, and I hope that you enjoy it. Generative AI's Act o1. The agentic reasoning era begins. Two years into the generative AI revolution, research is progressing the field from

Starting point is 00:01:28 thinking fast, rapid-fire pre-trained responses, to thinking slow, reasoning at inference time. This evolution is unlocking a new cohort of agentic applications. On the second anniversary of our essay, Generative AI: A Creative New World, the AI ecosystem looks very different, and we have some predictions for what's on the horizon. The foundation layer of the generative AI market is stabilizing in an equilibrium with a key set of scaled players and alliances, including Microsoft/OpenAI, AWS/Anthropic, Meta, and Google/DeepMind. Only scaled players with economic engines and access to vast sums of capital remain in play. While the fight is far from over and keeps escalating in a game-theoretic fashion, the market structure itself is solidifying,

Starting point is 00:02:08 and it's clear that we will have increasingly cheap and plentiful next-token predictions. As the LLM market structure stabilizes, the next frontier is now emerging. The focus is shifting to the development and scaling of the reasoning layer, where System 2 thinking takes precedence. Inspired by models like AlphaGo, this layer aims to endow AI systems with deliberate reasoning, problem-solving, and cognitive operations at inference time that go beyond rapid pattern matching. And new cognitive architectures and user interfaces are shaping how these reasoning capabilities are delivered to and interact with users. What does all of this mean for founders in the AI market?

Starting point is 00:02:41 What does this mean for incumbent software companies? And where do we, as investors, see the most promising layer for returns in the generative AI stack? In our latest essay on the state of the generative AI market, we'll explore how the consolidation of the foundational LLM layer has set the stage for the race to scale these higher-order reasoning and agentic capabilities, and discuss a new generation of killer apps with novel cognitive architectures and user interfaces. Strawberry Fields Forever.

Starting point is 00:03:05 The most important model update of 2024 goes to OpenAI with o1, formerly known as Q* and also known as Strawberry. This is not just a reassertion of OpenAI's rightful place atop the model quality leaderboards, but also a notable improvement on the status quo architecture. More specifically, this is the first example of a model with true general reasoning capabilities, which they've achieved with inference-time compute. What does that mean? Pre-trained models are doing next-token prediction on an enormous amount of data. They rely on training-time compute. An emergent property of scale is basic reasoning, but this reasoning is very limited. What if you

Starting point is 00:03:39 could teach a model to reason more directly? This is essentially what's happening with Strawberry. When we say inference-time compute, what we mean is asking the model to stop and think before giving you a response, which requires more compute at inference time. Hence, inference-time compute. The stop and think part is reasoning. AlphaGo x LLMs. So what is the model doing when it stops and thinks? Let's first take a quick detour to March 2016 in Seoul. One of the most seminal moments in deep learning history took place here.

Starting point is 00:04:09 AlphaGo's match against legendary Go master Lee Sedol. This wasn't just any AI-versus-human match. It was the moment the world saw AI do more than just mimic patterns. It was thinking. What made AlphaGo different from previous gameplay AI systems, like Deep Blue? Like LLMs, AlphaGo was first pre-trained to mimic human experts from a database of roughly 30 million moves from previous games and more from self-play. But rather than provide a knee-jerk response that comes out of the pre-trained model, AlphaGo takes the time to stop and think.

Starting point is 00:04:37 At inference time, the model runs a search or simulation across a wide range of potential future scenarios, scores those scenarios, and then responds with the scenario or answer that has the highest expected value. The more time AlphaGo is given, the better it performs. With zero inference time compute, the model can't beat the best human players. But as the inference time scales, AlphaGo gets better and better until it surpasses the very best humans. Let's bring it back to the LLM world. What's hard about replicating AlphaGo here is constructing the value function, or the function by which the responses are scored. If you're playing Go, it's more straightforward.
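As a rough illustration of the search-and-score loop described above, here is a minimal Python sketch. It is not AlphaGo's actual algorithm, which pairs Monte Carlo tree search with learned policy and value networks. Instead it swaps in flat random rollouts on a toy take-away game, where players remove 1 to 3 stones and whoever takes the last stone wins. The names legal_moves, random_playout, and choose_move are hypothetical, chosen only for this example.

```python
import random


def legal_moves(pile):
    """Moves allowed in the toy game: take 1, 2, or 3 stones."""
    return [m for m in (1, 2, 3) if m <= pile]


def random_playout(pile, my_turn, rng):
    """Play random moves to the end; return True if we take the last stone."""
    while pile > 0:
        pile -= rng.choice(legal_moves(pile))
        if pile == 0:
            return my_turn  # whoever just moved took the last stone
        my_turn = not my_turn
    # The pile was already empty, so the previous mover won.
    return not my_turn


def choose_move(pile, rollouts, rng):
    """Score each candidate move by simulated win rate and pick the best."""
    moves = legal_moves(pile)
    if rollouts == 0:
        # No inference-time compute: fall back to a fixed "instinct" move.
        return moves[0]

    def estimate(move):
        # After our move it is the opponent's turn, hence my_turn=False.
        wins = sum(random_playout(pile - move, False, rng) for _ in range(rollouts))
        return wins / rollouts

    return max(moves, key=estimate)


if __name__ == "__main__":
    rng = random.Random(0)
    for budget in (0, 10, 1000):
        print(budget, choose_move(7, budget, rng))
```

With rollouts set to 0 the sketch simply returns a fixed move, the analogue of answering with no inference-time compute. Raising rollouts makes the win-rate estimates, and so the chosen move, more reliable, at the cost of more simulation.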

Starting point is 00:05:10 You can simulate the game all the way to the end, see who wins, and then calculate an expected value of the next move. If you're coding, it's somewhat straightforward. You can test the code and see if it works. But how do you score the first draft of an essay, or a travel itinerary, or a summary of key terms in a long document? This is what makes reasoning hard with current methods, and it's why Strawberry is comparatively strong on domains proximate to logic, for example, coding, math, the sciences, and not as strong in domains that are more open-ended and unstructured, for example, writing. While the actual implementation of Strawberry is a closely guarded secret, the key ideas involve

Starting point is 00:05:44 reinforcement learning around the chains of thought generated by the model. Auditing the model's chains of thought suggests that something fundamental and exciting is happening that actually resembles how humans think and reason. For example, o1 is showing the ability to backtrack when it gets stuck as an emergent property of scaling inference time. It is also showing the ability to think about problems the way a human would, for example, visualize the points on a sphere to solve a geometry problem, and to think about problems in new ways, for example, solving problems in programming competitions in a way that humans would not. And there is no shortage of new ideas to push forward inference-time compute, for example, new ways of calculating the reward function, new ways

Starting point is 00:06:21 of closing the generator-verifier gap that research teams are working on as they try to improve the model's reasoning capabilities. In other words, deep reinforcement learning is cool again, and it's enabling an entire new reasoning layer. System 1 versus System 2 thinking. This leap from pre-trained instinctual responses, System 1, to deeper, deliberate reasoning, System 2, is the next frontier for AI. It's not enough for models to simply know things. They need to pause, evaluate, and reason through decisions in real time. Think of pre-training as the System 1 layer. Whether a model is pre-trained on millions of moves in Go or petabytes of internet-scale text (LLMs), its job is to mimic patterns, whether that's human gameplay or language. But mimicry, as powerful as it is,

Starting point is 00:07:06 isn't true reasoning. It can't properly think its way through complex novel situations, especially those out of sample. This is where System 2 thinking comes in, and it's the focus of the latest wave of AI research. When a model stops to think, it isn't just generating learned patterns or spitting out predictions based on past data. It's generating a range of possibilities, considering potential outcomes,

Starting point is 00:07:27 and making a decision based on reasoning. For many tasks, System 1 is more than enough. As Noam Brown pointed out on our latest episode of Training Data, thinking for longer about what the capital of Bhutan is doesn't help. You either know it or you don't. Quick, pattern-based recall works perfectly here. But when we look at more complex problems like breakthroughs in mathematics or biology, quick instinctive responses don't cut it.
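One simple way to see why extra thinking helps on some questions and not others is majority voting over several sampled answers. The Python sketch below is only an illustration under loud assumptions: each sample is independent, each is correct with probability p, and the number of samples n is odd so ties cannot occur. It is not a description of how o1 or any production model works, and majority_accuracy is a hypothetical helper name.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent samples is correct.

    Assumes each sample is right with probability p and that n is odd,
    so a tie is impossible. Both are simplifications for illustration.
    """
    if n % 2 == 0:
        raise ValueError("use an odd number of samples to avoid ties")
    return sum(
        comb(n, k) * p**k * (1 - p) ** (n - k)
        for k in range(n // 2 + 1, n + 1)
    )


if __name__ == "__main__":
    # p = 1.0 models pure recall ("you either know it or you don't");
    # p = 0.6 models a harder problem the sampler gets right more often than not.
    for p in (1.0, 0.6):
        print(p, [round(majority_accuracy(p, n), 3) for n in (1, 3, 11)])
```

Under these assumptions, a fact the model already knows stays at the same accuracy no matter how many samples are drawn, while a problem it solves only somewhat more often than not improves as more samples are spent. If p were below one half, the same voting scheme would make things worse, which is one reason scoring answers well matters so much.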

Starting point is 00:07:50 These advances require deep thinking, creative problem solving, and most importantly, time. The same is true for AI. To tackle the most challenging meaningful problems, AI will need to evolve beyond quick in-sample responses and take its time to come up with the kind of thoughtful reasoning that defines human progress. A new scaling law. The inference race is on. The most important insight from the o1 paper is that there's a new

Starting point is 00:08:13 scaling law in town. Pre-training LLMs follows a well-understood scaling law. The more compute and data you spend on pre-training the model, the better it performs. The o1 paper has opened up an entire new plane for scaling compute. The more inference-time or test-time compute you give the model, the better it reasons. What happens when the model can think for hours? Days? Decades? Will we solve the Riemann hypothesis? Will we answer Asimov's Last Question? This shift will move us from a world of massive pre-training clusters toward inference clouds, environments that can scale compute dynamically based on the complexity of the task. One model to rule them all? What happens as OpenAI, Anthropic, Google, and Meta scale their reasoning layers and develop more and more powerful

Starting point is 00:08:57 reasoning machines? Will we have one model to rule them all? One hypothesis at the outset of the generative AI market was that a single model company would become so powerful and all-encompassing that it would subsume all other applications. This prediction has been wrong so far in two ways. First, there is plenty of competition at the model layer with constant leapfrogging for SOTA capabilities. It's possible that someone figures out continuous self-improvement with broad domain self-play and achieves takeoff, but at the moment we have seen no evidence of this. Quite to the contrary, the model layer is a knife fight, with price per token for GPT-4 coming down 98% since the last Dev Day. Second, the models have largely failed to make it into the application layer as breakout

Starting point is 00:09:36 products, with the notable exception of ChatGPT. The real world is messy. Great researchers don't have the desire to understand the nitty-gritty end-to-end workflows of every possible function in every possible vertical. It is both appealing and economically rational for them to stop at the API and let the developer universe worry about the messiness of the real world. Good news for the application layer. Today's episode is brought to you by Venice. Venice is a private, uncensored generative AI app. It accesses open source models to enable text, image, and code generation without the fear of being spied on or having your data exploited. Discuss anything with Venice without concern about it being monitored, sold, or given to advertisers

Starting point is 00:10:16 and governments. Venice is different because your conversations and creations are kept securely within the browser, never stored or accessible by Venice. Unlike other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you. It simply provides direct access to machine intelligence. No topics are off limits, no ideas are taboo. With Venice, you're in control of the AI as you should be. Pro subscriptions are available for $49 a year or $8 per month. Daily Brief listeners receive a 20% discount on Venice Pro. Visit venice.ai/nlwdailybrief, that's NLW Daily Brief, all one word. Today's episode is brought to you by Superintelligent.

Starting point is 00:10:56 Every single business workflow and function is being remade and reimagined with artificial intelligence. There is a huge challenge, however, of going from the potential of AI to actually capturing that value, and that gap is what Superintelligent is dedicated to filling. Superintelligent accelerates AI adoption and engagement to help teams actually use AI to increase productivity and drive business value. An interactive AI use case registry gives your company full visibility into how people are using artificial intelligence right now. Pair that with capabilities-building content in the form of tutorials, learning paths, and a use case library. And Superintelligent

Starting point is 00:11:32 helps people inside your company show how they're getting value out of AI while providing resources for people to put that inspiration into action. The next three teams that sign up with 100 or more seats are going to get free embedded consulting. That's a process by which our Superintelligent team sits with your organization, figures out the specific use cases that matter most to you, and helps actually ensure support for adoption of those use cases to drive real value. Go to besuper.ai to learn more about this AI enablement network, and now back to the show.

Starting point is 00:12:03 The messy real world, custom cognitive architectures. The way you plan and prosecute actions to reach your goals as a scientist is vastly different from how you would work as a software engineer. Moreover, it's even different as a software engineer at different companies. As the research labs further push the boundaries on horizontal general-purpose reasoning, we still need application or domain-specific reasoning to deliver useful AI agents. The messy real world requires significant domain and application-specific reasoning that cannot efficiently be encoded in a general model. Enter cognitive architectures, or how your system thinks: the flow of code and model interactions that takes user input and performs actions or

Starting point is 00:12:40 generates a response. For example, in the case of Factory, each of their Droid products has a custom cognitive architecture that mimics the way that a human thinks to solve a specific task, like reviewing pull requests or writing and executing a migration plan to update a service from one backend to another. The Factory Droid will break down all of the dependencies, propose the relevant code changes, add unit tests, and pull in a human to review. Then, after approval, it will run the changes across all of the files in a dev environment and merge the code if all the tests pass, just like how a human might do it, in a set of discrete tasks rather than one generalized black box answer. What's happening with apps? Imagine you want to start a business in AI. What layer of the stack do you

Starting point is 00:13:21 target? Do you want to compete on infra? Good luck beating Nvidia and the hyperscalers. Do you want to compete on the model? Good luck beating OpenAI and Mark Zuckerberg. Do you want to compete on apps? Good luck beating corporate IT and global systems integrators. Oh, wait, that actually sounds pretty doable. Models are magic, but they're also messy. Mainstream enterprises can't deal with black boxes, hallucinations, and clumsy workflows. Consumers stare at a blank prompt and don't know what to ask. These are opportunities in the application layer. Two years ago, many application layer companies were derided as just a wrapper on top of GPT-3. Today, those wrappers turn out to be one of the only sound methods to build enduring value. What began as wrappers have evolved into cognitive

Starting point is 00:14:04 architectures. Application layer AI companies are not just UIs on top of a foundation model. Far from it. They have sophisticated cognitive architectures that typically include multiple foundation models with some sort of routing mechanism on top, vector and/or graph databases for RAG, guardrails to ensure compliance, and application logic that mimics the way a human might think about reasoning through a workflow. Service as a software. The cloud transition was software as a service. Software companies became cloud service providers. This was a $350 billion opportunity. Thanks to agentic reasoning, the AI transition is service as a software. Software companies turn labor into software. That means the addressable market is not the

Starting point is 00:14:44 software market, but the services market measured in the trillions of dollars. What does it mean to sell work? Sierra is a good example. B2C companies put Sierra on their website to talk with customers. The job to be done is to resolve a customer issue. Sierra gets paid per resolution. There is no such thing as a seat. You have a job to be done. Sierra does it. They get paid accordingly. This is the true north for many AI companies. Sierra benefits from having a graceful failure mode, escalation to a human agent. Not all companies are so lucky. An emerging pattern is to deploy as a copilot first, human in the loop, and use those reps to earn the opportunity to deploy as an autopilot, no human in the loop. GitHub Copilot is a good example of this. A new cohort of

Starting point is 00:15:25 agentic applications. With generative AI's budding reasoning capabilities, a new class of agentic applications is starting to emerge. What shape do these application layer companies take? Interestingly, these companies look different than their cloud predecessors. Cloud companies targeted the software profit pool. AI companies target the services profit pool. Cloud companies sold software, dollar per seat. AI companies sell work, dollar per outcome. Cloud companies like to go bottoms up with frictionless distribution. AI companies are increasingly going top down with high-touch, high-trust delivery models. We are seeing a new cohort of these agentic applications emerge across all sectors of the knowledge economy. Here are some examples. Harvey, AI lawyer. Glean, AI work assistant. Factory, AI software

Starting point is 00:16:11 engineer. Abridge, AI medical scribe. XBOW, AI pentester. Sierra, AI customer support agent. By bringing the marginal cost of delivering these services down, in line with the plummeting cost of inference, these agentic applications are expanding and creating new markets. Take XBOW, for example. XBOW is building an AI pen tester. A pen test, or penetration test, is a simulated cyber attack on a computer system that companies perform in order to evaluate their own security systems. Before generative AI, companies hired pen testers only in limited circumstances, for example, when required for compliance, because human pen testing is expensive: it's a manual task performed by a highly skilled human. However, XBOW is now demonstrating automated pen tests built on the latest reasoning LLMs

Starting point is 00:16:57 that match the performance of the most highly skilled human pen testers. This multiplies the pen testing market and opens up the possibility of continuous pen testing for companies of all shapes and sizes. What does this mean for the SaaS universe? Earlier this year, we met with our limited partners. Their top question was, will the AI transition destroy your existing cloud companies? We began with a strong default of no.

Starting point is 00:17:19 The classic battle between startups and incumbents is a horse race between startups building distribution and incumbents building product. Can the young companies with cool products get to a bunch of customers before the incumbents who own the customers come up with cool products? Given that so much of the magic in AI is coming from the foundation models, our default assumption has been no. The incumbents will do just fine because those foundation models are just as accessible to them as they are to the startup universe, and they have the pre-existing advantages of data and distribution. The primary opportunity for startups is not to replace incumbent software companies. It's to go after automatable pools of work.

Starting point is 00:17:53 That being said, we are no longer so sure. See above re: cognitive architectures. There's an enormous amount of engineering required to turn the raw capabilities of a model into a compelling, reliable end-to-end business solution. What if we're just dramatically underestimating what it means to be AI-native? Twenty years ago, the on-prem software companies scoffed at the idea of SaaS. What's the big deal? We can run our own servers and deliver this stuff over the internet too.

Starting point is 00:18:17 Sure, conceptually it was simple, but what followed was a wholesale reinvention of the business. EPD went from waterfalls and PRDs to agile development and A/B testing. GTM went from top-down enterprise sales and steak dinners to bottoms-up PLG and product analytics. Business models went from high ASPs and maintenance streams to high NDRs and usage-based pricing. Very few on-prem companies made the transition. What if AI is an analogous shift? Could the opportunity for AI be both selling work and replacing software? With Day.ai, we have seen a glimpse of the future.

Starting point is 00:18:49 Day is an AI-native CRM. Systems integrators make billions of dollars configuring Salesforce to meet your needs. With nothing but access to your email and calendar and answers to a one-page questionnaire, Day automatically generates a CRM that is perfectly tailored to your business. It doesn't have all the bells and whistles yet,

Starting point is 00:19:05 but the magic of an auto-generated CRM that remains fresh with zero human input is already causing people to switch. The investment universe. Where are we spending our cycles as investors? Where is funding being deployed? Here's our quick take. Infrastructure.

Starting point is 00:19:20 This is the domain of hyperscalers. It's being driven by game-theoretic behavior, not microeconomics. Terrible place for venture capitalists to be. Models. This is the domain of hyperscalers and financial investors. Hyperscalers are trading balance sheets for income statements, investing money that's just going to round-trip back to their cloud businesses in the form of compute revenue. Financial investors are skewed by the "wowed by science" bias. These models are super cool and these teams are incredibly impressive. Microeconomics be damned. Developer tools and infrastructure software. Less interesting for strategics and more interesting for venture capitalists.

Starting point is 00:19:55 Approximately 15 companies with $1Bn+ of revenue were created at this layer during the cloud transition, and we suspect the same could be true with AI. Apps. The most interesting layer for venture capital. Approximately 20 application layer companies with $1Bn+ in revenue were created during the cloud transition, another approximately 20 were created during the mobile transition, and we suspect the same will be true here. Closing thoughts. In generative AI's next act, we expect to see the impact of reasoning R&D ripple into the application layer. These ripples are fast and deep.

Starting point is 00:20:27 Most of the cognitive architectures to date incorporate clever "unhobbling" techniques. Now that these capabilities are becoming baked deeper into the models themselves, we expect that agentic applications will become much more sophisticated and robust, quickly. Back in the research lab, reasoning and inference-time compute will continue to be a strong theme for the foreseeable future. Now that we have a new scaling law, the next race is on. But for any given domain, it is still hard

Starting point is 00:20:50 to gather real-world data and encode domain and application-specific cognitive architectures. This is again where last-mile app providers may have the upper hand in solving the diverse set of problems in the messy real world. Thinking ahead, multi-agent systems like Factory's Droids may begin to proliferate as ways of modeling reasoning and social learning processes. Once we can do work, we can have teams of workers accomplishing so much more. What we're all eagerly awaiting is generative AI's move 37, that moment when, like in AlphaGo's second game against Lee Sedol, a

Starting point is 00:21:20 general AI system surprises us with something superhuman, something that feels like independent thought. This does not mean that the AI wakes up, AlphaGo did not, but that we have simulated processes of perception, reasoning, and action that the AI can explore in truly novel and useful ways. This may in fact be AGI, and if so, it will not be a singular occurrence, it will merely be the next phase of technology.
