The AI Daily Brief: Artificial Intelligence News and Analysis - Context Engineering: What It Is and Why It Matters

Starting point is 00:00:00 This podcast is supported by Google. Hey everyone, David here, one of the product leads for Google Gemini. If you dream it and describe it, Veo 3 and Gemini can help you bring it to life as a video. Now with incredible sound effects, background noise, and even dialogue. Try it with a Google AI Pro plan or get the highest access with the Ultra Plan. Sign up at Gemini.com to get started and show us what you create. Today on the AI Daily Brief, what is context engineering and why does it matter? Before that in the headlines, a big victory for Anthropic when it comes to fair use.

Starting point is 00:00:39 The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello, friends. Quick announcements today. Thank you first to our sponsors for today's show. That would be Blitzy, Plumb, Vanta, and Google Gemini. And of course, if you are interested in getting an ad-free version of the show, go to patreon.com slash AI Daily Brief. Announcements are the same as they've been for a while. We are deep in fall sponsorship discussions, so if you are interested, hit me at NLW at Breakdown.network. We also will have some more Superintelligent news soon, including some hiring, so keep an ear out for that. But there is a lot to talk about today, including a new term, which you are going to hear a lot more. So with that, let's dive in.

Starting point is 00:01:21 Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with a fairly big victory for Anthropic in their copyright case as a federal judge rules that AI training is fair use. Now, Anthropic is one of the many AI labs that are fighting with authors and publishers over their use of copyrighted works in training data. Each lab is running effectively the same argument that AI training is analogous to reading and is therefore not a breach of copyright under fair use provisions. A federal judge has now accepted that argument, handing Anthropic an early victory in the case. Christina Frohock, a professor of legal writing at the University of Miami School of Law, explained, the court treats the AI as akin to a

Starting point is 00:02:01 human learning from copyrighted material. It's fair use if you and I pick up a book and read it and develop our own thoughts. She said the court came to the same conclusion about AI models. In handing down the ruling, the judge commented that the, quote, authors' complaint is no different than it would be if they complained that training school children to write well would result in an explosion of competing works. He noted that copyright law, quote, seeks to advance original works of authorship, not to protect authors against competition. Now, all that said, this is only a partial victory for Anthropic. The order was only decided on this extremely narrow piece of law, with a further dispute in the case going to a separate trial. That trial will deal with what Anthropic

Starting point is 00:02:40 refers to as their central library, a corpus of all the books in the world, used to create training datasets. As well as scanning in physical books, plaintiffs claim Anthropic pirated seven million digital copies to create this repository. In a prelude of what's to come, the judge wrote, this order doubts that any accused infringer could ever meet its burden of explaining why downloading source copies from pirate sites that could have been purchased or otherwise accessed lawfully was itself reasonably necessary to any subsequent fair use. According to copyright law, willful infringement can carry a maximum penalty of $150,000 per work. If the court rules that Anthropic breached copyright millions of times in pirating books, the fines could easily

Starting point is 00:03:18 bankrupt the startup. The judge noted that the fact that, quote, Anthropic later bought a copy of a book it earlier stole off the internet, will not absolve it of liability for theft, but it may affect the extent of statutory damages. One of the interesting points to note is that the fair use ruling was based on Anthropic's AI outputs being transformative. That is, the AI model wasn't capable of directly reproducing copyrighted works. It was trained to create something new out of its training data. In fact, the judge referred to Claude's outputs as exceedingly transformative, noting, like any reader aspiring to be a writer,

Starting point is 00:03:50 Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them, but to turn a hard corner and create something different. So ultimately, Anthropic is far from off the hook, and the law is far from settled, but it is still a landmark ruling for the AI copyright question. Importantly, this is only a ruling in federal court, so it isn't binding in other cases and could still be appealed. However, it can be used to persuade other judges to follow this interpretation of the law for the time being.

Starting point is 00:04:16 Obviously, this is and remains a contentious area, one which I fully anticipate will need to end up before the Supreme Court before we actually finally know how it will be handled. Next up, staying on the law train for a moment, Sam Altman is fighting the IO lawsuit on X. The legal battle surrounding Jony Ive's AI device startup and the identically named Google spin-off is starting to get a little bit nasty. Lawsuit filings are now circulating, but Sam

Starting point is 00:04:41 Altman decided to take his version of the story direct. Yesterday, he posted, Jason Rugolo had been hoping we would invest in or acquire his company IO, IYO, and was quite persistent in his efforts. We passed and were clear along the way. Now he is suing OpenAI over the name. This is silly, disappointing and wrong. I made a lot of time to talk to Jason on his repeated outreaches because I like helping founders. A few days before the lawsuit, he asked again for us to acquire his company, even after we tried to pass just before. It is cool to try super hard to raise money or get acquired and to do whatever you can to make your company succeed. It is not cool to turn to a lawsuit when you don't get what you want. It sets a terrible precedent for trying to help the ecosystem. All that said, I wish Jason and his

Starting point is 00:05:23 team the best building great products. The world certainly needs more of that and less lawsuits. OpenAI's legal filings, meanwhile, tell basically the same story. That technical staff at IO, this new OpenAI division, met with Rugolo out of a sense of professional courtesy, were unimpressed with a broken demo, and moved on to build something other than what they had seen. In Rugolo's responses on Twitter, he basically said it was about the name. In one post, he wrote, there are 675 other two-letter names that they can choose that aren't ours. And basically, if you want a read on how the community thinks about this, I think that on the one hand, after Sam shared these emails, the OpenAI side of the story that

Starting point is 00:06:00 they just weren't all that impressed, looks pretty resonant or at least true from what they were discussing internally, but at the same time, people also kind of feel like, hey, did you have to choose a name that close? Ultimately, my guess is that it's not worth the trouble, and OpenAI just changes the name, but what do I know? One more interesting thing on OpenAI, the company has quietly designed a productivity suite for ChatGPT, which could put them in direct competition with big backers like Microsoft. The features would allow users to collaborate on documents and communicate with each other, similar to the functionality of Microsoft Office or maybe even more directly Google Workspace. The Information reports that no

Starting point is 00:06:36 decision has been made about launching the feature, but a release could drive a further wedge in OpenAI's relationship with their backer Microsoft. In some ways, though, this is just a natural extension of the canvas feature, which gives users a separate document window inside ChatGPT, making the assistant more useful in work settings. It would also allow OpenAI to compete to be an everything app. Coincidentally, we got news earlier this week that xAI is working on a productivity suite as well. So is this some big competitive change? Or is it just that all of these products are trending in the same direction and are going to have some similar features? I tend to think it's more that than any sort of big new competition between these two frenemies.

Starting point is 00:07:14 Last up, a couple of product updates before we get to the main episode. First, Airtable have made a major move relaunching as an AI-native app. CEO Howie Liu posted, instead of just adding more AI capabilities to our existing platform, we treated this as a refounding moment for the company. We started with a clean slate imagining of the ideal form factor for building apps in the agentic era. The no-code database platform is now a fully functional vibe coding app as well. Users can now use natural language to prompt apps into existence while integrating them into Airtable's production-ready components. Liu gave the examples of creating a VC deal tracker that does automated company research, or a marketing campaign manager that monitors all relevant

Starting point is 00:07:54 competitors. The AI integration also means you can easily run queries across your database. For example, you can get the assistant to crunch thousands of support tickets to find common pain points quickly. The rebuild also adds agentic functionality built in to help you manage large data workflows. Howie wrote, when the cost of making and continually evolving apps drops to zero, everything changes. Companies will build exactly what they need rather than settling for rigid off-the-shelf software. The new default is AI-generated apps plus built-in AI agents working 24-7. What's needed in this new era is a new form factor and paradigm for software, the AI-native app platform. This is the new Airtable. And what's

Starting point is 00:08:32 launching today is just the beginning. We're excited to release a slew of new AI-powered capabilities in the months ahead. Sneak peek, generate any visualization, agents leveraging MCP, agentically sourced datasets, and much, much more. Another company getting all agentic is ElevenLabs, who have launched a new voice AI assistant called 11.ai. The pitch is that this voice assistant has full MCP integration, so it can pull data from services including Perplexity, Slack, Gmail, and Google Calendar. You can even connect your own MCP servers so the assistant can theoretically access anything you want it to.

Starting point is 00:09:02 Functionally, this is pretty similar to the voice assistant that Anthropic announced last month alongside the launch of voice mode. It's designed as a voice interface to access all sorts of AI functionality. The advertising is even similar. Anthropic advertised their product as being able to help power a young professional through their morning, while the ElevenLabs ad followed a similar story, but featuring a young man rolling out of bed with five minutes to spare until a web conference with his boss. The assistant helped him delay the meeting over email, order a greasy breakfast, and remember

Starting point is 00:09:29 what his boss's pet is for small talk. The release came alongside the long-awaited mobile app so you can chat to the ElevenLabs assistant on the go. Now, I'm not sure about the positioning of these assistants. I am a little more skeptical than most of these sort of generalist consumer assistants, but I could be very wrong. But still, once again, if some of the theme of the OpenAI productivity suite is the convergence of all of these platforms into one common set of features, this is yet again another example of that. Anyways, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode.

Starting point is 00:10:01 This episode is brought to you by Blitzy. Now, I talk to a lot of technical and business leaders who are eager to implement cutting-edge AI, but instead of building competitive moats, their best engineers are stuck modernizing ancient codebases or updating frameworks just to keep the lights on. These projects, like migrating Java 17 to Java 21, often mean staffing a team for a year or more. And sure, copilots help, but we all know they hit context limits fast, especially on large legacy systems. Blitzy flips the script. Instead of engineers doing 80% of the work, Blitzy's autonomous platform handles the heavy lifting,

Starting point is 00:10:32 processing millions of lines of code and making 80% of the required changes automatically. One major financial firm used Blitzy to modernize a 20 million line Java code base in just three and a half months, cutting 30,000 engineering hours and accelerating their entire roadmap. Email Jack at blitzy.com with modernize in the subject line for prioritized onboarding. Visit blitzy.com today before your competitors do. Today's episode is brought to you by Plumb. You put in the hours, testing the prompts, refining JSON, and wrangling nodes on the canvas. Now it's time to get paid for it.

Starting point is 00:11:05 Plumb is the only platform designed for technical creators who want to productize their AI workflows. With Plumb, you can build, share and monetize your flows without giving away your prompts or configuration. When you're ready to make improvements, you can push updates to your subscribers with a single click. Launch your first paid workflow at useplumb.com. That's Plumb with a B, and start scaling your impact. Today's episode is brought to you by Vanta. In today's business landscape, businesses can't just claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more, is how businesses can demonstrate strong security practices.

Starting point is 00:11:45 The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easy and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC White Paper found that Vanta customers achieved $535,000 per year in benefits, and the platform pays for itself in just three months. The proof is in the numbers. More than 10,000 global companies trust Vanta.

Starting point is 00:12:15 For a limited time, listeners get $1,000 off at vanta.com slash NLW. That's VANTA.com slash NLW for $1,000 off. Welcome back to the AI Daily Brief. Today we are talking about a term that you might have heard a little bit here and there on this show. Maybe you've seen it start to appear more on X or in articles. We're going to talk about what it means, why it's coming up more and more right now, and why it matters for the industry as a whole. To kick us off, let's turn to a recent tweet from Tobi Lütke, the CEO of Shopify.

Starting point is 00:12:48 Last week, he wrote, I really like the term context engineering over prompt engineering. It describes the core skill better, the art of providing all the context for the task to be plausibly solved by the LLM. Now, a lot of folks jumped into the conversation to agree. McKay Wrigley wrote, totally agree. These days you get way less performance bonus out of dumb tricks like I'll pay you $100 if you get it right, which is how it should be. All of the alpha is in assembling context well to

Starting point is 00:13:15 reduce the fog of war for the model. It's converging to human-ish info needs. Nick Dobos says soon it, context engineering, will include providing the tools, agent environment, and guardrails so the LLM can find the context on its own. So basically what we have here is a different way to think about how to get the most out of LLMs. Since the beginning of ChatGPT, there has been this new field of prompt engineering, which has spawned innumerable courses and online tutorials, and many tricks and tips and quirks of how to ask in the right way to get the things you need out of LLMs. Now, along the way, prompt engineering has become more and more, let's say, diffuse, if not at this stage, less important. And what I mean by that is that the smarter that

Starting point is 00:13:59 models get, the more that tips from six or 12 months ago cease to work. And in many cases, there's also UI-related or interface-related abstraction of prompting, where some amount of prompt engineering is being taken over by the tools themselves. To take one example, when I was designing a cover for a recent episode, my prompt to Ideogram was fun retro-futuristic cover for the quest for the Solopreneur unicorn 1950s mid-century modern. However, what Ideogram turned that into was a retro-futuristic book cover in the style of 1950s mid-century modern design featuring a determined sharply dressed solopreneur riding a majestic glowing white unicorn through a swirling nebula. The solopreneur wears a tailored gray suit, a confident smile, and pilots the unicorn with a futuristic joystick, while the

Starting point is 00:14:41 unicorn's horn emits a beam of light, illuminating the path ahead. The background consists of stylized geometric planets and stars rendered in a vibrant palette of teal, orange, and yellow, with the title, the quest for the solopreneur unicorn boldly displayed in a classic chrome-accented font. So you're seeing this type of thing happen a lot more in different tools where the tools themselves are trying to take the essence of what you were asking for and do a better prompt than you could do. Context is something different, and it refers to another part of the value chain of these LLMs. Context is all of the information you give an LLM that helps it answer the question more correctly. So, for example, if you are using ChatGPT's o3 or o3-pro, which

Starting point is 00:15:21 is particularly optimized to be better at context, when you add a bunch of files to your prompt, that is the context that you're giving it. Context engineering then becomes about, are you giving the LLM the right information that it needs to give you the output that you're looking for? And it turns out this isn't just about which documents to share with it. It's also literally an engineering task around how to carry context across more complex systems. You might remember we recently talked about a post from Cognition, who created Devin, called Don't Build Multi-Agents, and this was all anchored around context and context engineering. Basically, the argument in this piece, for those who don't remember,

Starting point is 00:16:00 was that the multi-agent workflow, where an agent breaks down a task and hands it to multiple different sub-agents with an agent that then combines the results on the other side, is one that is doomed to be fairly brittle because the transmission of context from agent to sub-agent and then sub-agent back to agent can be really difficult. The example he gave was this. Suppose your task is build a Flappy Bird clone.

Starting point is 00:16:23 This gets divided into Subtask 1, build a moving game background with green pipes and hitboxes, and Subtask 2, build a bird that you can move up and down. It turns out Sub-Agent 1 accidentally mistook your sub-task and started building a background that looks like Super Mario Bros. Sub-Agent 2 built you a bird, but it doesn't look like a game asset, and it moves nothing like the one in Flappy Bird. Now the final agent is left with the undesirable task of combining these two miscommunications. Now he goes through some potential solves, but still finds them unreliable, and ultimately

Starting point is 00:16:53 comes to the idea of instead building a single-threaded linear agent. In the Cognition model, the agent breaks the task into subtasks rather than handing it to sub-agents. So the same agent does the breaking down of the task, then the doing of Sub-Task 1 and Sub-Task 2, and then combines the results, with the idea being largely that this carries context between the different tasks better than the multi-agent system does.
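
For readers following along in the transcript, the single-threaded idea described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not Cognition's implementation: `stub_model` and `run_linear_agent` are hypothetical names, and the stub stands in for a real LLM call so the example runs on its own.

```python
from typing import Callable, List

# Hypothetical stand-in for a model call. A real agent would send the
# context and instruction to an LLM; this stub only reports how much
# history it was shown, which is enough to illustrate the point.
def stub_model(context: List[str], instruction: str) -> str:
    return f"[{instruction} | saw {len(context)} prior entries]"

def run_linear_agent(
    task: str,
    subtasks: List[str],
    model: Callable[[List[str], str], str] = stub_model,
) -> List[str]:
    """Run every subtask in one thread, sharing a single growing context."""
    context = [f"TASK: {task}"]
    for sub in subtasks:
        result = model(context, sub)  # the model sees everything so far
        context.append(f"{sub} -> {result}")
    # The same agent combines the results, still with the full history.
    context.append(model(context, "combine results"))
    return context

history = run_linear_agent(
    "Build a Flappy Bird clone",
    [
        "build a moving background with green pipes",
        "build a bird that moves up and down",
    ],
)
for entry in history:
    print(entry)
```

The design point is that the second subtask is handed the first subtask's output, and the combining step is handed both, so nothing has to be re-explained across an agent boundary.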

Starting point is 00:17:19 Here the context is continuous. At the same time, they recognize that as very large tasks start to have so many subparts that context windows start to overflow, there may be a need for a new approach. One architecture that they share is the idea of a side-long context compression LLM, which basically across each stage compresses the conversation and actions so far, i.e. the context, into a set of key moments and decisions, with that compressed context being what informs the next subtask's work.
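
A rough sketch of that compression step, again under loud assumptions: in the architecture described, a second LLM would write the summary, whereas the hypothetical `compress` function below just keeps entries tagged as decisions. The `MAX_ENTRIES` budget and the `DECISION:` / `ACTION:` / `SUMMARY:` tags are invented for illustration, and the budget counts entries rather than tokens to keep things simple.

```python
from typing import List

MAX_ENTRIES = 4  # assumed budget, in entries rather than tokens

def compress(context: List[str]) -> List[str]:
    """Hypothetical compressor: keep the task line plus key decisions.

    A real system would ask a separate LLM to summarize the history;
    here we simply keep entries tagged DECISION or SUMMARY.
    """
    task, rest = context[0], context[1:]
    key = [e for e in rest if e.startswith(("DECISION:", "SUMMARY:"))]
    summary = "SUMMARY: " + " | ".join(e.split(": ", 1)[1] for e in key)
    return [task, summary]

def add_entry(context: List[str], entry: str) -> List[str]:
    """Append an entry, compressing whenever the budget is exceeded."""
    context = context + [entry]
    if len(context) > MAX_ENTRIES:
        context = compress(context)
    return context

ctx = ["TASK: Build a Flappy Bird clone"]
for step in [
    "DECISION: use the original Flappy Bird art style",
    "ACTION: drew pipe sprites",
    "ACTION: added hitboxes",
    "DECISION: bird moves only up and down",
    "ACTION: wired up input",
]:
    ctx = add_entry(ctx, step)

for entry in ctx:
    print(entry)
```

After the fourth step the history is squeezed down to the task plus a one-line summary of the two decisions, and the fifth step is appended on top of that shorter context.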

Starting point is 00:17:49 Now, whether you agree with this strategy or not is not the point of this piece. It's to show how context engineering is starting to become a part of some of the most important questions in AI, which have to do with how to build agents that are actually highly functional. And if you look around for about five minutes, we are seeing a ton of discussion of context engineering pop up. Just a couple of days ago, Lance Martin wrote on his blog a post called Context Engineering for Agents. Lance writes, as Andrej Karpathy puts it, LLMs are a kind of new operating system. The LLM is like the CPU, and its context window is like RAM, representing a working memory for the model. Context enters an LLM in several ways,

Starting point is 00:18:28 including prompts, e.g. user instructions, retrieval, e.g. documents, and tool calls, e.g. APIs. Just like RAM, the LLM context window has limited communication bandwidth to handle these various sources of context, and just like an operating system curates what fits into a CPU's RAM, we can think about context engineering as packaging and managing the context needed for an LLM to perform a task. So once again, this is coming at that same issue that we saw in the Cognition blog of having to engineer systems that get the right context, but don't just dump everything in willy-nilly. What Lance points out is the growing importance of this domain. He points to a quote from Cognition again, who write, context engineering is effectively

Starting point is 00:19:09 the number one job of engineers building AI agents, and another quote from Anthropic that read, agents often engage in conversations spanning hundreds of turns, requiring careful context management strategies. Now, the second part of this blog is all about the ways that we can manage that context and new strategies for that sort of context management, which is a little bit more technical and out of scope for this particular show, but I will include this in the show notes so you can go check it out for yourself. Lance talks about curating context, i.e. managing the tokens that an agent sees at each turn, persisting context, involving systems to store, save, and retrieve context over time, and isolating context involving approaches to partition context across agents

Starting point is 00:19:49 or environments. Lance points out that we are still at the very beginning early baby steps for forming general principles for building agents, and that's why there's such an explosion in this discussion. Another post that was published on the same day comes from the LangChain blog, and is called the rise of context engineering. The piece reads, Context Engineering is building dynamic systems to provide the right information and tools in the right format, such that the LLM can plausibly accomplish the task. Most of the time when an agent is not performing reliably, the underlying cause is that the appropriate context, instructions, and tools have not been communicated to the model.
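
To make the idea of curating context a little more concrete, here is a minimal sketch of packing sources into a fixed budget, in the spirit of the RAM analogy and the "right information in the right format" framing above. Everything in it is an assumption made for illustration: the `Source` type, the `priority` convention, and the word-count token estimate are hypothetical and not taken from either blog post or from any library.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Source:
    name: str
    text: str
    priority: int  # lower number = more important (an assumed convention)

def rough_tokens(text: str) -> int:
    # Crude assumption: one token per whitespace-separated word.
    return len(text.split())

def pack_context(sources: List[Source], budget: int) -> List[str]:
    """Greedily admit sources by priority until the budget is used up."""
    chosen: List[str] = []
    used = 0
    for src in sorted(sources, key=lambda s: s.priority):
        cost = rough_tokens(src.text)
        if used + cost <= budget:
            chosen.append(src.name)
            used += cost
    return chosen

sources = [
    Source("instructions", "answer as a support agent", 0),
    Source("retrieved_doc", "refund policy allows returns within thirty days", 1),
    Source("old_chat_log", "word " * 50, 2),
    Source("tool_result", "order 1234 shipped on Monday", 1),
]
packed = pack_context(sources, budget=20)
print(packed)
```

With a budget of 20, the instructions, the retrieved document, and the tool result fit, while the long, low-priority chat log is left out. A production system would use a real tokenizer and a smarter selection rule, but the shape of the decision is the same.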

Starting point is 00:20:25 LLM applications are evolving from single prompts to more complex dynamic agentic systems. As such, context engineering is becoming the most important skill an AI engineer can develop. And again, this piece really reiterates that when agentic systems mess up, it's usually because the LLM messed up, and LLMs tend to mess up either because the model is just not good enough or because it didn't have the appropriate context. What's more, the author argues that as models get better, it tends to be more that second reason. The author concludes, context engineering isn't a new idea. Agent builders have been doing it for the past year or two. It's a new term that aptly describes an increasingly important skill. So I think that there are actually two different domains of context engineering that are

Starting point is 00:21:07 worth us keeping in mind and that are worth you and I exploring. The first is context engineering in the context of AI engineers and actual agent building. In other words, for people who are building agentic systems, software engineers that are thinking about how to make agents more performant and work on higher complexity and higher order tasks, these questions of context engineering are about system design. They're about things like the context compression LLM that sits alongside a single agent system and makes it work better. There is a whole entire important discourse happening in that domain that will influence the shape of the agents that even non-coders and non-technical people ultimately interact with. However, my strong guess is that we're likely to start seeing context

Starting point is 00:21:54 engineering also become a term for consumers and just regular LLM users. In the same way that we have increasingly taught ourselves or tried to teach ourselves how to prompt LLMs to get the most out of them, my guess is that context engineering in a user environment is going to become a more important field and discipline as well. What's the right amount of information to give any given model? Which models are better at different types of information? Indeed, one area where we have started to see this is in the release of o3-pro. You'll remember that the piece from Latent Space that I thought was the best summary of o3-pro was called God is Hungry for Context, and it basically argued that the big difference between o3 and o3-pro was that o3-pro was better at handling

Starting point is 00:22:39 lots and lots of context. When the authors of this piece gave it a huge volume of information about their company, including past meeting notes and recorded audio, it came back with a much better strategy for them than o3 did alone. And so in that, we have context engineering from a user standpoint, first in terms of model selection and which model is going to be better at context, and second, in terms of what type of context to give it. I think it's an extremely dynamic field. I think it's likely going to be every bit as important as, if not more important than, prompt engineering in how we use these tools.

Starting point is 00:23:12 And I'm excited to share more about this as it becomes a bigger part of conversation. For now, though, that is going to do it for today's little baby primer on context engineering. I hope this was useful. Appreciate you guys listening and watching as always. And until next time, peace.
