The AI Daily Brief: Artificial Intelligence News and Analysis - Context Engineering: What It Is and Why It Matters
Episode Date: June 26, 2025
Context engineering quickly becomes a core skill for anyone working with large language models (LLMs) and AI agents. Unlike prompt engineering, which is about crafting single questions or requests, context engineering focuses on providing the right background, files, and environment so LLMs can solve your task.
Get Ad Free AI Daily Brief: https://patreon.com/AIDailyBrief
Brought to you by:
Gemini - Supercharge your creativity and productivity - http://gemini.google/
KPMG – Go to https://kpmg.com/ai to learn more about how KPMG can help you drive value with our AI solutions.
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
AGNTCY - The AGNTCY is an open-source collective dedicated to building the Internet of Agents, enabling AI agents to communicate and collaborate seamlessly across frameworks. Join a community of engineers focused on high-quality multi-agent software and support the initiative at agntcy.org
Vanta - Simplify compliance - https://vanta.com/nlw
Plumb - The automation platform for AI experts and consultants https://useplumb.com/
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Interested in sponsoring the show? nlw@breakdown.network
Transcript
This podcast is supported by Google.
Hey everyone, David here, one of the product leads for Google Gemini.
If you dream it and describe it, Veo 3 and Gemini can help you bring it to life as a video.
Now with incredible sound effects, background noise, and even dialogue.
Try it with a Google AI Pro plan or get the highest access with the Ultra Plan.
Sign up at Gemini.com to get started and show us what you create.
Today on the AI Daily Brief, what is context engineering and why does it matter?
Before that in the headlines, a big victory for Anthropic when it comes to fair use.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
Hello, friends. Quick announcements today. Thank you first to our sponsors for today's show.
That would be Blitzy, Plumb, Vanta, and Google Gemini. And of course, if you are interested in getting
an ad-free version of the show, go to patreon.com slash AI Daily Brief. Announcements are the same
as they've been for a while. We are deep in fall sponsorship discussions, so if you are
interested, hit me at NLW at Breakdown.network. We also will have some more Superintelligent news
soon, including some hiring, so keep an ear out for that. But there is a lot to talk about today,
including a new term, which you are going to hear a lot more. So with that, let's dive in.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around
five minutes. We kick off today with a fairly big victory for Anthropic in their copyright case
as a federal judge rules that AI training is fair use. Now, Anthropic is one of the many AI
labs that are fighting with authors and publishers over their use of copyrighted works and training
data. Each lab is running effectively the same argument that AI training is analogous to reading
and is therefore not a breach of copyright under fair use provisions. A federal judge has now accepted
that argument, handing Anthropic an early victory in the case. Christina Frohock, a professor of legal
writing at the University of Miami School of Law, explained, the court treats the AI as akin to a
human learning from copyrighted material. It's fair use if you and I pick up a book and read it and develop
our own thoughts. She said the court came to the same conclusion about AI models. In handing down
the ruling, the judge commented that the, quote, authors' complaint is no different than it would be
if they complained that training school children to write well would result in an explosion of competing
works. He noted that copyright law, quote, seeks to advance original works of authorship, not to
protect authors against competition. Now, all that said, this is only a partial victory for Anthropic.
The order was only decided on this extremely narrow piece of law, with a further
dispute in the case going to a separate trial. That trial will deal with what Anthropic
refers to as their central library, a corpus of all the books in the world, used to create
training datasets. As well as scanning in physical books, plaintiffs claim Anthropic pirated
seven million digital copies to create this repository. In a prelude of what's to come, the judge
wrote, this order doubts that any accused infringer could ever meet its burden of explaining why
downloading source copies from pirate sites that could have been purchased or otherwise
accessed lawfully, was itself reasonably necessary to any subsequent fair use. According to the
copyright law, willful infringement can carry a maximum penalty of $150,000 per work. If the court rules
that Anthropic breached copyright millions of times in pirating books, the fines could easily
bankrupt the startup. The judge noted that the fact that, quote, Anthropic later bought a copy of a book
it earlier stole off the internet, will not absolve it of liability for theft, but it may affect
the extent of statutory damages. One of the interesting points
to note is that the fair use ruling was based on Anthropic's AI outputs being transformative.
That is, the AI model wasn't capable of directly reproducing copyrighted works.
It was trained to create something new out of its training data.
In fact, the judge referred to Claude's outputs as exceedingly transformative,
noting, like any reader aspiring to be a writer,
Anthropic's LLMs trained upon works not to race ahead and replicate or supplant them,
but to turn a hard corner and create something different.
So ultimately, Anthropic is far from off the hook, and the law is far from settled,
but it is still a landmark ruling for the AI copyright question.
Importantly, this is only a ruling in federal court,
so it isn't binding in other cases and could still be appealed.
However, it can be used to persuade other judges
to follow this interpretation of the law for the time being.
Obviously, this is and remains a contentious area,
one in which I fully anticipate we will end up
before the Supreme Court
before we actually finally know how it will be handled.
Next up, staying on the law train for a moment,
Sam Altman is fighting the IO lawsuit on
X. The legal battle surrounding Jony Ive's AI device startup and the identically named Google
spin-off is starting to get a little bit nasty. Lawsuit filings are now circulating, but Sam
Altman decided to take his version of the story direct. Yesterday, he posted, Jason Rugolo had been
hoping we would invest in or acquire his company IYO, and was quite persistent in his efforts.
We passed and were clear along the way. Now he is suing OpenAI over the name. This is silly,
disappointing and wrong. I made a lot of time to talk to Jason on his repeated outreaches because I like
helping founders. A few days before the lawsuit, he asked again for us to acquire his company, even after
we tried to pass just before. It is cool to try super hard to raise money or get acquired and to do
whatever you can to make your company succeed. It is not cool to turn to a lawsuit when you don't get
what you want. It sets a terrible precedent for trying to help the ecosystem. All that said, I wish Jason and his
team the best building great products. The world certainly needs more of that and less lawsuits.
OpenAI's legal filings, meanwhile, tell basically the same story.
That technical staff at IO, this new OpenAI division, met with Rugolo out of a sense of professional courtesy,
were unimpressed with a broken demo, and moved on to build something other than what they had seen.
In Rugolo's responses on Twitter, he basically said it was about the name.
In one post, he wrote, there are 675 other two-letter names that they can choose that aren't ours.
And basically, if you want to read on how the community thinks about this,
I think that on the one hand, after Sam shared these emails, the OpenAI side of the story that
they just weren't all that impressed, looks pretty resonant or at least true from what they
were discussing internally, but at the same time, people also kind of feel like, hey, did you
have to choose a name that close? Ultimately, my guess is that it's not worth the trouble,
and OpenAI just changes the name, but what do I know?
One more interesting thing on OpenAI, the company has quietly designed a productivity suite for
ChatGPT, which could put them in direct competition with big backers like Microsoft. The features would
allow users to collaborate on documents and communicate with each other, similar to the functionality
of Microsoft Office or maybe even more directly Google Workspace. The Information reports that no
decision has been made about launching the feature, but a release could drive a further wedge in OpenAI's
relationship with their backer Microsoft. In some ways, though, this is just a natural extension of the
canvas feature, which gives users a separate document window inside ChatGPT, making the
assistant more useful in work settings. It would also allow OpenAI to compete to be an everything app.
Coincidentally, we got news earlier this week that xAI is working on a productivity suite as well.
So is this some big competitive change? Or is it just that all of these products are trending
towards the same direction and are going to have some similar features? I tend to think it's more
that than any sort of big new competition between these two frenemies.
Last up, a couple of product updates before we get to the main episode. First, Airtable have made a major
move relaunching as an AI-native app. CEO Howie Liu posted, instead of just adding more AI
capabilities to our existing platform, we treated this as a refounding moment for the company.
We started with a clean slate imagining of the ideal form factor for building apps in the
agentic era. The no-code database platform is now a fully functional vibe coding app as well.
Users can now use natural language to prompt apps into existence while integrating them
into Airtable's production-ready components. Liu gave the examples of creating a VC deal tracker that
does automated company research, or a marketing campaign manager that monitors all relevant
competitors. The AI integration also means you can easily run queries across your database.
For example, you can get the assistant to crunch thousands of support tickets to find
common pain points quickly. The rebuild also adds agentic functionality built in to help you
manage large data workflows. Howie wrote, when the cost of making and continually evolving apps
drops to zero, everything changes. Companies will build exactly what they need rather than settling
for rigid off-the-shelf software. The new default is AI-generated
apps plus built-in AI agents working 24-7. What's needed in this new era is a new form
factor and paradigm for software, the AI-native app platform. This is the new Airtable. And what's
launching today is just the beginning. We're excited to release a slew of new AI-powered capabilities
in the months ahead. Sneak peek, generate any visualization, agents leveraging MCP, agentically sourced
datasets, and much, much more. Another company getting all agentic is ElevenLabs, who have launched
a new voice AI assistant called 11ai.
The pitch is that this voice assistant has full MCP integration, so it can pull data from
services including Perplexity, Slack, Gmail, and Google Calendar.
You can even connect your own MCP servers so the assistant can theoretically access anything
you want it to.
Functionally, this is pretty similar to the voice assistant that Anthropic announced
last month alongside the launch of voice mode.
It's designed as a voice interface to access all sorts of AI functionality.
The advertising is even similar.
Anthropic advertised their product as being able to help power a young professional
through their morning, while the ElevenLabs ad followed a similar story, but featuring a young man
rolling out of bed with five minutes to spare until a web conference with his boss.
The assistant helped him delay the meeting over email, order a greasy breakfast, and remember
what his boss's pet is for small talk. The release came alongside the long-awaited mobile app so you can chat
to the ElevenLabs assistant on the go. Now, I'm not sure about the positioning of these assistants.
I am a little more skeptical than most of these sort of generalist consumer assistants, but I
could be very wrong. But still, once again, if some of the theme of the OpenAI
Productivity Suite is the convergence of all of these platforms into one common set of features.
This is yet again another example of that.
Anyways, that is going to do it for today's AI Daily Brief Headlines edition.
Next up, the main episode.
This episode is brought to you by Blitzy.
Now, I talk to a lot of technical and business leaders who are eager to implement cutting-edge AI,
but instead of building competitive moats, their best engineers are stuck modernizing ancient
codebases or updating frameworks just to keep the lights on.
These projects, like migrating Java 17 to Java 21, often mean staffing a team for a year or more.
And sure, copilots help, but we all know they hit context limits fast, especially on large legacy systems.
Blitzy flips the script.
Instead of engineers doing 80% of the work, Blitzy's autonomous platform handles the heavy lifting,
processing millions of lines of code and making 80% of the required changes automatically.
One major financial firm used Blitzy to modernize a 20 million line Java code base in just three and a half months,
cutting 30,000 engineering hours and accelerating their entire roadmap.
Email Jack at blitzy.com with modernize in the subject line for prioritized onboarding.
Visit blitzy.com today before your competitors do.
Today's episode is brought to you by Plumb.
You put in the hours, testing the prompts, refining JSON, and wrangling nodes on the canvas.
Now it's time to get paid for it.
Plumb is the only platform designed for technical creators who want to productize their AI workflows.
With Plumb, you can build,
share and monetize your flows without giving away your prompts or configuration.
When you're ready to make improvements, you can push updates to your subscribers with a single click.
Launch your first paid workflow at useplumb.com. That's Plumb with a B, and start scaling your impact.
Today's episode is brought to you by Vanta. In today's business landscape, businesses can't just claim
security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more
is how businesses can demonstrate strong security practices.
The problem is that navigating security and compliance is time-consuming and complicated.
It can take months of work and use up valuable time and resources.
Vanta makes it easier and faster by automating compliance across 35-plus frameworks.
It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs.
In fact, a recent IDC White Paper found that Vanta customers achieved $535,000 per year in benefits,
and the platform pays for itself in just three months.
The proof is in the numbers.
More than 10,000 global companies trust Vanta.
For a limited time, listeners get $1,000 off at vanta.com slash NLW.
That's VANTA.com slash NLW for $1,000 off.
Welcome back to the AI Daily Brief.
Today we are talking about a term that you might have heard a little bit here and there on this show.
Maybe you've seen it start to appear more on X or in articles.
We're going to talk about what it means, why it's coming up more and more right now,
and why it matters for the industry as a whole.
To kick us off, let's turn to a recent tweet from Tobi Lütke, the CEO of Shopify.
Last week, he wrote,
I really like the term context engineering over prompt engineering.
It describes the core skill better,
the art of providing all the context for the task to be plausibly solved by the LLM.
Now, a lot of folks jumped into the conversation to agree.
McKay Wrigley wrote, totally agree.
These days you get way less performance bonus out of dumb tricks like I'll pay you $100
if you get it right, which is how it should be. All of the alpha is in assembling context well to
reduce the fog of war for the model. It's converging to human-ish info needs. Nick Dobos says soon
it, context engineering, will include providing the tools, agent environment, and guardrails so the
LLM can find the context on its own. So basically what we have here is a different way to think
about how to get the most out of LLMs. Since the beginning of ChatGPT, there has been this new
field of prompt engineering, which has spawned innumerable courses and online tutorials,
and many tricks and tips and quirks of how to ask in the right way to get the things you need
out of LLMs. Now, along the way, prompt engineering has become more and more, let's say,
diffuse, if not at this stage, less important. And what I mean by that is that the smarter that
models get, the more that tips from six or 12 months ago cease to work. And in many cases,
there's also UI-related or interface-related abstraction of prompting, where some amount of prompt
engineering is being taken over by the tools themselves. To take one example, when I was designing a
cover for a recent episode, my prompt to Ideogram was fun retro-futuristic cover for the quest for
the solopreneur unicorn 1950s mid-century modern. However, what Ideogram turned that into was a retro-futuristic
book cover in the style of 1950s mid-century modern design featuring a determined sharply dressed
solopreneur riding a majestic glowing white unicorn through a swirling nebula. The solopreneur wears a
tailored gray suit, a confident smile, and pilots the unicorn with a futuristic joystick, while the
unicorn's horn emits a beam of light, illuminating the path ahead. The background consists of
stylized geometric planets and stars rendered in a vibrant palette of teal, orange, and yellow,
with the title, the quest for the solopreneur unicorn boldly displayed in a classic chrome-accented
font. So you're seeing this type of thing happen a lot more in different tools where the tools
themselves are trying to take the essence of what you were asking for and do a better
prompt than you could do. Context is something different, and it refers to another part of the
value chain of these LLMs. Context is all of the information you give an LLM that helps it answer
the question more correctly. So, for example, if you are using ChatGPT's O3 or O3 Pro, which
is particularly optimized to be better at context, when you add a bunch of files to your prompt,
that is the context that you're giving it. Context engineering then becomes about, are you giving
the LLM the right information that it needs to give you the output that you're looking for?
And it turns out this isn't just about which documents to share with it. It's also literally
an engineering task around how to carry context across more complex systems. You might remember we
recently talked about a post from Cognition, who create Devin, called Don't Build Multi-Agents,
and this was all anchored around context and context engineering.
Basically, the argument in this piece, for those who don't remember,
was that the multi-agent workflow,
where an agent breaks down a task and hands it to multiple different sub-agents
with an agent that then combines the results on the other side,
is one that is doomed to be fairly brittle
because the transmission of context from agent to sub-agent
and then sub-agent back to agent can be really difficult.
The example he gave was this.
Suppose your task is to build a Flappy Bird clone.
This gets divided into Subtask 1, build a moving game background with green pipes and hitboxes,
and Subtask 2, build a bird that you can move up and down.
It turns out Sub-Agent 1 accidentally mistook your sub-task and started building a background that
looks like Super Mario Bros.
Sub-Agent 2 built you a bird, but it doesn't look like a game asset, and it moves nothing
like the one in Flappy Bird.
Now the final agent is left with the undesirable task of combining these two miscommunications.
Now he goes through some potential solves, but still finds them unreliable, and ultimately
comes to the idea of instead building a single-threaded linear agent.
In the Cognition model, the agent breaks the task down
into subtasks rather than handing it to sub-agents.
So the same agent does the breaking down of the task,
then the doing of Sub-Task 1 and Sub-Task 2,
and then combines the results,
with the idea being largely that this carries context
between the different tasks better than the other multi-agent system.
Here the context is continuous.
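To make the contrast concrete, here is a minimal Python sketch of the two handoff styles. It is an illustration under stated assumptions, not Cognition's implementation: `LinearAgent`, `run_subtask`, and `isolated_subagent_prompt` are hypothetical names, and the model call is replaced by a placeholder string.

```python
from dataclasses import dataclass, field


@dataclass
class LinearAgent:
    """Hypothetical single-threaded agent: every subtask shares one context."""
    task: str
    context: list[str] = field(default_factory=list)

    def run_subtask(self, subtask: str) -> str:
        # What the model would see: the overall task, everything done so far,
        # and the new subtask.
        prompt = "\n".join([f"TASK: {self.task}", *self.context, f"SUBTASK: {subtask}"])
        result = f"finished {subtask}"  # placeholder for a real model call
        # Record the subtask and its outcome so later subtasks can see them.
        self.context += [f"SUBTASK: {subtask}", f"RESULT: {result}"]
        return prompt


def isolated_subagent_prompt(subtask: str) -> str:
    # Multi-agent handoff: the sub-agent only receives its own subtask text.
    return f"SUBTASK: {subtask}"


agent = LinearAgent("build a Flappy Bird clone")
first = agent.run_subtask("build a moving game background with green pipes and hitboxes")
second = agent.run_subtask("build a bird that you can move up and down")
handoff = isolated_subagent_prompt("build a bird that you can move up and down")
```

In this sketch, `second` still mentions Flappy Bird and the result of the background subtask, while `handoff` carries neither, which is the kind of gap that lets a sub-agent drift toward a Super Mario background.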
At the same time, they recognize
that as very large tasks start to have so many subparts that context windows start to overflow,
that there may be a need for a new approach.
One architecture that they share is the idea of a side-long context compression LLM,
which basically across each stage compresses the conversation and action so far, i.e. the context,
into a set of key moments and decisions, with that compressed context being what informs the next
subtask's work.
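A rough Python sketch of that compression step might look like the following. Everything here is an assumption for illustration: the word-count token estimate is crude, and `summarize` stands in for the compression LLM by simply keeping entries tagged as decisions.

```python
def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per whitespace-separated word.
    return len(text.split())


def summarize(entries: list[str]) -> str:
    # Placeholder for the compression LLM: keep only entries marked as decisions.
    decisions = [e for e in entries if e.startswith("DECISION:")]
    return "SUMMARY: " + " | ".join(decisions)


def compress_context(history: list[str], budget: int, keep_recent: int = 2) -> list[str]:
    """Return history unchanged if it fits the budget, else a summary plus recent entries."""
    total = sum(estimate_tokens(e) for e in history)
    if total <= budget or len(history) <= keep_recent:
        return history
    cut = len(history) - keep_recent
    return [summarize(history[:cut]), *history[cut:]]


history = [
    "DECISION: clone Flappy Bird visuals, not Super Mario",
    "ACTION: drew green pipes with hitboxes",
    "DECISION: bird uses tap-to-flap physics",
    "ACTION: wrote bird sprite loader",
    "ACTION: wired score counter",
]
compact = compress_context(history, budget=10)
```

Here `compact` holds one summary line with the two earlier decisions followed by the two most recent actions. A real system would tune what counts as a key moment, which is exactly the hard part the post points to.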
Now, whether you agree with this strategy or not is not the point of this piece.
The point is to show how context engineering is starting to become a part of some of the most important
questions in AI, which have to do with how to build agents that are actually highly functional.
And if you look around for about five minutes, we are seeing a ton of discussion of context
engineering pop up. Just a couple of days ago, Lance Martin published a post on his blog called
Context Engineering for Agents. Lance writes, as Andrej Karpathy puts it, LLMs are a kind of new
operating system. The LLM is like the CPU, and its context window is like RAM,
representing a working memory for the model. Context enters an LLM in several ways,
including prompts, user instructions, retrieval, e.g. documents, and tool calls, e.g. APIs.
Just like RAM, the LLM context window has limited communication bandwidth to handle these various
sources of context, and just like an operating system curates what fits into a CPU's RAM,
we can think about context engineering as packaging and managing the context needed for an LLM
to perform a task. So once again, this is coming at that
same issue that we saw in the Cognition blog of having to engineer systems that get the right
context, but don't just dump everything in willy-nilly. What Lance points out is the growing importance
of this domain. He points to a quote from Cognition again, who write, context engineering is effectively
the number one job of engineers building AI agents, and another quote from Anthropic that read,
agents often engage in conversations spanning hundreds of turns, requiring careful context management
strategies. Now, the second part of this blog is all about the ways that we can manage that
context and new strategies for that sort of context management, which is a little bit more technical
and out of scope for this particular show, but I will include this in the show notes so you can
go check it out for yourself. Lance talks about curating context, i.e. managing the tokens that an
agent sees at each turn, persisting context, involving systems to store, save, and retrieve
context over time, and isolating context involving approaches to partition context across agents
or environments. Lance points out that we are still at the very beginning, taking baby steps toward
forming general principles for building agents, and that's why there's such an explosion in this
discussion. Another post that was published on the same day comes from the LangChain blog,
and is called The Rise of Context Engineering. The piece reads,
Context Engineering is building dynamic systems to provide the right information and tools in the
right format, such that the LLM can plausibly accomplish the task.
Most of the time when an agent is not performing reliably, the underlying cause is that
the appropriate context, instructions, and tools have not been communicated to the model.
LLM applications are evolving from single prompts to more complex dynamic agentic systems.
As such, context engineering is becoming the most important skill an AI engineer can develop.
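As a loose illustration of what "the right information in the right format" can mean when the context window is limited like RAM, here is a small Python sketch that packs context sources into a fixed budget. The `ContextItem` type, the priority convention, and the word-count token estimate are all assumptions made for this example, not anything taken from the LangChain post.

```python
from dataclasses import dataclass


@dataclass
class ContextItem:
    source: str    # e.g. "instructions", "retrieval", "chat_history"
    text: str
    priority: int  # assumed convention: lower number means more important


def estimate_tokens(text: str) -> int:
    # Crude assumption: roughly one token per whitespace-separated word.
    return len(text.split())


def build_context(items: list[ContextItem], budget: int) -> tuple[str, list[str]]:
    """Greedily pack the most important items that fit and report what was dropped."""
    included, dropped, used = [], [], 0
    for item in sorted(items, key=lambda i: i.priority):
        cost = estimate_tokens(item.text)
        if used + cost <= budget:
            included.append(item)
            used += cost
        else:
            dropped.append(item.source)
    # Label each section so the model can tell the sources apart.
    prompt = "\n\n".join(f"[{i.source}]\n{i.text}" for i in included)
    return prompt, dropped


items = [
    ContextItem("retrieval", "Refund policy: purchases can be refunded within 30 days.", 2),
    ContextItem("instructions", "Answer the customer's refund question politely.", 1),
    ContextItem("chat_history", "word " * 50, 3),
]
prompt, dropped = build_context(items, budget=20)
```

With a budget of 20, the instructions and the retrieved policy fit, and the long chat history is dropped rather than dumped in wholesale. A production system would more likely compress or retrieve from that history than discard it.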
And again, this piece really reiterates that agentic systems mess up when the underlying LLMs
mess up, either because the models are just not good enough or because they didn't have the appropriate
context. What's more, the author argues that as models get better, it tends to be more that
second reason. The author concludes, context engineering isn't a new idea. Agent builders have been
doing it for the past year or two. It's a new term that aptly describes an increasingly important
skill. So I think that there are actually two different domains of context engineering that are
worth us keeping in mind and that are worth you and I exploring. The first is context engineering
in the context of AI engineers and actual agent building. In other words, for people who are building
agentic systems, software engineers that are thinking about how to make agents more performant and
work on higher complexity and higher order tasks, these questions of context engineering are about
system design. They're about things like the context compression LLM that sits alongside a single
agent system and makes it work better. There is a whole entire important discourse happening in that
domain that will influence the shape of the agents that even non-coders and non-technical people
ultimately interact with. However, my strong guess is that we're likely to start seeing context
engineering also become a term for consumers and just regular LLM users. In the same way that we
have increasingly taught ourselves or tried to teach ourselves how to prompt LLMs to get the most out of them,
my guess is that context engineering in a user environment is going to become a more important
field and discipline as well. What's the right amount of information to give any given model?
Which models are better at different types of information? Indeed, one area where we have started
to see this is in the release of O3 Pro. You'll remember that the piece from Latent Space that I
thought was the best summary of O3 Pro was called God is Hungry for Context, and it basically
argued that the big difference between O3 and O3 Pro was that O3 Pro was better at handling
lots and lots of context. When the authors of this piece gave it a huge volume of information
about their company, including past meeting notes and recorded audio, it came back with a much
better strategy for them than O3 did alone. And so in that, we have context engineering from a
user standpoint, both in terms of model selection and which model is going to be better at context,
and second, in terms of what type of context to give it.
I think it's an extremely dynamic field.
I think it's likely going to be every bit as important as, if not more important than, prompt engineering
in how we use these tools.
And I'm excited to share more about this as it becomes a bigger part of conversation.
For now, though, that is going to do it for today's little baby primer on context engineering.
I hope this was useful.
Appreciate you guys listening and watching as always.
And until next time, peace.
