Latent Space: The AI Engineer Podcast - Context Engineering for Agents - Lance Martin, LangChain
Episode Date: September 11, 2025Lance: https://www.linkedin.com/in/lance-martin-64a33b5/How Context Fails: https://www.dbreunig.com/2025/06/22/how-contexts-fail-and-how-to-fix-them.htmlHow New Buzzwords Get Created: https://www.dbre...unig.com/2025/07/24/why-the-term-context-engineering-matters.htmlContent Engineering: https://rlancemartin.github.io/2025/06/23/context_engineering/ https://docs.google.com/presentation/d/16aaXLu40GugY-kOpqDU4e-S0hD1FmHcNyF0rRRnb1OU/edit?usp=sharingManus Post: https://manus.im/blog/Context-Engineering-for-AI-Agents-Lessons-from-Building-ManusCognition Post: https://cognition.ai/blog/dont-build-multi-agentsMulti-Agent Researcher: https://www.anthropic.com/engineering/multi-agent-research-systemHuman-in-the-loop + Memory: https://github.com/langchain-ai/agents-from-scratch- Bitter Lesson in AI Engineering -Hyung Won Chung on the Bitter Lesson in AI Research: Bitter Lesson w/ Claude Code: Learning the Bitter Lesson in AI Engineering: https://rlancemartin.github.io/2025/07/30/bitter_lesson/Open Deep Research: https://github.com/langchain-ai/open_deep_research https://academy.langchain.com/courses/deep-research-with-langgraphScaling and building things that “don’t yet work”: - Frameworks -Roast framework at Shopify / standardization of orchestration tools: MCP adoption within Anthropic / standardization of protocols: How to think about frameworks: https://blog.langchain.com/how-to-think-about-agent-frameworks/RAG benchmarking: https://rlancemartin.github.io/2025/04/03/vibe-code/Simon’s talk with memory-gone-wrong: https://simonwillison.net/2025/Jun/6/six-months-in-llms/Full Video EpisodeTimestamps00:00 Introduction and Background00:53 The Rise of Context Engineering01:57 Context Engineering vs Prompt Engineering05:56 The Five Categories of Context Engineering10:02 Multi-Agent Systems and Context Isolation14:48 Classical Retrieval vs Agentic Search17:12 LLMs.txt and MCP Servers24:51 Context Pruning and Memory Management37:25 Memory Systems and Human-in-the-Loop42:55 The Bitter Lesson Applied to AI Engineering51:21 Frameworks, Abstractions, and Building for the Future This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Discussion (0)
Hey, everyone. Welcome to the Latinspace podcast.
This is Alessio, Founder of Kernel Labs, and I'm joined by Swix, Fundera of Small AI.
Hello, hello. We are so happy to be in the remote studio with Lance Martin from Langchain, Landgraf, and everything else he does. Welcome.
It's great to be here. I'm a long-time listener to the pod, and is finally great to be on.
You've been part of our orbit for a while. You spoke at one of the AIEs, and also, obviously, we're pretty close with Langchain.
recently though you've also like been doing a lot of tutorials i remember you did like r1 deep researcher
which is a pretty popular project and async ambient agents but the thing that really sort of prompted me
to reach out and say like okay it's finally time for the lance martin pod is your recent work on
context engineering which is all the rage uh how'd you get into it well you know it's funny buzzwords
emerge oftentimes when people have a shared experience and i think lots of people started building
agents, kind of early this year, mid this year, quote unquote, the year of agents. And I think
what happened is when you kind of put together an agent, it's just tool-clogging a loop,
it's relatively simple to lay out. But it's actually quite tricky to get to work well.
In a particular, managing context with agents is a hard problem. Carpath had put out that tweet,
canonizing the term, context engineering. And he kind of mentioned this nice definition,
which is context of engineering is the challenge of feeding an LM just the right context for the next step,
is highly applicable to agents. And I think that really resonated with a lot of people.
I in particular had that experience over the past year working on agents, and I wrote about that
a little bit in my piece talking about building open deep research over the past year.
So I think it was kind of an interesting point that the term capture a common experience
that many people were having, and it took hold because of that.
How do you define the lines between prompt engineering and like context engineering? So is the prompt
optimization, like context engineering?
in your mind?
Like, I think people are confused.
Like, are we replacing the term?
Like, what is it?
Well, I think that, you know, prompt engineering is kind of a subset of contact engineering.
I think when we kind of move from chat models and chat interactions to agents, there's a big shift
that occurred.
So with chat models, working on chat GPT, the human message is really the primary input.
And, of course, a lot of time and effort is spend and crafting the right message that's passed
to the model.
With agents, the game is a bit trickier, though, because the age.
agent's getting context not just from the human, but now context is flowing in from tool calls
during the agent trajectory. And so I think this was really the key challenge that I observed
and many people observed is like, oof, when you put together an agent, you're not only managing,
of course, the system instructions, system prompt, and of course user instructions. You also have to
manage all this context that's flowing at each step over the course of a large number of tool calls.
And I think there's been a number of good pieces on this. Manus put out a great piece,
talking about contact engineering with madness.
And they made the point that the typical
manis task is like 50 tool calls.
Anthropics multi-agent research is another nice example of this.
They mentioned that the typical production agent,
and this is probably referring to Cloud Code,
could be other agents that they've produced,
is like hundreds of tool calls.
When I had my first experience with this,
and I think many people have this experience,
you put together an agent,
you're sold the story that's just tool calling in a loop.
That's pretty simple.
You put it together.
I was building deep research.
These research tool calls
are pretty token heavy.
And suddenly you're finding that
my deep researcher, for example,
with a naive tool calling loop
was using 500,000 tokens.
It was like $1 to $2 per run.
I think this is an experience
that many people had.
And I think it's kind of that
the challenge is realizing
that, oof, building agents
is actually a little bit tricky
because if you just naively
plumb in the context
from each of those tool calls,
naively, you just hit the context window
of the LM,
That's kind of the obvious problem.
But also, Jeff from Chrome out spoke about this on the recent pod.
There's all these weird and idiosyncratic failure modes as context is longer.
So Jeff has that nice report on context rot.
And so you have both these problems happening.
If you build a naive agent, context is flowing in from all these tool calls.
It could be dozens to hundreds.
And there's degradation and performance with respect to context length.
And also the trivial problem with hitting the context window itself.
So this was kind of, I think, the most.
motivation for this new idea of actually it's very important to engineer the context that you're
feeding you to an agent. And that spawned into a bunch of different ideas that I put together
in the blog posts that people are using to handle this, drawn from Anthropic, from my own experience,
from Manus and others. So I'm just going to put some of the relevant materials on screen just because
we like to, you know, part this is going to do. We'd like to have some visual aid. We did our
posts on 55 and we call it Thinking with Tools. So we're part of the
tools is to get context. And I think using tools to obtain more context, like the agent can
figure out what context it needs and if you just tell it to. And then the other one is, actually,
I thought you did a blog post on this, but apparently it was just like, this is it.
I will say it's funny. And actually, I was hoping you'd bring this up. I also have a blog post,
but it's all moving so quickly that I did a meetup after the blog post and updated the story
a little bit with this meetup. So actually, this is a better thing to show. But I do have a
blog post too. But things changed between my blog post and the meetup, which were like two weeks
apart. So that's how quickly these things are moving. Exactly. That's the blog post.
Should we do this sequentially then? I think it's actually okay to just hit the meetup.
Because it's just easier to follow one thing. And it's like a super set of the blog post story.
Okay. How do you define the five categories? So, I mean, I understand what offload kind of means,
but like can you maybe, yeah, go deeper. Yeah, yeah. We should, let's walk through these, actually.
When I talked about naive agents, and the first time I built an agent,
agent makes a bunch of tool calls.
Those tool calls are passed back to the LLM at each turn,
and you naively just plumb all that context back.
And of course, what you see is the context window grows significantly
because this tool feedback is accumulating in your message history.
A perspective that Manna shared in particular I thought was really good.
It's important and useful to offload context.
Don't just naively send back the full context of each of your tool calls.
you can actually offload it,
and they talk about offloading it to disk.
So they talk about this idea of using the file system
as externalized memory,
rather than just writing back the full concept of your tool calls,
which could be token-heavy,
write those to disk,
and you can write back a summary,
it could be a URL,
something so that the agent knows it's retrieved a thing.
It can fetch that on-demand,
but you're not just naively pushing
all that raw context back to the model.
So that's this off-loading concept.
Note that it could be a file system,
It could also be, for example, agent state.
So Langraph, for example, has this notion of state.
So it could be kind of the agent runtime state object.
It could be the file system.
But the point is you're not just plumbing all the context from your tool calls back into the agent's message history.
You're saving it an externalized system.
You're fetching it as needed.
This saves token costs significantly.
So that's the offloading concept.
I guess the question on the offloading is like,
What's the minimum summary metadata or whatever you need to keep in the context to let the model
understand what's in the offloaded context?
Like if you're doing deep research, obviously you're offloading kind of like the full pages maybe,
but like how do you generate like an effective summary or blurb about what's in the file?
This is actually a very interesting and important point.
So I'll give an example from what I did with Open Deep Research.
So Open Deep Research is a deep research agent that I've been working on for about a year.
And it's now according to Deep Research spends the best.
best-performing deep research agent, at least on that particular benchmark.
So it's pretty good.
Listen, it's not as good as open-eyed deep research, which uses end-end-r-r-l.
It's all fully open-source, and it's pretty strong.
So I just do carefully prompted summarization.
I try to prompt the summarization model to give an exhaustive set of kind of bullet points of the
key things that are in the post, just so the agent can know whether to retrieve the full
context later.
So I think it's kind of prompting, if you're doing summarization, carefully for recall, compressing it, but like making sure that all the key bullet points necessary for the LLM to know what's in that piece of full context is actually very important when you're doing this kind of summarization step.
Now, cognition had a really nice blog post talking about this as well.
And they mentioned you can really spend a lot of time on summarization, so I don't want to trivialize it.
but at least my experience has been it's worked quite effectively.
Prompt a model carefully to capture exactly.
So in this post, they talk a lot about even using a fine-tuned model for performing
summarization.
In this case, they're talking about agent-to-agent boundaries and summarizing, for example,
message history, but the same challenges apply to summarizing, for example, the full contents
of token-heavy tool calls so the model knows what's in context.
I basically spent a lot of time prompt engineering to make sure my summaries
capture with high recall what's in the document,
but compress the content significantly.
I do think that the compression,
that was also part of the meetup findings of yesterday
where we were at the context engineering meetup
that Kroma hosted,
that you do want frequent compression
because you don't want to hit the context route limit.
I'm not sure there's much else to say.
Offloading is important, and you should probably do it.
There was also a really interesting link.
I guess somebody, I think Dex was linking it
to the concept of multi-agents,
why you do want multi-agents is because you can compress and load in different things based on the role of the agent.
And probably a single agent would not have all the context.
That's exactly right.
And actually, one of the other big themes my head and talk about quite a bit is context isolation with multi-agent.
And I do think this does link back to the cognition take.
So, which is interesting.
So their argument against multi-agent is literally called don't-dole multi-agent.
Correct.
and what they're arguing is a few different things.
One of the main things is that it is difficult to communicate sufficient context to subagents.
They talk a lot about spending time on that summarization or compression step.
They even use a fine-tuned model to ensure that all the relevant information,
so they actually show it a little bit down below as kind of a linear agent,
but even at those agent-to-agent boundaries, they talk a lot about being careful about how you compress information
and pass it between agents.
Yeah, I think the biggest question for me, coding is kind of like the main use case that I have.
And I think I still haven't figured out how much of value there is in showing how the implementation was made to then write, if you have a sub agent that writes tests or you have a type agent that does different things.
How much do you need to explain to it about how you got to the place the code basis in versus not?
and then does it only need to return the test back in the context of the main agent?
If it has to fix some code to match the test, should it say that to the main agent?
I think that's kind of like, it's clear to me like the deep research case because it's kind of like atomic pieces of content that you're going through.
But I think when you have state that depends between the subagents, I think that's the thing is still unclear to me.
That's one of the most important points about this context isolation kind of bucket.
So cognition argues, which actually I think is a very reasonable argument, they argue don't do subagents because each subagent implicitly makes decisions and those decisions can conflict.
So you have subagent one doing a bunch of tasks, subagent two are doing a bunch of tasks.
Those kind of decisions may be conflicting and then when you could try to compile the full result, in your example of coding, there could be tricky conflicts.
I found this to be the case as well.
And I think a perspective I like on this is use multi-agent in cases where there's very clear and easy parallelization of tasks.
Cognition and Walden Yan spoke on this quite a bit. He talks about this idea of kind of read versus write tasks.
So, for example, if each sub-agent is writing some component of your final solution, that's much harder.
They have to communicate like you're saying.
An agent-to-agent communication is still quite early. But with deep research, it's really
only reading. They're just doing context collection, and you can do a write from all that shared
context after all the subagents work. And I found this worked really well for deep research,
and actually anthropic report on this too. So their deep researcher just uses parallelized
subagents for research collation, and they do the writing in one shot at the end. So this works great.
So it's a very nuanced point that what you apply context isolation to in terms of the problem,
Yes, you can see this is their work,
matters significantly.
Coding may be much harder.
In particular, if you're having each sub-agent
create one component of your system,
there's many potentially implicitly conflicting decisions
each of the sub-agents are making.
When you try to compile a full system,
there may be lots of conflicts.
With research, you're just doing context-gathering
in each those sub-agent steps,
and you're writing in a single step.
So I think this was kind of a key tension
between the cognition take,
don't do multi-agents,
and the anthropic take,
hey, multi-agents work really well,
it depends on the problem
you're trying to do with multi-agents.
So this was a very subtle
and interesting point.
What you apply multi-agents to
manage tremendously and how you use them.
I like the take that apply multi-agents
to problems that are easily paralyzable
that are read-only,
for example, context gathering for deep research,
and do like the final, quote-unquote,
write, in this case, report writing,
at the end. I think this is trickier
for coding agents.
I did find it interesting
that Claude Code
now allows for sub-agents.
So they obviously
have some belief
that this can be done well.
At least it can be done.
But I still think
I actually kind of agree
with Walden's take.
It can be very tricky
in the case of coding
if sub-agents
are doing tasks
that need to be highly coordinated.
I think that's
a well-explained
contrast in comparison.
Not much to add there.
I think it's interesting
that they have different
use cases and different
architectures evolved.
I don't know
if that's a permanent thing,
that might fall to the bitter lesson,
as you would put it.
Yes.
We should probably talk about
some of the other parts
of the system that you set up.
Yeah.
Because there's a lot of interesting techniques there.
Let's talk about classic old retrieval.
So RAG is obviously,
it has been in the air for now many years,
obviously well before LMs
and this client pole wave.
One thing I found pretty interesting
is, for example,
different code agents
take very different approaches to retrieval.
Varroon from Winsurf
shared an interesting perspective
on how they approach retrieval
in the context of Winsurf.
So they use classic
co-chunking along carefully
designed semantic boundaries,
embedding those chunks.
So classic kind of semantic similarity
vector search and retrieval.
But they also combine that with, for example,
Grip. They then also mentioned knowledge graphs.
They then talk about combining those results,
doing your ranking,
So this is kind of your classic, complicated, multi-step rag pipeline.
Now, what's interesting is Boris from Anthropic in CloudCode has taken a very different approach.
He's spoken about this quite a bit.
Clock code doesn't do any indexing.
It's just doing, quote-unquote, agentic retrieval, just using simple tool calls,
for example, using grep, to kind of poke around your files, no indexing whatsoever,
and obviously works extremely well.
So there's very different approaches to kind of rag and retrieval that different code agents are taking.
And this seems to be kind of an interesting and emerging theme.
Like when do you actually need more hardcore indexing?
When can you just get away with simple, just kind of agenic search using very basic file tools?
Yeah, one of the more viral moments from one of our recent podcasts was Boris is part with us.
And Klein also mentioning that they just don't do code indexing.
they just use agentic search.
And that's a really good 80-20.
And then if you really want to fine-tune,
probably you want to do a little mix,
but maybe you don't have to do it for your needs.
Yeah, I actually just saw Klein posted, I think,
yesterday talking about that they only use crap.
They don't do indexing.
And so I think within the retrieval area of context engineering,
there are some interesting tradeoffs you can make
with respect to are you doing kind of classic vector store-based
semantic search and retrieval with a relatively complicated pipeline
like Veroon's talking about the windsurf,
or just good old kind of agenic search with basic file tools.
I will note I actually did a benchmark on this myself.
I think there's a shared blog post somewhere.
I'll bring it up right now.
Yep.
I actually looked at this a bit myself.
This was a while ago.
But I compared three different ways to do retrieval on all Langraph documentation
for a set of 20 coding questions related Langraph.
So I basically wanted to allow different code agents to write landgraft for me by retrieving from our docs.
I tested Claude Code and cursor.
I used three different approaches for grabbing documentation.
So one was I took all of our docs around 3 million tokens.
I indexed them in the vector store, just did classic old vector store search and retrieval.
I also use an elms.coms.
With just a simple file loader tool.
So that's kind of more like the agenic search,
just basically look at this LNDOT text file,
which has all of the URLs of our documents
with some basic description,
and let the LM, or the code agent in this case,
just make tool calls to fetch specific docs of interest.
And I also just tried context stuffing.
So take all the docs, 3 million tokens,
and just feed them all to the code agent.
So there's just some results I found
comparing Claude Code to Cursor.
And interesting, what I actually found,
this is only my particular test case,
but I actually found that LM. Text with good descriptions,
which is just very simple,
it's just basically a markdown file with all the URLs of your documentation,
and like a description of what's in that dock,
just that passed to the code agent with a simple tool just to grab files
is extremely effective.
And what happens is the code agent can just say,
okay, here's the question, I need to grab this doc and read it,
I'll read it, I need grab this doc, read it, read it,
This worked really well for me, and I actually use this all the time.
So I actually personally don't do vector store indexing.
I actually do LM. Text with a simple search tool with Claude code is kind of my go-to.
Cloud code, in this case, this was done a few months ago.
These things are always changing.
In this particular point in time, Claudecode actually outperform cursor for my test case.
This actually Claude code pilled me, and this was, I did this back in April.
So I've been kind of on Claude Code since.
But that was really it.
So this kind of goes to the point that Boris has been making about Cloud Code, about
and client as well.
You give an LM access to simple file tools.
In this case, I actually use an LM. Text to help it out
so it can actually know it's in each file.
It's extremely effective and much more simple and easier to maintain than building an index.
So that's just my own experience as well.
The skilled up form of LM.coms, which I really like and I use quite a bit is actually the deep wiki from Cognition.
So I made a little Chrome extension for myself where I, like any repo, including yours,
I can just hit eWiki.
And this is an LLMs.txte, kind of,
but also I read it.
This is a great example.
And actually, I think this could be a very nice approach.
Take a repo, compile it down to some kind of easily,
kind of readable.
Yeah, lm.
Dot text.
What I actually found was even using an LM to write the descriptions helped a lot.
So I have actually a little package on my GitHub
where it can rip through documentation
and just pass it to a cheap LLM
to write a high-quality summary of each doc.
this works extremely well.
And so that element of text then has LLM generated.
Yeah, this one.
This is a little repo.
It got almost no attention,
but I found it to be very useful.
So basically it's trivial.
You just point it to some documentation.
It can kind of rip through it,
grab all the pages,
send each one to an LLM,
and LLM writes a nice subscription,
compiles it into an LM.
dot text file.
I found when I did this,
and I fed that to ClaudeCode,
Claudecode is extremely good at saying,
okay, based on the description, here's the page I should load.
Here's the page I should load from the question asked.
I use it when I'm trying to generate element of text for new documentation.
But I've done this for Landgraf.
I've done it for a few other libraries that I use frequently.
You just give that to Cloud Code.
Then Cloud Code can rip through and grab docs really effectively.
Super simple.
The only catch is I found that the descriptions in your element of text matter a lot
because the LM actually has to use the descriptions to know what to read.
You know, anyway, that's just a nice little.
utility that I use all the time.
When we had a client that said the Context 7 MCP by Upstash, which is like an MCP for like project
documentation and stuff like that was one of the most used.
Have you seen, have you tried it?
Have you seen anything else like that that automates some of this stuff away?
Well, you know, it's funny.
We have an MCP server for Langrap documentation that basically gives, for example, Cloud
code, the NLM.
That text file and a simple file search tool.
Now, Cloud has built-in fetch tools, but at the time we've been.
built it, it didn't. But it's a very simple MCP server that exposes elm.com to, for example,
Cloud Code. It's called MCP doc. So it's a little, very simple utility. I use that all the time,
extremely useful. She basically can just point it to all the LM. Text files you want to work with.
Well, the MCP docs have an MCP server that you can search the docs with. So it's kind of throws all the
way down. But I guess my question is like, should this be like one server per project,
you know, or like at some point you're going to have kind of like a meta server. And I think
part of it is, you know, once you move on from just doing tool calling in servers to doing things
like sampling and kind of like, you know, prompts and resources and stuff like that, you can do
a lot of the extraction in the server itself as well. And again, it goes back to like your point on
context engineering. It's like maybe you do all that work, not in the context, but in the server.
and then you just put the final piece that you care about in the context.
But it seems like very early.
Yeah, this is actually a very interesting point.
I've spoken with folks from Anthropic about this quite a bit.
It is I found that storing prompts in MCP servers is actually pretty important,
but in particular to tell the LM or code agent how to use the server.
And so I actually end up doing kind of separate servers for different projects with specific prompts.
And also sometimes I'll have, you can also sort of resources.
So some of the lot of specific resources for that particular project in the server itself.
So I actually don't mind separating servers project-wise with project-specific kind of context and prompts necessary for that particular task.
Yeah, a lot of people actually may have missed some features of the NCP spec, and you do have prompts in there.
It's probably the first actual features that they have, which actually may be kind of underrated.
Like people kind of view MCP as just in tool integration,
but there's actually a lot of stuff in here,
including sampling, which is underrated too.
That's exactly right.
And actually, the prompting thing is pretty important
because even to use our little simple MCP dock server for Langraph docs,
you actually, I found it's better, of course, if you prompt it.
But then I had to put in the read me initially, like,
oh, okay, here's how you should prompt it.
But of course, that prompt can just live in the server itself.
So you can kind of compartmentalize it.
the prompt necessary for the LM to use the server effectively within the server itself.
And this was a problem I saw initially.
A lot of people were using our MCP doc server and the finding, oh, this doesn't work well.
And it's like, oh, it's a skill issue.
You need to prompt it better.
But then that's our problem.
The prompt should actually live in the server and should be available to the code agent.
Right.
So it knows how to use a server.
Right.
So that's maybe retrieval.
And that's a whole.
Retrieval is a big theme.
It obviously predates this new term of context engineering, but there's a lot going on in
the retrieval bucket.
it certainly is an important subset of context
engineering. I'm wondering if there's any other
trends in retrieval before you leave the topic.
You know, I think one other thing
I was tracking was just
Colbert and like the general
concept of late interaction.
I don't know if you guys do a ton
on that, but some
sort of in-between element between
full agentic and full
pre-indexing and two-phase
indexing maybe is what I would call it.
Any comments on that?
I haven't personally looked at Colbert very much.
I play with it only a little bit.
So I don't have much perspective there, unfortunately.
All right, happy to move on.
We could talk about me reducing context briefly.
Everyone's had any experience with this,
because if you use cloud code, you hit that 95,
you know, you've hit 95% of the context window
and you're about to, and cloud code's about to perform compaction.
So that's like a very intuitive and obvious case
in which you want to do some kind of context reduction
when you're near the context window.
I think an interesting take here, though,
is there's a lot of other opportunities
for using somersation.
We talked about it a little bit
previously with offloading,
but actually at tool call boundaries
is a pretty reasonable place
to do some kind of compaction or pruning.
I use that in Open Deep Research.
Hugging Face actually has a very interesting
open deep research implementation.
It actually uses, like,
not a coding agent,
but the code agent, agent implementation.
So instead of tool calls as JSON,
tool calls are actually code blocks.
They go to a coding environment
that actually runs the code.
and one argument they make there is that they perform some kind of summarization or compaction
and only send back limited context to the LLM, leave the raw tool call itself, which is often token heavy as we're talking about deep research, in the environment.
So it's another example. Anthropica and their multi-agent researcher also does summarization of findings.
So I think you see pruning show up all over the place.
it's pretty intuitive.
I think an interesting counter to pruning was made by Manus.
They make the point and the warning that pruning comes with risk, particularly if it's irreversible.
And cognition kind of hits this too.
They talk about we have to be very careful with summarization.
You can even fine-tuned models to do it effectively.
That's actually why Manus kind of has the perspective that you should definitely use context offloading.
so perform tool calls, offload the raw observations to, for example, disk, so you have them,
then sure, do some kind of pruning, summarization, like Alessio was asking before, to pass back to the LM,
useful information, but you still have that raw context available to you, so you don't have
kind of lossy compression or lossy summarization. So I think that's an important and useful
caveat to note on the point of summarization or pruning. You have to be careful about information
loss. This is something that people do disagree on, and I'll just flag this, on pruning mistakes,
pruning wrong paths. Manus says keep it in, and so you can learn from the mistakes.
So other people would say that, well, once you've made a mistake, it's going to keep going down
that path that there's a mistake, you got to unwind. Or you just got to like prune it and tell it,
do not do the thing I know to be wrong. So then you just do the other thing. I don't know if you
have an opinion, but I would call this out. There was someone that spoke yesterday.
have disagreed with this. That's actually very interesting. Drew Brunick has a nice blog post that
hits this point. He talks about this theme of context poisoning, and apparently Gemini reports on this
in their technical report. He talked about, for example, a model can perform a hallucination,
and that hallucinations is stuck in the history of the agent. And it can kind of poison the context,
so to speak, and kind of steer the agent off track. And I think he cited a very specific example from
Gemini 2-5 playing Pokemon, they mentioned in the technical report. So that's why. We're
one perspective on this issue of we should be very careful about mistakes and context that can
poison the context. That's perspective one. Perspective two is like you're saying is if an agent
makes a mistake, for example, calling a tool, you should leave that in so it knows how to correct.
So I think there is an interesting tension there. I will note it does seem that Claude Code will
leave failures in. I notice when I work with it, for example, it'll kind of have an arrow,
the arrow will get printed, and it'll kind of use that to correct. So, and in my experience,
and work with agents in particular, for tool call errors, I actually like to keep them in,
personally. That's just been my experience. I don't try to prune them. Also, for what it's worth,
it can be kind of tricky to prune from the content, from the message history. You have to decide
when to do it. So if you're introducing a bunch more code, you have to manage. So I'm not sure I love
the idea of kind of selectively trying to prune your message history and you're building an agent.
It can add more logic you need to manage within your kind of agent scaffolding or harness.
It's a classic sort of precision recall, but like sort of reinvented for context in an agenetic workflow.
Exactly. Exactly. Right.
Well, we're on a topic of Drew. Drew is obviously another really good author. He's coined a bunch of like sort of context engineering lore.
Any other commentary on stuff that, you know, you particularly like or disagree with?
So he and I did did a meet up on this. And I kind of like this quote from Stuart Brand. It was kind of comical.
If you want to know where the future is being made, look for where language is being invented and lawyers are congregating.
And it was talking about this, this idea of why buzzwords emerge. And he actually was the one who turned me onto this idea that a term like context engineering catches fire because it captures an experience that many people are having. They don't come out of nowhere. And if you scroll down a little bit, he kind of talks about this. He's a whole post about kind of, I think it's how to build a buzzword. But he talks a lot about this idea of kind of successful buzzwords are capturing a common experience that many of us feel. And I think that's kind of the genesis of context of engineering is also largely because,
many of us build agents.
There's lots of ways
that can be quite tricky.
And, oh,
contact engineering is kind of what I've been doing.
And you hear a number of people saying,
and then it kind of resonates.
And you say, oh, okay, yes.
That describes my experience.
So I think that's just an interesting aside
on kind of how language emerges
anthropologically in different communities.
I will co-sign this
because that's exactly what I use to coin
or come up with AI engineer.
A engineer.
No, exactly.
This is because people were trying to hire software engineers that were more optimistic with the AI,
and engineers wanted to work at companies that would respect their work, you know,
and maybe also come out from the baggage of classical ML engineering.
A lot of AI engineers don't even need to use PyTorch because you can just prompt and do typical software engineering.
And I think that's probably the right way, at least in a world where most of the frontier models are coming from closed labs.
I think an interesting counter on this is when you, for example, people try to create language that doesn't really resonate, that doesn't capture common experience, it tends to flop.
So, which is to say that buzzwords kind of co-evalve with the ecosystem, they tend to kind of become big and resonate because they actually capture experience.
Many people try to coin terms that don't actually resonate that go nowhere.
Do you have experience with that?
I'm the worst that naming things, but you do a good job, Sean.
Yes.
You nailed it, the few ones who put on Layton's face.
So that's right.
Cool.
Well, you know, I wanted to talk about context engineering.
Okay, so, sorry, I don't know if I sidetracked you a little bit.
No, that's perfect.
The meta stuff on Thali.
That hits a lot of the major themes.
I can maybe just talk very brief about one more.
We could talk about bitter lesson and some other things.
Yeah.
If you go back to that table, I just wanted to give Manus a shout
because I thought they had one other very interesting point.
Oh, the table that you had.
Yes, exactly.
We've talked about offloading,
reducing context, retrieval,
contact isolation.
Those are, I think,
the big ones you can see
very commonly used.
I do want to highlight Maness.
I thought they had a very interesting take
here about caching,
and it's a good argument.
When people have the experience
of building an agent,
the fact that it runs in a loop
and that all those prior tool calls
are passed back through every time
is quite like a shock
the first time you'll an agent.
You have one token every tool call
and you incur that token cost every pass through your agent.
And so Manus talks about the idea of just caching your prior message history.
It's a good idea.
I haven't done it personally, but seems quite reasonable.
So caching reduces both latency and costs significantly.
But don't most of the API is auto-cash for you?
I mean, if you're using like opening eye, you would just automatically have a cache hit.
I'm actually not sure that's the case.
For example, when you're building age, you're passing your message history back through every time.
As far as I know, it's stateless.
there's different APIs for this across the different providers,
but especially if you use just the Responses API, the new one,
it should be that if you're never modifying the state,
which is good for you if you believe that you shouldn't compress conversation history,
bad for you if you do.
If you never modify the state, then you can just use the Assisansapy,
everything that you pass in prior is going to be cached, which is kind of nice.
Anthropic used to require weird header thing,
and they've made it more automatic.
Yeah, okay, so that's a good,
out. So I had used Anthropics kind of caching header explicitly in the past, but it may be the
case that caching is automatically done for you, which is, which is fantastic if that's the case.
I think it's a good call-up for Manus. Yeah, Gemini also introduced implicit caching.
It's really hard to keep up. Like, you basically have to follow everyone on Twitter and just like
read everything. So that I must have a bullet bot for it. Yeah, yeah, yeah, yeah. Well, you know,
it's interesting, though. So APIs are not supporting caching more or more. That's fantastic.
I'd use Anthropics' explicit caching header in the past.
I do think an important and subtle point here is that caching doesn't solve the long
context problem.
So it of course solves the problem of like latency and cost, but if you still have 100,000
tokens in context, whether it's cached or not, the LM is utilizing that context.
This came up, I actually asked Anton this in their context for a meetup or in their context
for rot webinar, and they kind of had mentioned that the characterization of context
rot that they made, they think they would expect to apply whether or not using caching. Caching
shouldn't actually help you with all the context rot and long context problems. It absolutely
helps you with latency and cost. I do wonder what else can be cached. I feel like this is definitely
a form of lock-in because you ideally want to be able to run prompts across multiple providers
and all that. Yeah, caching is a hard problem. Like I think ultimately you control your destiny
if you can run your own open models
because then you can also control the caching.
Everything else is just a half approximation of that.
That's right.
That's exactly right.
That is overall broad context engineering.
I don't know if you have any other takes from like the meetup yesterday or questions.
No, I think my main take from yesterday was like a quality of compacting.
I think there was like one of the charts was using the automated compacting of like open code
and some of these tools is basically the same as not.
doing it on like the quality of what you get from the previous instructions and I think Jeff at this
chart is like curated compacting is like 2x better but I'm like how to you know it's like how do you
do curated compacting I think that's something that maybe we can do a future blockposts on I think
that's interesting to me like how do you compact especially coding agents things for like it can get
very very long I think for things like deep research is like once I get the report it's fine you know
But for coding, it's like, well, I would like to keep building.
I found that like even when you're like writing tests or like you're doing changes,
having the previous history, it's like helpful to the model.
It seems to perform better when it knows why it made certain decisions.
And I think how to extract that in a way that is like more token efficient and still unclear.
I don't have an answer, but maybe like a request for for work by people listening.
Yeah.
You know, that's a great point.
It actually echoes some of Walden Dan's points from cognition.
also that the summarization compaction step is just non-trivial.
You have to be very careful with it.
Devin uses a fine-tuned model for doing summarization within the context of coding.
So they obviously spent a lot of time and effort on that particular step.
And Manus kind of calls out that they are very careful about information loss.
Whenever they do pruning, compaction, summarization,
they always use a file system to offload things so they can retrieve it.
So it's a good call out that compaction is risky when you're building agents.
and very tricky.
You know, I think there were a lot of,
there's a lot of previously a lot of interest in memory.
And I'm always, I was thinking about the interplay between memory and context engineering.
I mean, are they kind of the same thing?
It's just a rebrand.
Are there parts of memory?
And, you know, you guys recently relaunch Langman.
That's also a form of context engineering.
But I don't know if there's, there's like a qualitatively philosophical difference.
Yeah.
So that's a good thing to hit, actually.
I made me think about this on two dimensions.
writing memories, reading memories,
and then the degree of automation on both of those.
So take the simplest case,
which actually I quite like,
Claude Code, how do they do it?
Well, for reading memories,
they just suck in your CloudMDs every time.
So every time you spend up Cloud Code,
it pulls in all our CloudMDs.
For writing memories,
the user specifies,
hey, I want to save this to memory,
and then Cloud Code writes it to CloudMD.
So on this axis of, like,
degree of automation,
It's kind of like the zero-zero, it's very simple, and it's kind of very like Boris-pilled, like
super simple, and I actually quite like it. Now, the other extreme is maybe Chattebtee-T.
So behind the scenes, Chat-TB decides when to write memories and it decides when to suck them in.
And actually, I thought Simon at A Engineer had a great talk on this, and it wasn't about memory,
but he hit memory in the talk. And he mentioned, I don't even if you remember this, but it was a failure
mode in image generation because he wanted an image of a particular scene and it sucked in his
location and put it in the image. Like it sucked in half like half moon bay or something and
suck in an image. And it was a case of memory retrieval gone wrong. He didn't actually want that.
So even in a product like Chat TBT that spent a lot of time on memory, it's non-trivial.
And I think my take is the writing of memories is tricky, like when actually should the
system write memories is non-trivial. Reading of memories.
actually kind of converges with the concept of general thing of retrieval.
Like memory retrieval at large scale is just retrieval, right?
I kind of view them as.
It's retrieval in a certain context, which is your past conversations, which
That's right.
You know, it is different than retrieval from a knowledge base, different than retrieval
from the public web.
By the way, this is a second's write up on his website on here where he was just trying to
generate it, and then suddenly it shows up.
There you go.
Actually, it's a subtle point.
I don't know exactly know what Open Eye does behind the hood.
with respect to memory retrieval, my guess is they're indexing your past conversations and using
semantic vector search and probably other things. So it may still be using some kind of knowledge base
or vector store for retrieval. So in that sense, I kind of view it just simply as, you know,
in the case of sophisticated memory retrieval, it is just like a complex rag system in the same
way we talked about with like Varroon and building windsurf. It's kind of a multi-step rag pipeline.
So I kind of view memories, at least the reading part, as just, you know, it's just retrieval.
And actually, I quite like clause approach.
It's very simple.
Just the trivial.
Just suck it in every time.
I would also highlight the semantic differences that you've established,
you know, episodic, semantic, procedural, and background memory processing.
We've done an episode with the letter folks on sleep time compute.
I think these are just like, if you have ambient agents, very long-running agents,
you're going to run into this kind of context engineering, which is previously the domain of memory.
And I would say that the classic context engineering discussion doesn't have.
have this stuff. Not yet. So actually, there's an interesting point there. I did a course on
building ambient agents, and I built this little email assistant that I used to run my email. I actually
think this is made of a sidebar memory. Memory pairs really well with human the loop. So for example,
in my little email assistant, it's just an agent that runs my email, I have the opportunity
pause it before it sends off an email and correct it if I want, like change the tone of this
email, or I can literally just modify the tool call to have a little UI for that. And,
And every time when you have these ambient agents, you edit, for example, or you give it feedback,
you edit the tool calls itself, that feedback can be sucked into memory.
And that's exactly what I do.
So actually, I think memory pairs very nice with human loop.
And like when you're using human loop to make corrections to a system, that should be captured in memory.
And so that's a very nice way to use memory in kind of a narrow way that's just capturing user preferences over time.
And actually uses an LLM to actually reflect on the changes I made, reflect on the
prior instructions in memory and just update the instructions based upon my edits.
And that's a very simple and effective way to use memory when you're building ambient agents
that I quite like.
There is a course which you can find on the GitHub.
And yeah, I mean, you know, you guys have done plenty of talks on instant agents.
That's right.
But I think it's a very good point that memory is often kind of confusing when to use it.
I think a very clear place to use it is when you're building agents that have human
loop because human loop is a great place to update your agent memory with your.
preferences. So it kind of gets smart over time and learn stream. It's exactly what I do with my little
email assistant. So Harrison, I'm sure, I think he said this publicly, uses an email system for all his
emails. He gets a lot as a CEO. I get much fewer, so I'm just a lowly guy, but I still use it.
And that's a very nice way to use memory, is kind of pair it with human in the loop.
Yeah, totally. I've tried to use the email system before, but like, you know, I'm still still very
married to my superhuman. Yeah, fair enough. That's right. That's right. That's what the
coverage that we planned on Contex Eng.
You have a little bit on
a bitter lesson that we could wrap up with.
Yeah, that's a fun theme to hit on a little bit.
I'd love to hear your perspective.
So there's a great talk from
Hyeong Wang Chung, previously Open AI now at
MSL on the bitter lesson
and his approach to AI research.
The take is compute 10xes every five years
for the same cost, of course. We all know that.
The history of machine learning has
Yeah, exactly this slide.
Exactly.
History of machine learning has shown
that actually capturing this scaling
is the most important thing.
In particular, algorithms that are more general
with fewer inductive biases
and more data on compute
tend to beat algorithms more,
for example, hand-tuned features,
inductive biases built in,
which is to say,
just letting a machine learn how to think itself
with more compute and data,
rather than trying to teach machine
how we think tends to be better.
So that's kind of the bitter lesson piece simply stated.
So his argument is this subtle point that at any point time, when you're, for example, doing research,
you typically need to add some amount of structure to get the performance you want at a given level of compute.
But over time, that structure can bottleneck your further progress.
And that's kind of what he's showing here, is that in the low compute regime, kind of on the left of that x-axis,
adding more structure, for example, more modeling assumptions, more inductive biases, is better
than less. But as compute grows, less structure, and this is exactly the better lesson point,
less structure, more general tends to win out. So his argument was we should add structure
at a given point in time in order to get something to work with the level of compute that we have
today, but remember to move it later. And a lot of his argument was like people often forget to
remove that structure later. And I think my link here is that I think this applies to AI engineering too.
And if you kind of scroll down, I have the same chart showing my little, exactly. This is,
this is my little example of building deep research over the course of a year. So I started
with a highly structured research workflow, didn't use tool calling. I embedded a bunch of
assumptions about how research should be conducted. In particular, don't use tool calling because
everyone knows tool calling is not reliable. This was back in 2018.24. Decompose the problem into a set
of sections and parallelize each one, those sections written in parallel into the final report.
What I found is you're building an LM applications on top of models that are improving exponentially.
So while the workflow was more reliable than building an agent back in 24, that flipped pretty
quickly as LMs got better and better. It's exactly like was mentioned.
in the Stanford talk, you have to be constantly reassessing your assumptions when you're building
A applications given the capabilities of the models. I talk a lot about here the specific structure
I added, the fact that I used the workflow because we know tool calling doesn't work. This was back in 2034.
The fact that I decomposed the problem because that's how I thought I should perform research.
And this basically bottlenecked me. I couldn't use MCP as MCP got, for example, much more popular.
I couldn't take advantage of the fact that tool calling was getting significantly
better over time. So then I moved to an agent, started to remove structure, allow for tool
calling, let the agent decide the research path, a subtle mistake that I made, which links back to
that point about failing to remove structure, I actually wrote the report sections within each subagent.
So this kind of links back to what we talked about with sub-agents in isolation.
Sub-agents just don't communicate effectively with one another. So if you write report sections in each
sub-agent, the final report is actually printed this joint. This is exactly Alessio's
challenge and problem about using multi-agent. So I actually hit that exact problem. So I ripped
out the independent writing and did a one-shot writing at the end. And this is the current
kind of version of Open Deep Research, which is quite good. And this is kind of the thing that's,
at least on deep research, meant the best performing open deep research assistant, at least that's
open source. So it was kind of my own arc, although we do have some data results with GPD5 that are
quite strong. So, you know, the models are always getting better. And so indeed, our open source
assistant actually takes advantage and rides that wave. But I actually kind of experienced,
I felt like I actually got bitter lesson to myself because I started with the system that was very
reliable for the current state of models back in mid-20204, early 2024. But I was completely
bottlenecked as models got better. I had to rip out the entire system and rebuild it twice,
rechecking my assumptions, in order to kind of capture the gains of the model. So I think, I just want to
flag, I think this is an interesting point. It's hard to build on top of rapidly expanding
models, rapidly improving model capability. And actually, I really enjoyed from A.A.
Engineer, Boris's talk on Claude Code, and they're very bitter lesson-pilled. He talks a lot
about the fact that they make Cloud Code very simple and very general because of this fact.
They want to give users unfettered access to the model without much scaffolding around it.
But I think it's an interesting consideration in A-A-engineering.
that we're building on top of models
that are improving exponentially.
And one of the points he makes is a correlator
of the bitter lesson is that more general things
around the model tend to win.
And so when building applications,
we should be thinking about this.
We should be adding structure necessary
to get things to work today
by keeping a close eye on models improving rapidly
and removing structure in order to unbottlemeck ourselves.
I think that was my takeaway.
So I really liked the talk from Hyeong-won Chung.
I think that's worth everyone listening to.
And I think a lot of lessons apply to AI engineering.
I think this is similar to incumbents adopting AI, putting AI in existing tools.
Because you already have the workflow, right?
So you already have all the structure.
You just put AI.
It becomes better.
But then the AI native approaches catch up as the models get better.
And then there's no way for existing products to remove the structure because the structure is the product.
Yes.
And that's why then you have, you know, cursor and windstor.
serve or better than VS code for like AI Native thing just because they didn't have to deal with
removing things and why cognition is like, you know, again, it's like it doesn't even think about
the idea as like the first thing. The ID is like a piece of the agent. And so I think you see this
in a lot of markets, which is like, hey, again, if you have a workflow and you put AI, the workflow is
better. Like the workflow is not the end goal. I think we're now at a place where like you should just
start without a lot of structure just because now the models are like so good. But I think the first
two and a half years of the market,
there was kind of like the stance of like,
should I just put AI into the workflow that works?
Should I rewrite the workflow?
But the workflow is not that good
because the models are not that good.
But I think we're past that point now.
That's an amazing example.
Actually, if you show your chart again,
there's another interesting point in your chart.
In the earlier model regime,
the structure approach is actually better.
And so an interesting take on this,
Jared Kaplan, the founder of Anthropic,
is a great talk at startup school
from a couple weeks ago.
And he mentions this point about oftentimes building products that explicitly don't quite work yet
can be a good approach because a model under them is pretty accidentally and it'll kind of unlock the product.
We saw that with cursor.
Part of the cursor lore is that it did not work particularly well, Cloud 3-5 hits, and then boom, it kind of unlocks the product.
And so you kind of hit that near the curve when the model capability catches up to the product needs.
But in that earlier regime, the structure approach appears better.
So it's kind of this interesting, subtle point that for a while, the more structure approach appears better,
then the model finally hits the capability needed to unlock your product, and suddenly your product just takes off.
There's kind of another corollary to this, that you can get tricked into thinking your structure approach is indeed better,
because it'll be better for a while until the model catches up with less structured approaches.
Your chart looks very similar to the windsurf chart. I got to bring it up because I was involved in the writing of this one.
Isn't this similar? There's a, there's a stealing.
you know and then like boom you go slow it's this is almost like bitter lesson but in like uh enterprise
that's right for me okay the lines are important but to me the bullet points are the main thing
if you understand the bullet points then you can not you can actually learn from the the mistakes
of others right there is one and spicy take on on this which is like you know how much is
land graph aligned with the bitter lesson yes obviously you guys are obviously aware of it so it's not
not going to be a surprise. But I do think that making abstractions easy to unwind is very important
if you believe in a bitter lesson, which you do. No, no, this is super important, actually. And I actually
talk about this at the end of the post. Yeah. There's an interesting subtlety when you talk about
Asian frameworks. A lot of people are anti-framework. I completely understand and sympathetic to those
points. But I think when people talk about frameworks, there's two different things. So there can be
a low-level orchestration framework. There's a great talk, for example, at
from Shopify, they built this orchestration framework called Roast internally.
And it's basically Langraph. It's some kind of way to build kind of internal orchestration,
workflows with LMs. And Roast, Langraph provides you low-level building blocks,
nodes, edges, state, which you can compose into agents, you can compose into workflows.
I don't hate that. I like working low-level building blocks. They're pretty easy to tear down,
rebuild. In fact, I used, for example, Langraph to build open-deep research. I had a workflow. I
rip it out. I've rebuilt the agent. The building blocks are low-level, just nodes, edges, state.
But the thing I'm sympathetic to is there's also, in addition to just kind of low-level orchestration
frameworks, there's also agent abstractions from framework import agent. That is actually where
you can get into more trouble because you might not know what's behind that abstraction.
I think when a lot of people kind of are anti-framework, I think what they're really saying is
are largely anti-obstraction, which I'm actually very sympathetic to. And I don't particularly
like agent abstractions for the exact reason. And I think Walden Yans made a good point. Like, we're
very early in the archive agents. We're like in the HTML era. And agent abstractions are problematic
because you don't know what's necessarily under the hood of the abstraction. You don't understand
it. And if I was building, for example, you know, open deep research with an abstraction,
I wouldn't necessarily know how to rip it apart and rebuild it when models got better.
So I'm actually wary of abstractions.
I'm very sympathetic to that part of the critique of frameworks.
But I don't hate low-level orchestration frameworks that just provide nodes, edges.
You can just recombine them in any way you want.
And then the question is, why use orchestration at all?
And actually, I use Landgraf because you get some nice use.
You get checkpointing, you get state management.
It's low-level stuff.
And that's the way I happen to use Langraph.
And that's why I like Landgraf.
And that's actually why I found, like, a lot of customers like Landgraf.
it's not necessarily for the agent abstraction,
which I agree can be much tricker.
Some people like agent abstractions.
That's completely fine as long as you understand
what's under the hood.
But I think that's a very interesting debate
about frameworks.
I think the critique is it should be made
a little bit more on abstractions
because often people don't know
what's under the hood.
For those who are looking for resources,
it was a bit hard to find the shopify talk.
Yeah, it's unlisted now.
I don't know why it's unlisted,
but it's a nice talk.
I found it through this Chinese ripoff.
of the talk.
Yeah, it's actually hard to find now.
I think there should be a browse comp
where you find obscure YouTube videos
because that's something I'm very good at.
It's kind of my bread in mind.
It's good.
And what's funny is,
this talk follows exactly the arc
we often see when we're talking to companies
about Langraph.
It is people want to build agents and workflows internally.
Everyone rolls their own.
It becomes hard to kind of manage
and coordinate and review code
in this context of large organizations.
It can be very helpful to have
standard library or framework that people are using
with low-level components that are easily
composable. That's what they build with Roast.
That's effective what Land-Graph is.
And that's why a lot of people like Langraph.
I actually thought the talk on MCP
that, I believe it was
John Welsh. Yes, I think
that was like a super underrated talk.
I tried yelling about it. No one listened to me.
But like, you know, if you listen this far into the podcast,
do us a favor.
Did actually listen to John Welsh's talk?
It's actually very good.
It's very good.
He makes a case for a lot of the reason why people actually, for example, enterprises,
larger companies like Langraph, which is the fact that when tool calling got good at it
with an Anthropic and, you know, sometime mid-last year, he actually makes this point explicitly.
So he mentions, okay, so you're anthropic, tool calling gets good in mid-2024, everyone's building
their own integrations, it becomes complete chaos.
And that's actually where kind of MCP came from.
Let's build a kind of a standard protocol for accessing tools.
Everyone adopts it.
much easier to kind of have off and have review and you minimize cognitive load.
And this is actually the argument for standardized tooling, whether it be frameworks or otherwise,
within larger orgs, is practicality.
And his whole talk is making that very pragmatic point, which is actually why people do
tend to like frameworks, for example, in large organizations.
And then ship it as a gateway.
This is the other big thing that they do.
That's right.
Lance, you've been so generous of your time.
Thank you.
Any shameless plugs?
Calls to action, stuff like that.
Yeah, if he'll made it this far, thanks for listening.
We've a bunch of different courses I've taught,
one on ambient agents, one on building open debris research.
So I actually was very inspired by Carpath.
He had a tweet a long time ago talking about building on ramps.
So he talked about he had his micrograd repo.
A few people looked at it, but not that many.
He made a YouTube video, and that created an on ramp
and the repo skyrocketed in popularity.
So I like this one-two punch of building a thing.
like OpenD Research, then creating a class
so people can actually understand how to build it themselves.
And I kind of like that, build a thing,
create an on ramp for it.
So I have a class on building open deep research.
Feel free to it's for free.
But it walks through a bunch of notebooks as to how I build it.
And you can see the agent is quite good.
We even have better results coming out soon with GPD5.
So if you want, kind of an open source deep research agent,
have a look at it.
It's been pretty fun to build.
And that's exactly what I talked about in that bitter list and blog post as well.
Awesome, Lance.
Thank you for joining.
Yeah, a lot of fun.
Great to be on.
Thank you.
