The AI Daily Brief: Artificial Intelligence News and Analysis - How to Use /Goal to Do More With AI
Episode Date: May 31, 2026
A practical primer on /goal, the new AI primitive showing up in Codex and Claude Code. NLW explains how /goal differs from a normal prompt, why it matters for longer-running agent tasks, what makes a ...good goal, and how to think about using it beyond coding for audits, research, vendor reviews, market landscapes, and other knowledge work where the AI needs a clear finish line and evidence of completion.
Sign up for AI Executive Catchup: https://aiexecutivecatchup.com/
Brought to you by:
KPMG - Research from KPMG and the University of Texas at Austin shows the highest-impact AI users treat AI like a reasoning partner, and those skills can be taught at scale. Learn more at kpmg.com/us/Sophisticated
Outsystems - Stop wondering how AI will change your business and start building the agents that will lead it - http://outsystems.com/
Scrunch - The AI customer experience platform - https://scrunch.com/
Zenflow Work - Agents for knowledge work - https://zenflow.free/
Blitzy - Want to accelerate enterprise software development velocity by 5x? https://blitzy.com/
AssemblyAI - The best way to build Voice AI apps - https://www.assemblyai.com/brief
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Our Newsletter is BACK: https://aidailybrief.beehiiv.com/
Interested in sponsoring the show? sponsors@aidailybrief.ai
Transcript
Today on the AI Daily Brief, a primer on using the slash goal primitive in Codex and Claude Code
and how to use it to level up your use of AI. The AI Daily Brief is a daily podcast and video
about the most important news and discussions in AI. All right, friends, quick announcements before we
dive in. First of all, thank you to today's sponsors, Robots and Pencils, Section, Superintelligent,
and Blitzy. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief,
or you can subscribe on Apple Podcasts. And if you want to learn more about
sponsoring the show, send us a note at sponsors@aidailybrief.ai. Today we're talking about something
that a lot of power users of AI are incredibly excited about, which is slash goal. So let's dive
in. Today we're doing another very operator-centric episode. Recently, I did a show about
Codex maxing, effectively a set of tips and best practices on how to get the most out of OpenAI's
Codex. Now, in many ways, while that episode was specific to Codex itself, a lot of the
interaction patterns you could also follow in other harnesses like Claude Code. The Codex
maxing piece was built off of a blog post by OpenAI's Jason Liu. Jason wrote up about nine
techniques or interaction patterns that he had discovered allowed him to get the most out of
Codex, not just for coding, but for other types of knowledge work as well. And some of those
tips represented fairly different types of patterns. One of them, for example, is the idea
of durable threads or mono threads, where instead of using some sort of infrastructure like a project,
where you have multiple threads all related to the same topic that share a memory base,
you instead use a single thread, relying on the harness's compaction tool to make sure it always
preserves the relevant context. You also saw in that Codex maxing post a number of ideas about
how to effectively reduce the latency between the human providing guidance to the model and
the model getting things done. I think in some ways, in fact, that you could kind of summarize the overarching
direction of what Jason was exploring as a way to move past the turn-based paradigm of AI.
In other words, the standard way of interacting with chatbots that we've all gotten used to
over the last few years where you give it a prompt, wait for it to do a thing, review the thing
it did, develop and provide it your feedback, and wait again for the next thing that it does.
By using features of Codex like the side panel, where you can inspect artifacts as they're being
built, voice input, which lets you give feedback more freely and with a lot of additional context
because you're talking through it,
steering, to insert that feedback even as Codex is still working,
and some other features like remote control and heartbeats
to make sure that this can happen even when you're not sitting at your desk,
what it all amounts to is a new, more parallel way
of working with agents through these harnesses like Codex.
Now, when it comes to Codex, however,
there has been one feature that has been lurking in a lot of the conversation
throughout the month.
It is, in fact, one of those features that once introduced,
becomes normalized across all of the competitor set,
with other companies adopting it even if they weren't the first to do it.
I'm talking, of course, about slash goal.
Back at the beginning of May, the Codex team's Tibo wrote,
slash goal might be the most consequential thing we have shipped in Codex.
The value of good instructions has never been higher.
Pavel Hearn explained,
you state the outcome, the model loops, self-evaluates, and stops when it's done.
Now, this idea of looping is a key part of this.
You might remember how we talked about the Ralph Wiggum loop,
which is basically an early hack-it-yourself version of this,
that figured out a way to get an agent that you initiate on a problem,
to keep working against that problem over and over,
without human steering having to be involved,
effectively extending the window of how long it can work without your immediate interaction.
Former OpenAI co-founder Andrej Karpathy, who is now at Anthropic,
has also been spending a lot of time with looping, such as his auto research loop.
At one point, he said,
LLMs are exceptionally good at looping until they meet specific goals.
Don't tell it what to do, give it success criteria, and watch it go.
Pavel concluded his tweet,
the skill that wins is engineering the intent:
why it matters, strategic context,
and how the success will be measured,
so the agent can make better autonomous decisions.
Now, over the next couple weeks,
people really started to click in on goal.
Gregor Zunic writes,
Slash Goal is one of the best things OpenAI ever shipped.
Alex Finn wrote,
Slash Goal is the most underrated feature in AI right now.
Ole Lehmann called it basically autopilot for complex AI tasks.
Trying to describe how slash goal works for
non-technical folks, he wrote: One, you type slash goal and describe the end result you want. Two,
the AI starts working. Three, after every step, it checks itself: Am I done yet? Four, if no,
it keeps going. Five, if yes, it stops and tells you. And honestly, people found so much utility
so fast with this that just a couple weeks later, Claude Code shipped the same feature,
and in recognition that it was better to participate in a new primitive rather than trying to own it,
they did the super smart and mature thing of just calling it slash goal in Claude Code as well.
Microsoft's Nicholas Bustamante wrote,
I'm glad to see Slash Goal becoming the new primitive for long-running tasks.
The model does not naturally persist across turns, context windows, sandboxes,
process crashes, or days of work, so it needs the help of the harness.
He continued, I also love how simple it is.
An initializer agent turns fuzzy user intent into durable workspace structure with a plan.md file,
then worker agents make bounded progress against that structure, and a judge agent decides whether
the stated completion condition is actually met or it will keep running. Once again, the abstraction
is moving up the stack. In 2024, you wrote your own while loop. In 2025, you wrote prompt files
and hooks, i.e. Ralph Wiggum. In 2026, the loop is becoming a product primitive.
Shawn Wang, aka swyx, wrote that this represented an increased level of autonomy,
from slash skill, which was preset prompts, to slash plan, which was human-refined inputs,
to slash goal, which was AI-evaluated outputs.
Now, as this new primitive has taken hold, lots of people have started to try to write guides and
tip documents.
One of those came from the OpenAI developers themselves, and that forms the basis for the guide
which we'll be going through for the rest of the show, how to use slash goal.
This is not comprehensive, and it honestly still slants more technical than I was trying to get to,
but hopefully, especially for those of you who are thinking about how to apply this to
knowledge work as opposed to just actual software engineering tasks, you'll feel a little bit more
like you have a handle on this once we're through. Now, as you might imagine, I did use Codex to build
this presentation, so if you see any lingering meta text, i.e. where it converts instructions into marketing
copy, or it's just in general a little overly verbose, I'm placing all of the credit for that
squarely at the feet of 5.5 in Codex itself. Now let's start by defining the difference between a
prompt and a goal. And an important point here is that slash goal is not
a bigger prompt. It's a fundamentally different type of a thing. In summarizing the OpenAI
Guide and some other primers I gave it, the way that Codex described goal was as a finish line contract:
what should be true, how success should be checked, and what has to stay intact along the way.
If a prompt involves asking for a result, the harness model combo doing the immediate work,
the harness slash model reporting that work and waiting for your feedback, and repeat, goal is instead
a continuous loop that, one, works towards the durable objective that you've given it,
two, checks current evidence against the finish line as it's defined, and three, determines
whether to continue, whether the task is complete, or whether to stop because it's honestly blocked.
And part of the recognition behind slash goal is that there are lots of types of work that are sequential
in a way where the work can't know its next step until the last step has taught it something.
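To make that loop a little more concrete, here is a minimal Python sketch of the work, check, decide cycle. It is only an illustration and not how Codex or Claude Code actually implement slash goal: the names GoalRun, Verdict, run_goal, work_step, and judge are all made up for this example, and the step cap that ends a run as blocked is an assumption added so the sketch always terminates.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Callable


class Verdict(Enum):
    CONTINUE = "continue"   # finish line not reached yet, keep working
    COMPLETE = "complete"   # the evidence shows the objective is met
    BLOCKED = "blocked"     # no defensible next step remains


@dataclass
class GoalRun:
    objective: str                                      # fixed for the whole run
    evidence: list[str] = field(default_factory=list)   # accumulates step by step
    verdict: Verdict = Verdict.CONTINUE


def run_goal(
    objective: str,
    work_step: Callable[[GoalRun], str],
    judge: Callable[[GoalRun], Verdict],
    max_steps: int = 20,  # hypothetical safety cap, not a documented setting
) -> GoalRun:
    """Work, check the evidence against the finish line, then decide."""
    run = GoalRun(objective)
    for _ in range(max_steps):
        run.evidence.append(work_step(run))   # 1. work toward the objective
        run.verdict = judge(run)              # 2. check evidence against the finish line
        if run.verdict is not Verdict.CONTINUE:
            return run                        # 3. stop when complete or blocked
    run.verdict = Verdict.BLOCKED             # ran out of steps without finishing
    return run


# Toy stand-ins for the agent: each step "fixes" one failing test.
failing = ["test_a", "test_b", "test_c"]


def fix_one(run: GoalRun) -> str:
    fixed = failing.pop(0)
    return f"fixed {fixed}; {len(failing)} still failing"


def all_green(run: GoalRun) -> Verdict:
    return Verdict.CONTINUE if failing else Verdict.COMPLETE


result = run_goal("all three tests pass", fix_one, all_green)
print(result.verdict.value, len(result.evidence))  # complete 3
```

The point of the sketch is that the human never has to say keep going: the judge function plays that role after every step, and the evidence list is what makes the final verdict inspectable.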
Now, because it's from their developers, the OpenAI document centers on tasks
like profiling, patching, benchmarking, reproducing flaky tests, migrations, bug hunts, and research
audits, with the common thread being that although each has a specific target, the path to get
there changes as Codex gathers evidence. If you didn't have a system like Slash Goal, you'd be
sitting there waiting to see what it said after each intermediate step, only to say something like
keep going, now check this, now rerun that. Slash Goal effectively pushes that keep going button
for you. Now, despite all these examples being about coding, goals can apply to any objective that
has and requires some sort of auditable persistence. So what does a type of work need to have to be a
good candidate for goals? First, it has to have a durable objective. In other words, the target
should remain true across each turn. The target itself is not going to change over time. A second
aspect of work that's good for goals is an uncertain path to success, one where Codex or
Claude Code may need to inspect, compare, rerun, revise, or investigate before knowing what the next
best move to make is. Finally, that objective needs to have really strong, clear, finish line
evidence, where completion is not dependent on vibes, but instead on tests, sources,
artifacts, citations, basically some sort of proof that is inspectable by the AI, where it can
self-judge successfully if it's actually done. Simply put, a goal defines completion for a particular
body of work. And by using slash goal in Codex or Claude Code, you're engaging in a particular
type of work where you've shifted from telling the AI what to do to instead telling the AI what
you want to have done when you're through. And while goals are a way to increase autonomy,
they're not about cutting out the user entirely. They are still highly user controlled.
You define the outcome. The goal can be paused, resumed, cleared, or completed. Basically,
life cycle authority stays bound to the user, the system, and the evidence that the user has
provided. There's a set of commands, including slash goal pause, slash goal resume, and slash goal clear,
so that if a user finds the path that Codex is going down seems to be wrong, or the rubric for success
needs to change, they can intervene without having to throw away everything that's been done so far.
Now, one pattern we talked about when we were talking about Codex maxing was the idea of the
importance of durable threads, or what some people have called the monothread pattern, where
instead of a project with a shared set of memory, the unit of context is the thread.
That's how goals work as well.
The thread itself is where everything accumulates.
This is not taking advantage of global memory or project instructions more broadly.
The objective itself travels within that specific thread.
One thing I keep seeing in enterprise AI: companies hedging across every cloud, every model,
every framework, or paying a GSI for a pilot that never ends.
The teams actually shipping? They've picked a lane and they move fast.
That's one of the reasons I like today's sponsor, Robots and Pencils.
They've gone all in on AWS. They're an advanced tier and AWS pattern partner and they ship production
AI co-workers in 45 days. That's led to them doing some of the more interesting work I've seen on
AI co-workers. And by that I'm not talking about chatbots. I'm talking about actual agentic
systems that sit inside a business architecture and do real work. That kind of focus matters if you're
an enterprise leader trying to get something real into production or an AWS rep trying to move a
customer from interested to deployed. Request an AI briefing at robotsandpencils.com. One conversation
with Robots and Pencils and you'll know.
Here's a harsh truth. Your company is probably spending thousands or millions of dollars on AI
tools that are being massively underutilized. Half of companies have AI tools, but only 12% use
them for business value. Most employees are still using AI to summarize meeting notes. If you're
the one responsible for AI adoption at your company, you need Section. Section is a platform that
helps you manage AI transformation across your entire organization. It coaches employees on real
use cases, tracks who's using AI for business impact, and shows you exactly where AI is and
isn't creating value. The result, you go from rolling out tools to driving measurable AI value.
Your employees move from meeting summaries to solving actual business problems, and you can
prove the ROI. Stop guessing if your AI investment is working. Check out Section at sectionai.com.
That's S-E-C-T-I-O-N-A-I dot com.
OpenAI and Anthropic are both launching enterprise AI consulting efforts because everyone is realizing
that the challenge isn't the capabilities of AI; the challenge is getting individuals and the organization
actually ready to use it. The truth, though, is that all the forward-deployed engineers in the world
aren't going to help you if you don't actually have a coherent strategy based on an understanding
of your actual AI readiness. Superintelligent Maturity Maps give you a chance to see where
you stand relative to the industry on deployment depth, systems integration, data access,
outcomes, people, and governance. And from there, our customized AI planning assessments can
help you figure out what you need to do to improve your readiness and how to sequence it.
Go take your own maturity maps quiz at besuper.ai and send us a note if you
want to go deeper.
Weekends are for vibe coding.
It has never been easier to bring a passion project to life, so go ahead and fire up your
favorite vibe coding tool.
But Monday is coming, and before you know it, you'll be staring down a maze of microservices,
a legacy COBOL system from the 1970s, and an engineering roadmap that will exist well
past your retirement party.
That's why you need Blitzy, the first autonomous software development platform designed for
enterprise-scale codebases.
Deploy it at the beginning of every sprint and tackle your roadmap 500% faster.
Blitzy's agents ingest your entire codebase,
plan the work, and deliver over 80% autonomously.
Validated, end-to-end tested, premium quality code at the speed of compute.
Months of engineering compressed into days.
Vibe code your passion projects on the weekend.
Bring Blitzy to work on Monday.
See why Fortune 500s trust Blitzy for the code that matters at blitzy.com.
That's B-L-I-T-Z-Y dot com.
Now, writing a good goal is more than just having an outcome, although that's part of it.
When it comes to the outcome itself, it's really important that evidence can decide
success or completion. Evidence can be tests, citations, matrices, logs, rubrics, artifacts,
but there's more to writing a good goal as well. A good goal prompt is going to provide boundaries,
like which files, tools, or data can be used, and it's likely going to explain things like
when the harness should actually stop and explain that no defensible path remains. OpenAI's tip document
says that the strongest goals usually define six things: the outcome, or what should be true when the work
is done; the verification surface, which is the test, benchmark, report, artifact, command
output, or source material that proves it; the constraints, in other words, what must not
regress while Codex works; the boundaries, which files, tools, and resources Codex can use;
the iteration policy, how Codex should decide what to try next after each attempt;
and the blocked stop condition, or when Codex should actually stop.
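One way to hold yourself to those six parts is to draft the goal as a small structure before you type it. The sketch below is a hypothetical drafting helper in Python, not anything that ships with Codex or Claude Code: the GoalSpec class, its field names, the labeled layout, and the flaky test wording are all assumptions for illustration. It simply reports which of the six parts are still empty and renders the rest as text you could paste after slash goal.

```python
from dataclasses import dataclass, fields

# Human-readable labels for the six parts named in the episode.
LABELS = {
    "outcome": "Outcome",
    "verification": "Verification surface",
    "constraints": "Constraints",
    "boundaries": "Boundaries",
    "iteration_policy": "Iteration policy",
    "blocked_stop": "Blocked stop condition",
}


@dataclass
class GoalSpec:
    outcome: str = ""           # what should be true when the work is done
    verification: str = ""      # the evidence that proves it
    constraints: str = ""       # what must not regress along the way
    boundaries: str = ""        # which files, tools, and resources may be used
    iteration_policy: str = ""  # how to pick the next attempt
    blocked_stop: str = ""      # when to stop and report being blocked

    def missing(self) -> list[str]:
        """Labels of the parts that have not been filled in yet."""
        return [LABELS[f.name] for f in fields(self)
                if not getattr(self, f.name).strip()]

    def to_text(self) -> str:
        """One labeled line per part, in the order listed above."""
        return "\n".join(f"{LABELS[f.name]}: {getattr(self, f.name)}"
                         for f in fields(self))


# Hypothetical draft; every value here is a made-up example.
draft = GoalSpec(
    outcome="The flaky checkout test passes 50 runs in a row",
    verification="Output of the test command across 50 consecutive runs",
    constraints="No other test may start failing",
    boundaries="Only files under the checkout module and its tests",
    iteration_policy="After each failed run, read the log before changing code",
)

print(draft.missing())  # ['Blocked stop condition']
print(draft.to_text())
```

In this draft the blocked stop condition is the part that got skipped, which is probably the easiest one to forget because it describes how the run should fail rather than how it should succeed.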
But what about scope? How broad or narrow should a goal be?
Early experiments do suggest that there is sort of a Goldilocks zone, where you can be too narrow,
e.g. fix this one line, or you can be too broad, e.g. improve the whole system. The challenge
of being too narrow is that even if that's the thing that you actually want to change,
it doesn't give the system enough flexibility to discover where the real issue is, especially if
it's in some related dependency or upstream in some way. Whereas on the other end of the spectrum,
if it's too broad, it's much harder to provide the kind of concrete evidence that's going
to allow Codex or Claude Code to know if it's actually successfully accomplished the
task. Just right is obviously in between those two extremes. Relatedly, defining the output artifact
can be the difference between a successful goal run or not. A weak artifact can produce underwhelming
results in the same way that prompting too loosely can. If your slash goal artifact is
write docs for this feature, the inspectable output of the work might not actually provide the
best evidence surface as opposed to a stronger artifact goal like produce a docs page that
explains the life cycle, command surface, and two examples. Verify that the page builds locally
and all referenced commands match current CLI behavior. Now, you're probably noticing that a lot of
this terminology is still really anchored in the realm of developers. Well, how do we start to figure out
what types of other non-software engineering knowledge work might be a good fit for the slash goal
primitive? One of the ways to think about it is when the output is not just an answer, but an audit
that might be a good place for a goal. A good non-coding goal
is going to produce a ledger of what was checked, what was supported, what was contradicted,
what was weak, and what remains unknown. If that's the type of output that is valuable for
your task, it might be a good fit for slash goal, even if it's not a coding task. Now, one of the
interesting things as you branch from software engineering to knowledge work is how to think
about where the definition of success comes from. Broadly speaking, there are two paths. In some cases,
there will be an externally definable rubric. That could be existing published criteria,
official docs, a third-party data set, an existing set of logs or transcripts,
or some project-specific document like RFP questions. In many cases, however, and this is where it
starts to get really blurry, as I was thinking about different projects that were going to be a good
fit for slash goal, I noticed that sometimes I, as the user, needed to provide the rubric.
And I think that this is going to be one of the most common patterns in those types of knowledge work
use cases, where the user supplies the criteria for success. Think about, for example, hiring criteria.
It's not going to be some external source of what you should be looking for. It's going to be
you articulating in ways that are knowable by the AI and can be tested against by the AI.
What are the hiring criteria that matter to you? A similar example is vendor scorecards.
You're not looking to some external standard for what the vendor should be, at least not entirely.
You're probably looking for the AI to mirror what you specifically or your company specifically
are looking for in the vendor. Same can be true for editorial standards, lead qualification rules,
investment diligence priorities, etc. In fact, you can almost work backwards from here and notice
that when you have a knowledge work task that implicitly comes with some rubric or criteria of success,
that might be a good place to look to see if it is a good fit for the goal primitive.
Now, for the sake of this particular episode, I'm not going all the way through an entire use case,
but I did want to provide a set of examples that I think might be good fits or good areas to look
as you are thinking about how you can experiment.
So 10 areas of knowledge work that I think might be a good fit for slash goal include
literature reviews, market landscapes, vendor evaluations, due diligence, claim audits,
policy research, interview synthesis, timeline reconstruction, spreadsheet audits,
and even strategy memos, if the goal of the work is to take a whole bunch of messy inputs
and put them into a more structured format.
Double-clicking on three examples, claim audits strike me as a really clear fit,
that even if that's not a use case for you, hopefully gives you some more insight into the type of
structure you're looking for.
So imagine a prompt: slash goal, audit this memo claim by claim.
Verify each claim against the provided sources and reputable external sources, which, by the way,
you'd probably want to provide. End with a table labeling each claim as supported,
contradicted, partially supported, or unverified, with citations and uncertainty notes.
So you're seeing here that output of an audit trail. You're in that Goldilocks zone where
you're articulating well enough what you want as the output, and it works because every
conclusion the AI makes can be traced back to evidence. Now, what about a market landscape?
Isn't that just sort of a normal AI research question? Well, imagine that the goal is: create a market
landscape for X market, verified by cited company pages, filings, analyst reports, pricing pages,
and product docs, and end with a comparison table, confidence levels, and gaps where evidence was unavailable.
So what takes this out of the realm of a general research project and into the realm of a
slash goal project is that idea of moving to an audit as the process and output.
The artifact that you're trying to go for is a comparison table that shows you what can be
verified, what's inferred, and where the evidence runs out.
Similarly, a slash goal-shaped literature review is one where you're living with complexity and
diversity, highlighting rather than flattening conflicting evidence and disagreement.
Imagine a goal. Provide an evidence-backed literature review on X topic. Build a source matrix
covering methods, sample sizes, findings, limitations, and conflicts. End with confirmed themes,
disputed findings, and open questions. Basically, this pattern is going to work wherever
evidence can be inventoried and presented in complete form.
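To show what an inspectable ledger might look like for the claim audit example, here is a small Python sketch. The four labels come straight from that example goal, but everything else is an assumption for illustration: the ClaimRow structure, the check_ledger and summarize helpers, the rule that any label other than unverified needs at least one citation, and the placeholder claims and source names.

```python
from collections import Counter
from dataclasses import dataclass, field

# The four labels named in the claim audit goal.
LABELS = ("supported", "contradicted", "partially supported", "unverified")


@dataclass
class ClaimRow:
    claim: str
    label: str
    citations: list[str] = field(default_factory=list)
    note: str = ""  # uncertainty note


def check_ledger(rows: list[ClaimRow]) -> list[str]:
    """Return problems that would keep the audit from counting as finished."""
    problems = []
    for i, row in enumerate(rows, start=1):
        if row.label not in LABELS:
            problems.append(f"row {i}: unknown label {row.label!r}")
        elif row.label != "unverified" and not row.citations:
            # Assumed rule: a verdict without a citation cannot be traced to evidence.
            problems.append(f"row {i}: {row.label!r} needs at least one citation")
    return problems


def summarize(rows: list[ClaimRow]) -> Counter:
    """Count how many claims landed under each label."""
    return Counter(row.label for row in rows)


# Placeholder rows; the claims and source names are invented.
ledger = [
    ClaimRow("Claim A", "supported", ["source-1"]),
    ClaimRow("Claim B", "contradicted"),
    ClaimRow("Claim C", "unverified", note="no source addressed this"),
]

print(check_ledger(ledger))  # ["row 2: 'contradicted' needs at least one citation"]
print(summarize(ledger)["unverified"])  # 1
```

A check like this is the kind of finish line evidence the harness can judge for itself: the run is not done while any row carries a verdict it cannot cite.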
My suspicion, though, is that a lot more of the way that knowledge workers are going to use this,
at least in the short term, is in this area where there are user-provided rubrics.
Whereas a prompt can be good for a single pass, slash goal can execute an entire review process.
So something that might be well-suited for a prompt as opposed to a goal
would be review these five applications against this rubric, cite evidence and suggest interview
questions.
It's a small set of inputs, straightforward criteria with one comparative read.
Slash goal would allow that to become the architecture for an entire process that involved extracting
evidence, applying the rubric, checking consistency, revisiting borderline cases, flagging missing
information, and producing a continuously updated document as more entries come in.
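As a rough picture of what applying a user-provided rubric could involve, here is a minimal Python sketch covering three of those steps: applying the rubric, flagging missing information, and marking borderline cases for a second look. It is an illustration under stated assumptions, not a description of what either harness does. The Review structure, the review function, the criteria and weights, the 0 to 5 scale, and the threshold and margin values are all hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Review:
    name: str
    score: float
    missing: list[str]
    status: str  # "advance", "decline", "borderline", or "needs information"


def review(
    name: str,
    scores: dict[str, Optional[int]],   # criterion -> 0..5, or None if no evidence found
    rubric: dict[str, int],             # criterion -> weight, supplied by the user
    threshold: float = 3.5,             # hypothetical cut line
    margin: float = 0.25,               # scores this close to the line get revisited
) -> Review:
    missing = [c for c in rubric if scores.get(c) is None]
    if missing:
        # Flag the gap instead of guessing a score.
        return Review(name, 0.0, missing, "needs information")
    weighted = sum(rubric[c] * scores[c] for c in rubric) / sum(rubric.values())
    if abs(weighted - threshold) <= margin:
        status = "borderline"
    elif weighted > threshold:
        status = "advance"
    else:
        status = "decline"
    return Review(name, weighted, missing, status)


# Made-up rubric and applicants.
rubric = {"experience": 2, "writing": 1, "domain": 1}
applicants = {
    "Applicant A": {"experience": 4, "writing": 4, "domain": 4},
    "Applicant B": {"experience": 4, "writing": 3, "domain": 3},
    "Applicant C": {"experience": 2, "writing": None, "domain": 3},
}

for name, scores in applicants.items():
    result = review(name, scores, rubric)
    print(result.name, result.status, result.missing)
```

With these invented numbers, Applicant A advances, Applicant B lands exactly on the line and is marked borderline, and Applicant C is held until the missing writing evidence turns up. Rerunning the same function as new entries arrive is what turns a one-off comparison into a continuously updated document.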
Still, it's really important to note that as you start to dig into this, not every task will
end up making sense to be a goal.
There will be lots and lots of times, perhaps even the majority of times, when the traditional
interaction pattern is completely sufficient for what you're trying to achieve. Sometimes that will be
because the outcome objective is small enough, but other times it will be because the criteria for success
won't be as clean or definable as the slash goal primitive needs to do a good job. And this is why Jason's
tips about Codex maxing remain important even in the slash goal era, because a lot of times you're not
going to want to be as fully disconnected from the process as slash goal allows you to be. Effectively,
there is a spectrum of interaction autonomy between you and the harness with different methods,
making sense for different types of things you're trying to achieve.
Goal is a really great tool to begin to play with, and I think it is worth spending some time
experimenting, even if it's with something outside the mainstream of your work, just to get a
sense and a feel for what it can achieve for you and what it requires of you. As we get a little
bit deeper into this paradigm (remember, we're only a couple weeks after it's been fully introduced
now), I'm sure we're going to have a lot more examples of how and where it is both working
and not working in and around non-coding use cases and knowledge work, and so at some point
I'll come back and do an update based on all of that.
For now, though, that's going to do it for today's episode of the AI Daily Brief.
Hope this one is helpful.
I'm excited to see where your goals lead you.
Appreciate you listening or watching, as always, and until next time, peace.
