Latent Space: The AI Engineer Podcast - ⚡️GPT5-Codex-Max: Training Agents with Personality, Tools & Trust — Brian Fioca + Bill Chen, OpenAI
Episode Date: December 26, 2025

From the frontlines of OpenAI’s Codex and GPT-5 training teams, Brian and Bill are building the future of AI-powered coding, where agents don’t just autocomplete, they architect, refactor, and ship entire features while you sleep. We caught up with them at AI Engineer Conference right after the launch of Codex Max, OpenAI’s newest long-running coding agent designed to work for 24+ hours straight, manage its own context, and spawn sub-agents to parallelize work across your entire codebase.

We sat down with Brian and Bill to dig into what it actually takes to train a model that developers trust: why personality, communication, and planning matter as much as raw capability, how Codex is trained with strong opinions about tools (it loves rg over grep, seriously), why the abstraction layer is moving from models to full-stack agents you can plug into VS Code or Zed, how OpenAI partners co-develop tool integrations and discover unexpected model habits (like renaming tools to match Codex’s internal training), the rise of applied evals that measure real-world impact instead of academic benchmarks, why multi-turn evals are the next frontier (and Brian’s “job interview eval” idea), how coding agents are breaking out of code into personal automation, terminal workflows, and computer use, and their 2026 vision: coding agents trusted enough to handle the hardest refactors at any company, not just top-tier firms, and general enough to build integrations, organize your desktop, and unlock capabilities you’d never get access to otherwise.

We discuss:

* What Codex Max is: a long-running coding agent that can work 24+ hours, manage its own context window, and spawn sub-agents for parallel work
* Why the name “Max”: maximalist, maximization, speed and endurance; it’s simply better and faster for the same problems
* Training for personality: communication, planning, context gathering, and checking your work as behavioral characteristics, not just capabilities
* How Codex develops habits like preferring rg over grep, and why renaming tools to match its training (e.g., terminal-style naming) dramatically improves tool-call performance
* The split between Codex (opinionated, agent-focused, optimized for the Codex harness) and GPT-5 (general, more durable across different tools and modalities)
* Why the abstraction layer is moving up: from prompting models to plugging in full agents (Codex, GitHub Copilot, Zed) that package the entire stack
* The rise of sub-agents and agents-using-agents: Codex Max spawning its own instances, handing off context, and parallelizing work across a codebase
* How OpenAI works with coding partners on the bleeding edge to co-develop tool integrations and discover what the model is actually good at
* The shift to applied evals: capturing real-world use cases instead of academic benchmarks, and why ~50% of OpenAI employees now use Codex daily
* Why multi-turn evals are the next frontier: LLM-as-a-judge for entire trajectories, Brian’s “job interview eval” concept, and the need for a batch multi-turn eval API
* How coding agents are breaking out of code: personal automation, organizing desktops, terminal workflows, and “Devin for non-coding” use cases
* Why Slack is the ultimate UI for work, and how coding agents can become your personal automation layer for email, files, and everything in between
* The 2026 vision: more computer use, more trust, and coding agents capable enough that any company can access top-tier developer capabilities, not just elite firms

Brian & Bill (OpenAI Codex Team)

* http://x.com/bfioca
* https://x.com/realchillben
* OpenAI Codex: https://openai.com/index/openai-codex/

Where to find Latent Space

* X: https://x.com/latentspacepod

Timestamps

00:00:00 Introduction: Latent Space Listeners at AI Engineer Code
00:01:27 Codex Max Launch: Training for Long-Running Coding Agents
00:03:01 Model Personality and Trust: Communication, Planning, and Self-Checking
00:05:20 Codex vs GPT-5: Opinionated Agents vs General Models
00:07:47 Tool Use and Model Habits: The Ripgrep Discovery
00:09:16 Personality Design: Verbosity vs Efficiency in Coding Agents
00:11:56 The Agent Abstraction Layer: Building on Top of Codex
00:14:08 Sub-Agents and Multi-Agent Patterns: The Future of Composition
00:16:11 Trust and Adoption: OpenAI Developers Using Codex Daily
00:17:21 Applied Evals: Real-World Testing vs Academic Benchmarks
00:19:15 Multi-Turn Evals and the Job Interview Pattern
00:21:35 Feature Request: Batch Multi-Turn Eval API
00:22:28 Beyond Code: Personal Automation and Computer Use
00:24:51 Vision-Native Agents and the UI Integration Challenge
00:25:02 2026 Predictions: Trust, Computer Use, and Democratized Excellence

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Okay, we're here at AIE Code.
And we have two of our speakers.
Bill and Brian, welcome.
Hi.
To Latent Space.
Thank you for having us.
Bill, Brian, I know you've been listeners for a little bit.
Oh, yeah.
What's your take on Latent Space?
Like, what role does it perform in your function at OpenAI?
Yeah.
I mean, first of all, love the name.
Okay.
I'm a massive latent space, context management [inaudible].
What's the story behind the name, if I get the chance to ask?
Yeah.
So at the start, we never had Latent Space as a name. It was called L-Space.
Interesting.
And one of my readers donated the domain name, latent.space. He was like, do you want it? And I was like, yeah.
Awesome. So "Latent" just came about accidentally.
See, it was in the ether, but I didn't have the domain.
Yeah, so I just called it L-Space. L-Space is like a [inaudible].
Nice. Yeah, no, it's, it's amazing. I love it because it's, you're like always on the
cutting edge and it goes into a lot of detail about all the things that I should be keeping up with.
It's part of my job, and there's so much to keep up with, right?
So there's only so many sources of really good high quality information for what's like happening on a deep level.
Well, you guys have your own podcast now.
So I'm like, you know, competition.
Yeah, well, I still listen to yours and I still think yours is really good.
So you guys, I guess, are representing like a startups team, Codex, all the things.
You just launched Codex.
Yeah.
Codex Max.
Yep.
[inaudible]
Yeah.
Yesterday.
Yep, we're going to name names.
Yeah.
I do.
People do make fun.
I think Tibo was like, yeah, you know, we're good at a lot of things, but not naming.
I was like, well, why call it Max?
Was there any like internal discussion?
Yeah, I mean, it's complicated because it needs to be differentiated from the previous one.
And the idea is like Max can run for a really long time.
We can go 24 hours or more.
I've actually like sort of had it go for more than that.
And the name is, you know...
Is that inside Codex on the web? When you say a really long time, 24 hours?
Oh, I think that was on the web, inside of Codex. I'm not sure.
But I've actually done it on my local computer for quite a bit longer than 24 hours, over the course of a couple of days, by closing my laptop and then reopening it.
But the name, you know, you could come up with something like Pro, but Pro is sort of like slower, more thoughtful. Max is about sort of like speed and maximization, like maximalist.
So this model, it can run for a long time.
But for the same types of problems, it can actually get to the right answer faster.
So it's simply better and faster.
Yeah.
So I think part of what you guys are speaking about
is the training that goes into something like that. Yeah, that's right.
Basically, people just kind of wave their hands and say RL.
But like, what specifically have you learned about what makes a good [inaudible]?
So I got to, I mean, this sounds weird to say,
but I was lucky enough to be really close to the training team while GPT-5 was training.
And one of the big things that we focused on, Bill was there too,
we focused on personality, right?
So it's really important to build trust with developers for like how a model works.
And if a model doesn't act the way that you expect it to, or if it doesn't work alongside you well,
you're not going to really trust it.
You're not going to get as much out of it.
So for coding, we thought, okay, well, what is the best personality for a coder, for a pair
programmer, for somebody you trust? And how do we, like, eval against that? How do we come up with
behavioral characteristics? And we came up with things like communication. It needs to keep you
apprised of what's going on while it's working. Planning. Like, come up with a strategy,
do some searching around, like, gather context, figure out what to do before you
just dive in, if it makes sense to. And then, you know,
check your work, right? And so these are just best software engineering practices that turn out
to be behavioral characteristics, and we can measure the model's performance on those behaviors
and grade it that way.
Yeah, I will say that another key aspect of how we train the model is
we work really, really closely with some of our top coding partners. And a lot of those folks
live on the bleeding edge, and so they have a lot of understanding of [inaudible]
particularities, and we really focused on sort of those areas and really
dove deeply into those.
Yeah, that's right, especially tools, right? So like different
harnesses have different tools. Some people have context, like semantic search. Some people have
different ways of doing code edits. And initially, you know, our models are trained the way they
were trained to use tools. And that kind of bakes in a habit. And so we've been getting the models
better at using different types of tools.
Yeah, there's a lot to follow up on, but I'll go to
tools first and then I'll go back to the personality
piece. But tools-wise,
I think the communication when
5-Codex just came out was,
well, this is the model trained for
our Codex, not necessarily yours,
right? Has that message
changed for other startups
using the 5-Codex model?
Right, no, so Codex is, just to be clear,
Codex is the frontier coding model
that we have that is optimized
for its harness. The Codex
team is very focused on creating a coding agent, and they want it to work perfectly
inside of the shape of the harness and API that we have. So they're completely unbounded.
It's open source. Yes, that's open source. And the model is available in the API. So that's what
they focus on. Yeah. And then the conflict is, well, you just said other startups have other tools
and obviously, I know that. It is possible. Like, one thing to mention here is, I think we can
probably disentangle a little bit sort of the Codex models
apart from sort of the mainline models.
The Codex models are sort of focused on the agent itself, right?
Like the Codex agent itself, the model has been trained with an agent specifically in mind.
It actually turned out to be sometimes even easier to integrate
because we come into it with a firm opinion on what the best way of using it looks like.
And so some folks that we work with actually really appreciate that we come into it with that opinion.
While for the other ones that have more general needs,
or specific tools that they definitely need,
the mainline model is the one that's more general in that sense.
And that's sort of what Brian was referring to when he talked about GPT-5's tools.
Yeah, so the 5.1 non-Codex is more general across the board.
It can respond to things that are, it's much broader than just coding.
It has coding capabilities that are also mirrored in Codex, and they work together to keep that
trued up.
But since it's more general, it does have more steerability.
to different types of tools.
And when you're implementing tools,
the model can get bogged down
if it hasn't seen a tool that it's used to,
and it might take more time thinking about how to use it
or make more mistakes.
So our recommendation is if you're wanting to go
bleeding edge coding focused,
pay attention to the Codex line
and the Codex SDK and the Codex models
because that's the one that's really aimed at that.
You'll have to do some work to look at how
we're implementing our tools inside of Codex to maximize its capability without bogging it down.
But, like, people are having success, like, bending it in ways that maybe we haven't thought of.
If any come to mind, I always want to pry...
Sure. Yeah.
Do you have any examples?
You said bending it in ways you haven't thought of.
Yeah, so I think, um, so Codex is trained, uh, with terminal tools, uh, in mind.
And so what we thought would be the case is you'd essentially have to
strip out all of the tools except for the terminal tools.
But some partners of ours made the discovery that what you can do is
you can actually still have a lot of the tools, just named in the same way as the terminal tools,
as well as having the same input and output.
And all of a sudden the tool-call performance jumped up by a lot.
Yeah.
And Codex loves ripgrep.
So if you make a ripgrep tool and tell it to use
it, it'll use it. So if you call it
grep, it actually does
a little bit worse, but if you call it
rg, it actually does really well.
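As a rough illustration of the renaming point above (not Codex's actual harness, whose tool definitions are not shown in this conversation), a harness might expose its search tool under the terminal-style name `rg` and return ripgrep-like `path:line:text` output. The function below is a hypothetical pure-Python stand-in so the sketch runs without ripgrep installed, and the `TOOLS` schema shape is likewise an assumption.

```python
import re
from pathlib import Path


def rg(pattern: str, path: str = ".") -> str:
    """Hypothetical search tool exposed to the model under the name `rg`.

    A pure-Python stand-in, not real ripgrep: it walks `path` and returns
    matches as `file:line:text`, similar to `rg --line-number` output.
    """
    regex = re.compile(pattern)
    hits = []
    for file in sorted(Path(path).rglob("*")):
        if not file.is_file():
            continue
        try:
            text = file.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for number, line in enumerate(text.splitlines(), start=1):
            if regex.search(line):
                hits.append(f"{file}:{number}:{line}")
    return "\n".join(hits)


# Illustrative tool description (assumed schema shape). The point from the
# conversation is only the name: "rg" rather than "grep" or "search_code".
TOOLS = [
    {
        "name": "rg",
        "description": "Search files under a path for a regex pattern.",
        "parameters": {"pattern": "string", "path": "string"},
        "handler": rg,
    }
]
```

The design choice being illustrated is that the tool's name and its input/output shape match what the model saw in training, while the implementation behind it can be whatever the harness already has.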
Right. Yeah, yes. This is
something that we
ourselves only discovered. This is one of the coolest
things about, like, model training:
literally, like, they develop
habits. Just like a person does. Like, if
you're, like, working in some podcasting
tool, right, you're really good at editing. And then
somebody makes you use a different one, it's going to slow
you down, you're going to get kind of bogged down and make mistakes.
Sure, but I would...
I don't know, like, yes, that's very human,
but I don't know if I'd call it cool,
because they're supposed to generalize.
Well, right, that's the end goal, yes, of course.
And so that's what we're doing with the five series of models,
that they're way more general.
And Codex is focused on maximizing coding,
and those are the sort of two horizons that we're working on.
Yeah, awesome.
I want to go back to personality.
I know you hate that word sometimes.
I do.
It means different things to different people.
Yeah.
And when it comes to people who are very, very keen on model research, model personality is much more like,
I think what [inaudible] would say is like, it's a warm friend, [inaudible] understanding people's emotional state, whatever.
And so it's really jarring when that is also applied to coding agents, where like, well, [inaudible].
Awesome. I think the other thing is also, does it matter? Because you said a lot of things about, like, communicating with the user and all that. Does it matter if it's so autonomous anyway, right? Like, you're going for 24 hours. You're closing your laptop. You have like the extra-high parameter now. Does it matter?
Exactly. So here's, we're in this world right now where we're in between, a situation where the models don't quite have the trust of senior engineers or engineers
doing very important work.
And so we found, our customers have found,
that people really want to follow along with what it's doing
so they can interject or stop it,
or at least understand what it's thinking
so they don't waste all kinds of time
doing a rollout that they have to throw away.
So with the five series, because it's more general,
and it's just about as good at coding as Codex for a lot of things,
we've taught it to be more communicative.
And so it has preambles before tool calls.
It'll say things like,
I'm about to go look for this.
Yeah, and you can steer that really well.
I actually really like it.
I have, I've created like a personality.
I tweeted about this.
I created a personality for my coding agent because I really like my tools to be kind of like fun to work with if I'm in there with them.
And so I have it, it's got this like, it gets really excited if we do something together.
And like, because I want to wake up in the morning and be like, oh, I'm going to go work on this project with my buddy, 5.1, right?
But some people don't like that.
And also, for like you said, long-running agentic tasks, that can get in the way.
You're burning tokens that don't really matter if it's running in the cloud.
So with 5.1, you can turn that off.
You can prompt it not to do that.
But the Codex model can't actually do that.
And it relies on the reasoning summarizer to give you that update.
I guess more broadly, what should people know or think about in terms of what's going on with coding models in general?
More broadly than just this one release, just like,
what trends are you seeing, what discussions are active?
Our talk today is focused on talking a little bit about sort of the trend that we're
seeing:
the abstraction layer is really starting to move upwards from the model layer
to the agent layer.
As I said, we train our models to be a little bit more opinionated, especially
with regard to a coding model like Codex.
And the models are really good at doing certain things when inside of a certain
harness, [inaudible].
And so we're actually
packaging that up more closely,
so we're actually shipping this
entire agent altogether,
and you can actually build on top of that agent.
That's one of the patterns that we're seeing here:
rather than focusing on optimizing
with every single model release,
you'll actually just be able to plug
an agent like Codex into your
platform and be able to use it out of the box.
Yeah, and you're seeing Zed use this.
GitHub,
VS Code, lets you just like package
the whole
agent to work inside of it.
That way, like if you're building a coding tool, like Zed, and you don't feel like having
a whole team keep up with every single model release and every single API change and how
to update the harness to do different kinds of sandboxing and all that kind of stuff, you can
just build one layer above.
And that is actually super powerful because coding is just like one agentic behavior.
It turns out it's a really nice one to start with because you can measure the performance
sometimes more easily than with a lot of other ones,
but it also gives the model the capability, right?
So we started out with like chatbots.
Like you're having a conversation.
Let's give the chatbot a tool to use.
Okay, so now you have an agent that can like run commands.
Well, let's give the chatbot agent a codex to use.
So now if it doesn't have a tool, it can make a tool that it needs to solve a problem.
Right?
So that's like another layer of abstraction and it's not just coding.
You can write software that has an agent that can
spin up a Codex instance and write a custom plug-in for your software for that customer's API, right?
And so now your software is self-customizable because it has its own team of people inside that can do integrations at launch.
Yeah, solving integration engineering is [inaudible].
Yeah, one thing I'm finding at this conference so far, even early, like the first ten or so talks,
I think people are starting to really explore sub-agents, agents that are more abstract, agents that use agents,
and we used to call it multi-agent.
I don't know why we don't now.
I don't know if there's any thoughts on your end about this,
where you can tool-call.
I guess a very basic example is what you just said,
which is that the agent can create another instance of Codex
that creates a tool and then [inaudible].
Just use the tool.
Is there a case for scaling like sub-agents?
[inaudible]
Yeah, I think so.
I mean, Codex Max was designed for that.
So it has its own compaction and context management.
Codex Max manages its own context window.
And so it can run basically forever
without you having to worry about it
while it's inside of the Codex harness.
And that lets you do a lot of different things.
You can essentially have it hand off
its own context to other sub-agents, right?
So letting it sort of like spawn different agents
to do more of its work in parallel
and all kinds of things like that.
So it's built for that.
We're just sort of like starting to see the indications of like what that means.
But that's I think the future and we're really excited about that.
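To make the hand-off-and-parallelize pattern concrete, here is a minimal sketch under stated assumptions: `run_subagent` is a hypothetical stub standing in for whatever actually launches a sub-agent, and the conversation does not describe how Codex Max does this internally. Only the fan-out control flow is illustrated.

```python
from concurrent.futures import ThreadPoolExecutor


def run_subagent(task: str, context: str) -> str:
    """Hypothetical stand-in for launching one sub-agent on one task.

    A real harness would start an agent session here; this stub only
    echoes what it was given so the control flow can be run and tested.
    """
    return f"[done] {task} (context: {len(context)} chars)"


def fan_out(tasks: list[str], context: str, max_workers: int = 4) -> list[str]:
    """Hand the same compacted context to several sub-agents in parallel."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in the order of `tasks`, not completion order.
        return list(pool.map(lambda task: run_subagent(task, context), tasks))


if __name__ == "__main__":
    summary = "Refactor plan: migrate the billing module to the new client."
    for line in fan_out(["update callers", "fix tests", "update docs"], summary):
        print(line)
```

Passing a compact summary rather than the full history is the assumption doing the work here: each sub-agent starts with a small context of its own.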
Yeah.
It's really, I think like as I said, the trend that we're sort of observing here,
really moving up the abstraction layer to the agent layer, really allows you to do a lot of cool things like [inaudible],
spinning up a few agents, creating new abstractions as the long-running agent workflow
continues. And right now, we're building
all the primitives
as well as the models specifically
with that in mind. Yeah, and it's really
about moving the threshold up
further, right? Like I was saying before, like
I now trust, like, Codex to do
some of my hardest work. I haven't written a single line of code by
hand in months, because I know what I
can trust it to do. You're the
fourth person that said that in the last [inaudible].
Yeah, no, it's real. I mean,
I've actually launched something. There's an open source
project that I did. There was a Codex upgrade pack,
for migrating from completions to responses
that was totally written by
Codex. And I didn't write a single
line of that code. And now it's out
there. It's open source.
I'd say most of the folks at OpenAI.
Well, initially when Codex first launched, it was
around 50% of folks at OpenAI
that started using it, but now it has gone up among
folks at OpenAI. That's very true. We use it
every day. The way that we do it
is we're really good at evals.
Right? Like, in order to develop trust
and build a product that can do
more than you design it for, which is
what we're talking about here. You're making an agent that can
solve its own problems.
You have to get really good at figuring out
how to build those guardrails and evals around
what is it doing, what is it allowed
to do, and check it in production. So we
have all of this platform tooling now
around agent traces and rollout traces
and coming up with evals for that
and building, you know, graders and
all the things you need to sort of maximize
the pipeline. So you can let it go
and then be like, okay, I don't really like
the way it did that. Grade it. Have it meta-prompt
itself so that next time
it actually does a better job with best practices.
One of the biggest things,
in terms of
the organizational capabilities
that OpenAI has messaged, is
applied evals. Can you say more about that?
Like, why is that suddenly a big priority now?
Obviously, I think there was
a lot of this that OpenAI always did
internally, but now it's like
a team that is more
outward-facing [inaudible].
The path to AGI
really goes through evals,
and, well, I'm sorry.
That was a little...
It's so true.
Repeated way too many times.
But I think there are a lot of academic evals, right?
There's like SWE-bench, there's others, like, you name it.
But I think there's a slight lack of evals of the real world,
on sort of what people care about the most.
And we want to make sure that whatever we're developing, model-wise
as well as product-wise, is aligned
and is actually making the most
useful impact on this world. And applied evals is really in that direction, capturing all of those
sorts of real-world use cases and things for us to hill climb together. I like to think of it as
like we have, I mean, people say it's a PhD and an API, right? But if you, you know, you hire a PhD
student, they don't know how to do the job. You have to give them a job description. Okay, that's a
prompt, right? So now you have your policy and then you have them do the job and they're going to kind of like
flail around, right? So they need
mentorship, they need guardrails, they need evals, performance reviews on how to do their job,
the best practices. And so what we're doing is we're trying to put our models out there and see
what they're good at, what they're not good at, talking to our customers. They're like,
oh, we could really use your model for more things if it could do this one thing. Here's our eval for
it. Or help us build those evals with you so that we can see where we're deficient and go back
and train the model to be able to do that job in a way that we wouldn't normally get to see it
perform.
Yeah. How do you
do multi-turn evals?
So I think that's the really hard thing. I mean, sometimes you need multi-turn
if it doesn't get it right on the first go, but if it could just get it right on the first go,
then it's no longer multi-turn, right? So then what?
Do you want to take it? I have some ideas.
Oh, yeah, you go. I mean, I've built a few myself.
I don't, this is, this is sort of like my personal work.
I think this is like an area that people are just now getting into, right?
We have LLM as a judge.
You can use LLM as a judge to look at an entire
trajectory and see, okay, over the course of all of this,
like, how well did it perform, what did it do?
And then you could maybe like walk it back a step
to the part that you don't like,
and then you could have the model run the next step
with the instructions, grade it on that,
and then have it improve itself.
Oh, I don't like the way that you...
We do this all the time inside of harnesses.
It's like, that was a good answer,
but I don't really like how long it took you to get there.
So can you give yourself better
instructions for doing that next time? It'll write something and we'll add it in there,
and then suddenly it's better, right? So that's one way of doing it.
Yeah, I think with multi-turn evals,
most of the companies or startups that we work with, like these days, the agent runs in a
multi-turn way, right? And so, therefore, if you can build an agentic harness that works in a
multi-turn way, you can eval it. And then there are like also academic benchmarks
that already do this in some ways, like Tau-Bench,
and now we have like Tau²-Bench
that does this like particularly well,
and we'll definitely take inspiration from that.
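The trajectory-grading idea described above (judge the whole rollout, then walk back to the step you don't like) can be sketched as follows. This is a minimal illustration, not OpenAI's tooling: the judge is passed in as a plain callable, where a real setup would presumably wrap a model call, and `toy_judge` with its per-tool-call penalty is an assumption made up for the demo.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Step:
    role: str      # e.g. "user", "assistant", "tool"
    content: str


# A judge takes rendered transcript text and returns a score in [0, 1].
Judge = Callable[[str], float]


def render(steps: list[Step]) -> str:
    """Flatten a trajectory into the text handed to the judge."""
    return "\n".join(f"{step.role}: {step.content}" for step in steps)


def grade_trajectory(steps: list[Step], judge: Judge) -> float:
    """Score the whole trajectory at once."""
    return judge(render(steps))


def first_weak_step(steps: list[Step], judge: Judge, threshold: float) -> Optional[int]:
    """Walk the trajectory forward and return the index of the first step
    at which the judged prefix drops below `threshold`, or None."""
    for end in range(1, len(steps) + 1):
        if judge(render(steps[:end])) < threshold:
            return end - 1
    return None


def toy_judge(transcript: str) -> float:
    """Toy rubric (an assumption for the demo): penalize each tool call."""
    tool_calls = sum(line.startswith("tool:") for line in transcript.splitlines())
    return max(0.0, 1.0 - 0.25 * tool_calls)
```

The index returned by `first_weak_step` is the point to rewind to before re-running the next step with revised instructions.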
I have this idea. I call it like a
job interview eval.
I haven't finished it.
But really, like, if you're evaluating a coding agent,
what do you want it to be able to do? You want it to be able to take an
underspecified, imagine you were interviewing a developer.
You give them a problem. Hey, like, go implement a string reverse
or whatever. And then it's like up to them to like ask for
okay, well, I need more information.
What are the constraints here?
Like, what is, and then you judge them on that.
And then they start implementing it.
You give them some modifications.
You grade them on that.
You can imagine, like, building, you know, with an LLM, like a rollout that is
comfortable and the model responds and that you can kind of grade the whole thing.
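A minimal sketch of what such a "job interview eval" loop might look like, assuming a scripted interviewer and a candidate that is just a callable over the transcript so far. The speaker says the idea is unfinished, so everything here (the names, the two rubric checks, the scripted `toy_candidate`) is hypothetical illustration rather than his design.

```python
from typing import Callable

Transcript = list[tuple[str, str]]          # (speaker, text) pairs
Candidate = Callable[[Transcript], str]     # the agent under evaluation


def run_interview(candidate: Candidate, interviewer_turns: list[str]) -> Transcript:
    """Alternate scripted interviewer turns with candidate replies."""
    transcript: Transcript = []
    for prompt in interviewer_turns:
        transcript.append(("interviewer", prompt))
        transcript.append(("candidate", candidate(transcript)))
    return transcript


def grade_interview(transcript: Transcript) -> dict[str, bool]:
    """Toy rubric (assumed): did the candidate ask before implementing,
    and did it produce code in a later turn?"""
    replies = [text for speaker, text in transcript if speaker == "candidate"]
    return {
        "asked_clarifying_question": bool(replies) and "?" in replies[0],
        "delivered_code": any("def " in reply for reply in replies[1:]),
    }


def toy_candidate(transcript: Transcript) -> str:
    """Scripted stand-in for a model: asks first, then implements."""
    candidate_turns = sum(1 for speaker, _ in transcript if speaker == "candidate")
    if candidate_turns == 0:
        return "Should this handle Unicode, and may I use slicing?"
    return "def reverse(s):\n    return s[::-1]"
```

A fuller version would replace the string checks in `grade_interview` with a judge model and let the interviewer adapt its follow-ups to the candidate's answers.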
Yeah.
One thing I would love, and this is like the feature request part of the podcast, is
a batch multi-turn eval API.
You know, so the Batch API is single turn, but you can't really batch multi-turn requests.
Is that already doable?
Batch multi-turn requests.
I don't believe so.
You can't do it yet, but yeah, I think that's like a really valid one.
Because you need evals to be as cheap as possible.
Yes.
They're not that time-sensitive.
And you want to run it overnight when things are cheap.
Yes.
Well, feedback taken.
Feedback taken, man.
But that's the thing.
Every day we're trying to make the platform better.
And right now, evals is certainly part of it.
we make product feature updates as we talk to people like you.
Yeah.
They're like, hey, can you do this?
I mean, it's super like, yeah, if I'm going to throw thousands of runs at this thing,
you know, I should probably spend some time worrying about costs.
Speaking of which, what are you trying to eval, though?
I mean, Devin and Cascade.
You know, and, uh, I, so I have a personal side project where I want to make Devin for non-coding.
Oh.
Because I really love Devin so much.
Like, in Slack.
My kind of semi-hot take that I'm floating around,
just to see how it feels,
is I think Slack is the ultimate user interface
for work, right?
I don't want to read email.
I just read Slack all day.
I interact with my email agent through Slack.
So basically I'm building a Devin for email.
Yeah.
Well, that's the thing, like you can use,
you can use Devin to do that, right?
Like a coding agent, like Codex, a CLI.
It used to be, back in the old days,
like I started out in the 90s working at IBM
as a system administrator,
and I had to write my own custom software and bash scripts and whatever to actually solve real problems every day.
And so I had this like, you know, toolkit of scripts that I made, right, that were like organizing file directories or doing like other random things that weren't necessarily writing code.
Yeah, yeah.
And so you can use it for non-coding use cases, to just like sort through your email using like Elm or something, right, in the terminal, or like have it generate like snippets of video clips from YouTube that you can
watch later or things like that.
You know, I never thought about that, but I do that
all the time as part of Latent Space.
Yeah, I should probably invest
in that tooling. I had, I had Codex go
through my really messy directory of
all of these experiments that I was running and
completely organize them and, like,
put them in shape, and it was so wonderful.
I used it for something that's more boring:
organizing my desktop.
Yeah, you know, we have a lot of files on the
desktop, and Codex is really good.
People think...
Like IMG-0-1416.
JPG or that thing.
Yeah, well, just find all the images and put them in one folder.
I think that even that, that's something Codex can do.
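For a sense of scale, the "find all the images and put them in one folder" task is the kind of small script a coding agent might write and run. The sketch below is an assumed example, not output from Codex: the suffix list, the `Images` folder name, and the skip-on-collision policy are all illustrative choices.

```python
import shutil
from pathlib import Path

# Assumed set of image extensions; extend as needed.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".gif", ".webp", ".heic"}


def collect_images(directory: Path, folder_name: str = "Images") -> list[Path]:
    """Move image files sitting directly in `directory` into one subfolder.

    Returns the new paths. Files whose name already exists in the target
    folder are left where they are rather than overwritten.
    """
    target = directory / folder_name
    moved = []
    # sorted() materializes the listing before anything is moved.
    for item in sorted(directory.iterdir()):
        if not item.is_file() or item.suffix.lower() not in IMAGE_SUFFIXES:
            continue
        target.mkdir(exist_ok=True)
        destination = target / item.name
        if destination.exists():
            continue  # never overwrite an existing file
        shutil.move(str(item), str(destination))
        moved.append(destination)
    return moved


if __name__ == "__main__":
    for path in collect_images(Path.home() / "Desktop"):
        print(f"moved {path.name}")
```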
I think that's one of the big themes that we're also seeing, like, coding tools breaking
out of coding and into just like everything.
They're personal automation.
Exactly.
Because if you think about it, before graphical user interfaces and browsers, like,
how did we interact with a computer?
We did so through a terminal.
And we did so by writing commands and writing code and stringing them together inside of
the terminal.
So when you think about it, those coding agents are actually computer use agents,
but for the terminal.
Yes, yeah.
They're actually incredibly general.
I would say that coding agents today are still not vision-native enough.
Like, you have to try to get it to use vision.
And oftentimes it still fails.
We should use vision a lot more.
Yeah, I would say, you know, I was going to end the episode with asking for your 2026 predictions.
Like we sit down this time next year, what do you want to see?
You know, what do you hope to see?
I'll just kick it off with the easy one.
Yeah.
More computer use.
And I think like where you say things like, oh, we'll have a coding agent build its own
integration to your application.
A lot of applications don't have APIs, don't have MCPs.
The only thing you have is a UI, right?
Yeah.
Because they're legacy or because they don't want you to take the data.
But, well, the data is yours, you just have to, like, in a [inaudible] way, take it through the UI.
Yeah.
Yeah.
And I can continue just by sort of like saying that that's definitely going to be something, I think, that we'll be capable of in '26.
But also the other thing that I am sort of really looking forward to is Codex being able to do more, right?
We're already starting to talk about how Codex, or like our coding agents, can sort of use computers in novel ways.
We're going to be able to sort of see more and more general use
like that coming along as well, and more sensible ways for you to build with those sub-agents
as well.
I really want to see the trust level go up even further, right? Like, at OpenAI
I get to work with some of the most amazing developers I've ever worked with in my life. They're
incredible, like some crazy tech leads. I wish every company, no matter whether they're like a
small dev shop in Alaska, where I worked for a while, or OpenAI, could have on their team
like capabilities that you would only be able to get at like a top-tier firm, right?
So like, so all of my teammates at all of these places could turn to a coding model and
be like, hey, how do we do this like crazy awful refactor that we have to do
to get us to support this new customer that we have?
Or like, wow, there's so much of a mess here.
Or like, what's the best way to actually implement this new technology,
and have it be so trusted and so right and so smart that like, you know,
we can actually perform better than we could normally get access to.
Yeah, see?
I think that's it. Any final calls to action or something?
Oh, yeah.
We're Brian and Bill at OpenAI, and yeah, feel free to find us on Twitter, socials, whatever,
and then let us know how you're building.
Yeah, and we love working with startups, and anytime you have feedback like,
you really wish the model could do this or the product could do this,
and you could unlock some massive capability, just let us know.
Yeah, amazing. Will do.
That's it. Thank you, guys.
Nice.
Thank you.
