The Changelog: Software Development, Open Source - Let's build something phoenix.new (Friends)
Episode Date: June 27, 2025. Our old friend Chris McCord, creator of Elixir's Phoenix framework, tells us all about his new remote AI runtime for building Phoenix apps. Along the way, we vibe code one of my silly app ideas, calculate all the money we're going to spend on these tools, and get existential about what it all means.
Transcript
Welcome to changelog and friends, a weekly talk show about existential vibes.
Thank you to our partners at Fly.io who are highly featured in this episode not because
they sponsor us but because they do cool stuff and we like cool stuff.
Check them out at Fly.io.
Okay, let's talk.
Well, friends, Retool Agents is here. Yes, Retool has launched Retool Agents.
We all know LLMs, they're smart.
They can chat, they can reason, they can help us code, they can even write the code for
us.
But here's the thing, LLMs, they can talk, but so far they can't act.
To actually execute real work in your business, they need tools and that's exactly what Retool agents delivers.
Instead of building just one more chat bot out there, Retool rethought this.
They give LLMs powerful, specific and customized tools to automate the repetitive tasks that
we're all doing.
Imagine this, you have to go into Stripe, you have to hunt down a chargeback.
You gather the evidence from your Postgres database, you package it all up and you give
it to your accountant.
Now imagine an agent doing the same work, the same task in real time and finding 50
chargebacks in those same 5 minutes.
This is not science fiction.
This is real.
This is now.
That's Retool Agents working with pre-built integrations
in your systems and workflows.
Whether you need to build an agent
to handle daily project management
by listening to standups and updating JIRA,
or one that researches sales prospects
and generates personalized pitch decks,
or even an executive assistant that coordinates calendars
across time zones. Retool Agents does all this. Here's what blows my mind. Retool customers have already automated
over 100 million hours using AI. That's like having a 5,000 person company working for an
entire decade. And they're just getting started. Retool Agents are available now. If you're ready
to move beyond chat bots and start automating real work, check out Retool Agents today.
Learn more at Retool.com slash agents.
Again, Retool.com slash agents.
Today we're joined by our old friend, Chris McCord.
Welcome back, Chris.
Hello, thanks for having me back.
This is your third, fourth, fifth, or sixth time on the pod.
I don't know, I didn't look it up this time,
but you've been around as the,
probably talking Phoenix pretty much at all times,
as my guess.
I think so.
I think so, yeah.
Elixir maybe, but probably Phoenix.
As you know, we're pretty big fans of Phoenix.
We've been running it for a decade now.
So thank you still, and again,
for creating a cool web framework.
Yeah, you're welcome.
I play with it.
Which I use like none of your cool new features,
like I'm basically using the stock crud abilities
from like 2016. Hey, that's cool too though.
We'll take it, right?
And it just works.
It does just work, and I continue to enjoy it.
I even avoided contexts,
even though I was kinda keeping up with the Joneses.
I am on a recent version,
but I just ignore the warnings or whatever.
That's fine too, we could, yeah,
there could be a whole episode on that,
just one giant rant, but yeah,
it's modules and functions, you know,
it's all we're asking.
That's right.
That's right, we're asking.
If you want to, it's a suggestion.
Maybe create well-defined interfaces, right,
but that's it, so, yeah, do what you want.
Well, I mean, who writes code nowadays anyways, right?
That's right, it doesn't matter anymore, right?
Because one-
It doesn't matter, that's where I'm getting to in my life
and that's where we're getting to.
With coding agents taking over the world,
it's like as long as they know what the new features are
and I can test drive it in the browser.
They write pretty good Phoenix contexts too,
so they'll just do it for you.
And you have a brand new related thing, phoenix.new.
That's spelled out, spell it out, P-H-O-E,
Adam, can you spell it out?
Oh my gosh, yes, I don't mind.
Scared me, because I don't know how to spell this word, okay?
P-H-O-E-N-I-X. Ding, ding, ding, ding.
I win.
Nailed it, nailed it.
Whew, man.
.new, which is the cool new TLD, is the .new.
It's the cool new.
I love .news.
You know what I mean?
It's like, it's the place to go to start something, you know?
You gotta go there to do it.
It was available, so it works out well.
All the cool kids are doing it.
It took us a long time to get.news.
We could have got .new and put a slash s. I just realized that would be cool.
Just go to changelog.new/s. I don't know. I have a hard time saying that out loud.
The dev app URLs are also phx.run.
So yeah, that's cool too, .run. I didn't know that was a thing, but I was like, this is perfect.
I like the new TLDs. I don't like that they cost a premium. Yeah, it's ridiculous.
It's like, how about 9.99?
Like it used to be in the good old days, you know?
Yep, oh, I think I paid like,
it was like seven something, $7.
Oh gosh.
2003 was my first domain.
I think to expect less than 50 bucks a year
for a domain these days is just like not a possibility.
No.
Just not. What's the dot new going rate, Chris?
I think it's several hundred dollars, 700 bucks, 800 bucks.
I don't know.
Wow.
It's a lot.
First time or annual?
It's annual and there's something like,
I think within 90 days, you have to actually have like
some kind of like real property on it or something.
Or they.
Oh wow.
There's some rules there that yeah, you can't squat them.
Not that old, it can't be old.
Yeah.
I don't know how they enforce it,
but you can't squat those,
but I mean, they're kind of price prohibitive
for squatting anyway.
Those prices are like acting like zero interest rates
are still a thing, you know?
It's like, come on.
We don't have that kind of money anymore.
Get it together, man.
You know?
But I should speak for myself
because apparently fly.io sprung for this.
Phoenix.new, they can afford it.
And dot run, which is super cool.
Tell us about your new project.
We started back in December.
Of course, this is kind of what everyone's doing right now
is like, how can I make LLMs and agentic coding work
in my slice of the world?
And your slice of the world is Elixir and Phoenix.
That's where you started, right?
Yep, that's right.
Yeah, so we can talk about what it is now
and what I think we accidentally made,
which is this journey that I've been on
since we started this.
So right now, Phoenix.new is essentially a
vibe coding Elixir and Phoenix platform.
But I think what differs a little bit is like,
we give you like a full machine with root access.
So we kind of just like let the agent have full range
to go full ham on whatever it wants,
install apt packages and build a full stack application.
So a lot of these, like, vibe coding platforms
will gladly write JavaScript apps
and run them in the browser.
But like if you want a real app,
it needs to talk to the database,
needs to talk to file systems.
We wanted to start by building a full stack app generator.
So that's kind of what we've arrived at.
So it's great at building Phoenix in real time
live view application.
So out of the box, you'll get what
you would expect from a Vibe coding platform, fully designed.
But then everything that should be real time will be real time, kind of like how we build things in Phoenix and Live View.
So the agent is kind of, like, told, like, make everything real time. And then it typically
makes everything real time. So that's like the current out of the gate experience. And
what we found is like, it actually takes very little to get this agent because it has shell
and it has these like sharp tools to like get it to do anything. So the first thing my coworkers
did was they immediately had it create a Rails app and it's optimized for Phoenix currently,
but it's like an effort to kind of nail this full stack application and giving it like,
you know, we give it shell and root. It turns out that like you give agents like a few sharp tools,
they kind of just can make decisions and choices on their own.
So kind of where I see this going in the future
is how I'm building it as like a remote AI runtime.
So similar to like Codex or Devin,
or I think Google has like a Jules product now
where you can just like have this thing asynchronously work on stuff.
We can do that too. And it turns out it just does it.
So when I built things, initially everything's running as an Elixir app behind the scenes,
and that's stateful.
So it's like we accidentally made this remote thing.
So the agent, if you ask them to build an app now and close your tab,
throw your laptop out the window, it's going to keep working,
and you can pop in from anywhere in the world. So it's already like, it's already headless
and like you don't have to be there.
So much like Devon or Codex, you can just ask the agent,
hey, go check out GitHub issues or PRs
and send a PR when you're done.
And like it will do that today.
So I think, you know, while it's optimized for vibe coding
out of the gate now, like a system prompt is like
all about vibe coding an app.
Like the next thing we wanna move towards is like
more of these rich codex type flows
that it can already do,
but doesn't really know it can do.
That makes sense.
You have to, like, coax it.
How deep did you go on making it know Phoenix well?
Is it just the system prompt?
Is it deeper than that?
Yeah, I mean, it's just a system prompt combined
with let's say the quote unquote world knowledge
of these frontier models.
But the remarkable thing is, so we're using
Claude 4 Sonnet currently.
But the remarkable thing is how portable it is.
My intuition coming into this space
was like, all these things are non-deterministic.
You change one little thing in the system prompt,
and it's a totally different behavior.
And if you want to move to another model, like OpenAI or Gemini, it's
going to be a ton of rework. But it turns out like you just shop your system prompt
around and you get reasonable behavior just out of these things, which is totally against
my intuition. The knowledge is mostly gap filling. So like you're relying on this implicit
world knowledge and then through a lot of trial and error, you see where it sucks.
All these agents like to put bracket index-based access
on Elixir lists, which blows up.
It's not a thing.
So you have to find these dumb things that these agents do
and then tell them what to do and what not to do.
But it really isn't much harder than that.
And then you give them tools to kind of get over stumbling blocks or like go fetch things
as they need.
So it's like, it can, since it runs shell, it can just like get the Elixir documentation
out of a module locally or it can hit the web and fetch it.
So it's just a fascinating field that I think is overly complicated, and it's far more simple than folks realize.
Huh, so somewhere in your prompt,
it just says like,
Elixir doesn't have,
lists do not have an at function in Elixir,
or something like that.
Like you're literally just putting
those little things in there.
So that it never does it.
Yeah.
You just, in dumb English, you're like,
don't do this.
And it doesn't do it after that.
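The gap fillers Chris describes really are just plain-English guardrail lines in the system prompt. A hypothetical fragment (illustrative wording, not the actual Phoenix.new prompt) might look like:

```text
# Hypothetical system-prompt gap fillers (not the real Phoenix.new prompt)
- Elixir lists do not support index-based bracket access; `list[0]` raises.
  Use `Enum.at(list, 0)` or pattern matching instead.
- When unsure about an API, read the module docs locally via the shell,
  or fetch them from the web, rather than guessing.
```

The first line targets exactly the failure mode described above (agents reaching for bracket indexing on Elixir lists), and the second matches the "sharp tools" approach of letting the agent look things up itself.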
I mean, it's really a lot of trial and error.
People likened it to like spell casting.
But it's far less fiddly than I would have thought.
And given the non-deterministic nature,
I thought it would be like,
oh, now I'm gonna add one line
and just gonna throw everything else off.
And that's not been the case.
It's actually been remarkable how much
they stay well behaved.
Now do you have regression tests for this?
Because that doesn't have to be there,
maybe with Claude 5, because now it knows there's no at,
and you could pull that one out and simplify
or is that just doesn't matter?
Yeah, not currently.
I mean, it's mostly, we've done a ton of trial and error.
We have some headless driven integration tests
where we actually do the full cycle,
but nothing like scoring the result.
Because that's the hardest part is what constitutes
a successful outcome.
And it's not just getting to a running server
because most of these models can get to a running Phoenix server
at the end, but does it look good?
So Claude has been the best at design by far in my experience.
So it's mostly about the end-to-end.
Does the app look good? Is it just some cruddy thing?
Or did it actually come up with some compelling actual...
You give it like, make me a to-do list,
and did it actually come up with some compelling features
that weren't just... that weren't implied,
like were implicit.
And so most of that is trial and error
and just generating a bunch of apps and finding out.
Have you found that Claude 4 in particular
is better than other things right now?
It just seems like maybe it's not this particular model
or version, but it's like
mid-2025 all of a sudden I feel like the coding agents and I specifically have experience with Claude where it's like, oh I'm not mad at you anymore like I used to be at the previous versions.
It's slightly better than Claude 3.5 or 3.7, whatever the previous Sonnet was. It's the best
and I think it's just a little bit better,
not remarkably, than previous Claude,
but Claude has been the best at these agent workflows,
and I use words like, it's the best decision maker,
and it makes the best choices on what to do next.
But most of the models, even like Grok 3,
will go through the standard steps
that you would expect the agent to do when it's building a Phoenix app. It's just like whether it gets
caught on these little things, or makes a silly mistake, or, like, makes an app that actually
looks good. Claude just, like, gets over that quality hump. But the others are
definitely viable. Like, GPT-4.1 is similar in this agentic flow. It looks the part. It's just not quite
as good as Claude. And Gemini is the same. They work. And they're really good. And for
talking single file, make this code for me in one file, then it's a different story.
A lot of people love Gemini 2.5 Pro. It does great job, but like as far as like this end to end,
you're an agent, you make decisions on this step by step
flow, Cloud just seems to nail it
compared to everybody else.
I ask that not to toot Claude's or Anthropic's horn,
but because I feel like for me personally,
and maybe it's all of them have reached a threshold
of quality recently, where I've kind of bought in now
more fully than I was.
And it just seems like it just recently happened.
It was like sometime late last year.
I mean, when GPT-4 came out, that was when,
I wish I had had the insight then.
That was pretty much what changed the game
to do something like we're doing.
And we're just now catching up to, I think what these models have been able to do for
a while now.
What is phoenix.new to Fly?
Like, what does it represent?
Is it a skunk works?
Is it a growth model?
Is it marketing?
Is it R and D?
Like, what do you, how do you categorize it?
It started as just, I would say, more marketing, and I'm not even gonna call it R&D. So the,
the original thesis was like a lot of folks
in the Elixir community have been like,
these agents are all doing JavaScript,
all these platforms are doing JavaScript.
And since JavaScript has the most data,
we're gonna fall behind because JavaScript's
gonna eat the world because that's all the agents
are gonna write and pretty soon no one's gonna care
about what the agents are writing, right?
So part of this was like, you know, can we show that
Elixir and Phoenix just, you know, work great with these large language models.
And the other part was like with Fly is like, you know,
we have a large customer base that is using our platform
to do these vibe coding agents,
but a lot of them are just generating JavaScript.
So it's like part of it was marketing to show like
the original goal was like I had six weeks
just to spike out a text area on a webpage
to generate a full stack Phoenix app.
We were just gonna use that as kind of a marketing
for Phoenix to be like, look, you know, we're here.
We can, you know, we can do the same cool stuff.
And then there's also a way to market Fly to that segment
to say like, yeah, we're great at like sandbox JavaScript, but hey, look, you know, you can just have the agent
write whatever. So six weeks later, I had like, I basically had the MVP of what you
see today. You know, it wasn't quite as posh and good, but it was like basically like in
full in browser ID, generating a Phoenix application. And it was like, oh my God, like, there's
something here, right? Like it was much more than I thought was that we could deliver.
So we decided to kind of see where it went
and see if we could turn it into a product.
But it definitely started as this just like
little marketing R&D thing that suited the Phoenix side
and fly side and then it turned into like,
oh wow, this could be a thing and now it's a real product.
So we're gonna see where it goes.
So I would say Skunk Works, it went from marketing to,
okay Skunk Works to now growth, right?
Like, okay, let's launch this.
Okay, we have users.
Okay, let's try and do this thing.
So this is not a product?
Is that where it's at now, product level?
Yeah, we're in our product growth raise, right?
I mean, before we launched, what was that?
Four days ago, so we've had hundreds of people
sign up at this point, so we're doing it.
Let's go.
I mean, that happened for Bolt as well, right?
Bolt.new.
What was their previous company?
I mean, it's the company that created Bolt.
They were doing other stuff.
It was like Node in the browser.
I can't remember what it's called.
I've met a lot of them.
Oh really, I didn't know they had some previous.
Yeah, yeah, they had been startuping
and doing cool things in the browser for a long time.
I mean talking like three, four, five years
and they'd been on JS Party and Bolt was their new thing.
And it became their only, I mean it came out
and just was really cool
and got huge adoption right away from folks.
And so it became now, I think, who they are.
It's like, talk about a pivot.
I think it's crazy.
Their story is actually crazier than that.
It's, their founder had some like stuff out there,
I think even as well.
It was like a weird way,
the old version of the company kind of like faltered
It was StackBlitz. That's what it was, StackBlitz. Yeah, it just came back. Oh, yes. Yeah.
Yeah, so Bolt.new is from StackBlitz, and now it's just Bolt. Like, that's who they are now.
Yep. That's so weird. Then maybe that's not the same, like, who is the real Bolt here? Okay?
Maybe I'm wrong.
There was an old bolt too then.
Maybe I'm wrong.
I'm sure there's been another bolt.
Yeah, so there's been, yeah, there's,
and you know, they've had explosive growth
as well as like lovable and there's been some big folks
in this space, you know, so initially it was just,
you know, let's see if we can kind of show this as possible
in a full stack way.
And it turned into like, oh my god, now it turned into like,
here's a full ID with a root shell in the browser.
So I think pretty quickly it turned
into a very compelling remote dev runtime,
starting from what if we just gave you a text area?
Because I think a lot of the other players in this space,
you get the chat interface, and they kind of give you
some kind of like basic code editor
or code visualization, but we're just like,
now we'll just put VS code in the browser
and let you and the agent go at it.
So now that you all realized how generally useful this is
and not necessarily specific to Elixir or Phoenix,
like you can do other things,
especially if you stop making it seem
so elixir and Phoenix-y.
Do you wonder if maybe you like, you know,
pigeon-holed it or misnamed it,
or maybe it should be something different,
or is Phoenix still cool, you know,
even for people who don't know what Phoenix is?
You know, we went back and forth on this a lot,
because you know, it definitely started as like,
let's do this for
Elixir and Phoenix, and then over time it became apparent.
Oh, wow, this thing is like, turns out if you give the agent a full environment and
you give it sharp tools, it can just do things.
So we decided to, we wanted to nail one stack to start.
So once it became apparent that we could use this for pretty much anything, right?
Like Ruby, PHP, Go, Rust, like all the languages you would care about are already on the box.
But we wanted to actually like give a compelling experience for one stack first, right?
Because it's like, if you could release this, right,
but if the agents just, like, flop around being moderately okay
at Rails or Phoenix or whatever,
then it's still not gonna be a good experience.
So we definitely wanted to start with,
let's nail one stack, let's actually make it compelling.
And Phoenix gives you a lot as well,
like real-time features, right?
So it's like, if you can nail one stack,
and especially with Phoenix,
you get these real-time apps that sync out of the box.
There's something, I think, unique towards the future of,
if we take the argument that JavaScript eats the world,
and it doesn't matter what language these agents write in,
they're going to use JavaScript because that's
what they've seen, we can flip it around and say, well,
what if we can get to that world where the code that the agent
writes doesn't matter for us or the people asking it?
Maybe Phoenix can be the thing that doesn't matter, right?
Maybe we can be so lucky that most people have,
they don't care.
And if you flip it around and say like,
let's like, could we do that?
Then the agents actually have the ability
to make these really compelling experiences
with far less like glue and things,
infrastructure to bring in.
So it's like, there may be, I think, you know, a thesis and a story there.
Like, if we keep progressing towards this world
where there's like less and less,
like we don't show the editor anymore
because the agent, you know, agent does that code stuff.
Then I think Elixir and Phoenix actually may be
the perfect language to be that thing that people
by and large don't care about.
That makes sense.
So there's, I think something special there
with Elixir and Phoenix, but I do agree
that the positioning has been tricky for us
but right now it's like, we want to make it compelling, and make it compelling for the folks that don't care about the language, or
get them into Elixir and Phoenix this way, and then as we do that,
backfill with other stacks and kind of see what we do branding-wise. But TBD.
How close are we to that future where the language matters less, the editor is shown less? Like, how close are we to where that's a realization?
It's a contentious topic. It is.
Yeah, so I would say, like, the CEO of Fly kind of, like, I'm not gonna say pitched it to me.
Well, one, he thinks Phoenix.new is, like, the most successful nerd snipe of all time because, you know,
it started as his idea of, like, oh, Chris, just, you know, spend six
weeks, go make this, like, text area on a webpage, and it turned into an accidental product.
Yeah.
But it was his insight on like, you know, if we are heading towards that future, like
maybe we can make it like Phoenix that platform that is these agents are excelling at.
And I thought that seemed far off.
But then if you follow the Hacker News discussion
on the announcement, the top comment was a PHP developer
who had never...
They knew what Elixir and Phoenix was, but they never tried it.
And they were like, well, it's now or never.
So they signed up, and then they developed...
They made a tic-tac-toe game that was multiplayer,
and you could create your own room
and then, like, play with other people.
And they made that in one sitting and then deployed it on fly and they had never touched Elixir and Phoenix before. So it's like in one sitting this person, they were an experienced
developer but they didn't write a single line of code and had this like compelling experience that
converted them to an Elixir user, a Phoenix user, and a fly paying customer in one go. So I think
that like it seems like people hearing this
that are just coming into this space
will think it sounds like way far off,
but it's like, we're seeing that today, right?
Like literally someone came in, like typed into a chat
and like they got this app multiplayer real time
out of the box.
So it's like, maybe not as far off as folks think.
And I think that's where we're headed.
I think that the programming, I'm
going to call it iteration because developers are very, they don't like this idea, but I
do think that local development becomes less and less valuable. So, it's like, I think
that most of our code iteration, most of the computation time is going to happen remotely
just because these agents provide value at all times. So it's like,
it will become silly to think that like I close my laptop and like work stops happening because
why would it, right? So like, this is again, forward looking statement. But for me, I think
the future programming is like much more like your CI environment is constantly out there just like
fiddling and doing stuff and like you pop in and check on it or work within that context,
maybe locally too, but your predominant thing that's being the artifact that is your software
is going to be running somewhere else and the agent is going to be doing that subset
of that work and where that subset starts and stops, I don't know.
I can't predict the future, but I feel pretty confident that's where we're headed.
But we'll see.
And a lot of folks do not like hearing that opinion.
Well, it has huge implications.
I'm hearing echoes of the death of the IDE,
which is what Steve Yegge predicted
on this show a few weeks back.
And he didn't mean like it's gonna disappear,
but just the reducing towards obsolescence,
like you're moving away from it
as an important piece of the thing.
The most interesting thing with this is like,
part of when I put this together is like,
a lot of these other Vibe platforms don't have a real IDE.
So I thought it was like really compelling
to have like VS code in the browser.
And I still think that's true.
But then the funny thing about making that is, like, the editor,
the IDE that most people think is the thing,
is just eye candy for humans.
So like this agent-
They're just watching it do stuff.
Yeah, this agent, it serves no purpose to the agent.
So the agent, you close the tab,
it's not aware of VS code, right?
It's just literally there for us slow meat brains.
I mean, we can go in there and interact with you,
but it's fascinating to like, to work my way, you know,
bottom up and then be like, oh,
this thing could just go away and it doesn't matter
for the actual process of the agent working.
It's just fascinating.
So I definitely, that resonates with me.
And I don't know how I feel about it fully,
but it is the reality of where we're headed
and kind of where we're at.
So yeah, I definitely agree with that.
Yeah.
And we tend to anthropomorphize too much,
but I can imagine if I would just to do that a little bit,
that the agents themselves would be fed up with us
at some point.
Like, why do I have to show you what I'm doing
and like teach you this stuff as I go?
Probably, I mean, that's where-
You're adding nothing to me here, basically.
You're in the way.
Just let me do my stuff, I'll report back,
and then you tell me if I should do something different.
That, I mean, I totally agree.
And that's where we're at.
It's like there are limitations currently,
but it's like you can just let these things go off
and rip and then come back,
or they just send a PR when they're done.
Right.
And I think that, yeah, that makes people uncomfortable.
And it's also weird to me, like this whole,
we're in a really, really weird time, right?
Where you have people that are getting all this value.
Like I'm using LLMs every day and I'm like,
I feel like I'm a God tier developer, right?
And then I have like people that are really intelligent
peers that are like, LLMs provide no value to me.
And I'm like, I don't know how to reconcile
for these two worlds, because I'm shipping more
than I ever could.
And then there's also, the whole space is weird too,
because it's overly complicated by the folks building
these tools, I feel like.
It's far less complicated after coming
through this experience than I expected it to be.
And it's also like everyone's trying to build an editor too.
So I think it's just like, you know, I could be wrong, but it's just like a very weird
like, I think Windsurf, there's rumors of, like, a multi-billion dollar valuation or acquisition.
You've got cursor, which is doing amazing work, but everyone's like trying to build
the IDE.
And I feel like we're building the IDE,
and the IDE is gonna disappear
by the time they get done building the IDE.
I don't know, it's just a weird time.
I feel like the real part of this is,
and folks are working on, like, Jules and Codex and Devin
as well, that there's some medium point
that these things meet,
and I don't know that it's gonna be a desktop IDE,
but we'll find out.
So as the purveyor of the Phoenix framework and this potential world where phoenix.new
brings Phoenix framework even more users through this selection process, right?
Because, not necessarily because of the ergonomics or choices of Phoenix but what it provides
with WebSockets
and all this stuff built in the PubSub
and the real time features
and all the other things that Phoenix has.
If that ball starts to roll, right?
That snowball starts to roll down the hill and get bigger.
Do you then look at Phoenix as a framework differently
and say, okay, how can we build Phoenix differently
to actually make it, I don't know,
even better for these things?
Or how does that change your view of Phoenix?
No, it's a good question because we're already,
already like every thought is like,
well, how would this affect in a good way or a bad way,
large language models.
So it's like, but the most fascinating thing for me is like,
they're much like people,
I know we talked about anthropomorphizing,
they're much like people in that they're trained on the data that's out there.
So in the same way, I'm like, well, if we change this,
actually I was just talking with Jose Valim today,
like, well, if we do that, then the agent's going to run this mix command
that's going to be deprecated.
But it's funny how alike it is,
it's the same thought you would put into your existing user base, right?
Like, oh, well, people are used to doing it this way,
and now they're going to have to do it that way.
So it's a very similar overlap.
But I do think it changes fundamentally
like how you start thinking about features, because it's
more like LLM first versus like People First, which also
makes folks uncomfortable.
But that's where we're headed.
So yeah, so I don't have any
concrete examples yet other than like pretty much every decision now is like taking that into
consideration. And then one thing we're doing is, the community is standardizing on an
AGENTS.md file. So Phoenix, yeah, there's naming overlap, phx.new, the Phoenix project
generator, will have an AGENTS.md file that gives you a lot of what I have in the phoenix.new system prompt, like a lot of these gap fillers, basically.
A lot of communities are doing something similar, but we'd like to have each
package have its own AGENTS.md, which is just a plain text file that agents can utilize.
But you can also make, you know, a mix task that extracts these things, just an
easier way to lift that into whatever agent you're running, whether it's Claude Desktop or anything, or Phoenix;
you could look at these files as well.
And so it's kind of like on our minds for everything we're doing now and I think that's
that's where everyone's heading at this point.
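For listeners curious what such a file looks like, here's a hypothetical sketch; the conventions and gotchas below are my own illustrative guesses, not the contents of the actual Phoenix AGENTS.md:

```markdown
# AGENTS.md (illustrative sketch, not the real Phoenix file)

## Conventions
- Generate new CRUD features with `mix phx.gen.live`.
- Run `mix format` and `mix test` after every change.

## Gotchas
- `{}` is reserved interpolation syntax in HEEx templates; annotate elements
  that contain literal code samples with `phx-no-curly-interpolation`.
```

The point is just that it's plain text the agent can read into context, the same way a human would skim a CONTRIBUTING guide.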
You were saying before though this kind of goes back a little bit but you were saying
before that the person adjacent to you let's just say got not a lot of value from, or no value from,
an LLM, but you were getting a mint.
What kind of value are you getting?
Is it just in your software life,
or you're writing more code?
Is it in your personal life?
How are you using and getting value?
Yeah, so it's like, yeah, I'm gonna sound like
an evangelist.
Like, the really weird, yeah, it's just,
we're in this weird time where, like, folks have equated it to, like, cryptocurrency scammers, where I feel personally slighted
if someone's like, oh, it's just like crypto hype. It's like, I'm literally
getting value every day. But in any case, it's at all levels. So like, for me,
it's changed like any little thing you want to spike out that would take you several weekends, right?
I can just go generate that thing.
And then four minutes later, I have it.
So we could do that on air, right?
What is some little app that, regardless of what it is,
that you just haven't had the time to work on,
you could go have that thing just be done.
But also from just things as a developer I can't be bothered to do.
I mean, I test my code, but I'm like a regrettable tester
where it's like, I have to do this, so I'll do it after I get my things working.
But now the vast majority of my tests are started by an LLM, by and large.
And they'll even find edge cases, like the Phoenix new parser that's parsing the token stream
is fully tested by,
test generated by the LLM and it caught some edge cases that I didn't even think that were there.
Benchmarking is another good example. I used phoenix.new to work on phoenix.new.
And part of it is the token rate limiting; we're rate limiting all the incoming and outgoing tokens.
You don't want to lose those because it costs real dollars.
Like, if someone sends up a request and then cancels it early, we still have to calculate
that.
So anyway, it's an in-memory, ETS-backed rate limiter that syncs with Postgres.
I wanted to know how fast it was in general and then how long it would take to sync because
I have to do some locks. And that kind of thing could take several hours for me to
actually try to benchmark.
Like, setting up the benchmark.
So I just asked Phoenix to benchmark this code, and it extracted, again, I gave it
nothing other than, let's benchmark this.
It was a GenServer with ETS doing Postgres syncing, and it took the critical
path of the code,
it put it into an EXS file.
So instead of trying to like drive the code
in an integration way,
it just like automatically duplicated the critical path
and then ran that in a tight loop.
It gave me all this formatted output of like 1000 rows,
10,000 rows, 100,000 rows.
It put it in the console and like a pretty formatted table
and it wrote a Markdown file of a summary.
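As a rough sketch of what that generated benchmark script might look like (the table name and critical path here are my own invention, not the agent's actual output): lift the ETS hot path out of the GenServer into a standalone .exs file and drive it in a tight loop at a few batch sizes:

```elixir
# bench.exs -- hypothetical sketch of the agent's approach: skip the GenServer,
# exercise the ETS critical path directly in a tight loop, and time each batch.
table = :ets.new(:rate_limits, [:set, :public])

run = fn rows ->
  {usec, _} =
    :timer.tc(fn ->
      # The assumed critical path: bump a per-key token counter,
      # creating the row with a zero count if it doesn't exist yet.
      for key <- 1..rows, do: :ets.update_counter(table, key, {2, 1}, {key, 0})
    end)

  IO.puts("#{rows} rows: #{Float.round(usec / 1000, 2)} ms")
end

Enum.each([1_000, 10_000, 100_000], run)
```

Running `elixir bench.exs` prints one timing line per batch size, which is essentially the formatted table he describes.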
So these kind of things, at all levels,
I feel like this is how we're gonna do everything.
And it's like, whether you're like,
you've never programmed before,
or you've never programmed Elixir before,
you can get value there.
Whether you're like, you created a framework,
and at the far end, you're still gonna be able
to use these things to do the tedious work or future work.
So what I try to tell the seasoned developers is, for me,
LLMs, like, the discourse is everyone's like, oh, it's all AI slop,
which I think is a silly argument, but it's not AI slop for me.
It's like these LLM, the code that the LLM generates,
that artifact is a starting point.
And the discourse for some reason for people that are on the negative side
seems to be like they treat that thing that falls out
of your ChatGPT as the artifact that you ship to production.
But it's like, no, these things are just a starting point.
So now it's like, instead of having myself write out
this 100 lines of server code, it's
now just like this really intelligent co-generator that's
my starting point.
It's not what I then just ship to production.
So I think the discourse is flawed,
but I think that at all levels of the experience stack
or programmer hat stack,
you're gonna have people getting value out of tools like this.
The AI slop is the blog post that nobody wanted to read
and its only purpose is there to attract attention
so you can sell some advertising
or something or the essay that you spat out
because you didn't have time to actually write your own.
That's slop.
Yeah, I mean, I like to say
we've all been sloppy vibe coders, right?
It's just now way easier,
but the people copy, pasting, stack overflow,
and the people that ship that chat to production,
now they can do that more easily,
but those people were
already writing bad code and not carefully considered prior. So that's going to remain true.
It's going to be easier for those folks to get something into production. But it doesn't change
the fact that you can't... I don't know. I feel like it's more gatekeeping than anything else.
Folks are throwing that term around. Well, friends, it's all about faster builds.
Teams with faster builds ship faster
and win over the competition.
It's just science.
And I'm here with Kyle Galbraith,
co-founder and CEO of Depot.
Okay, so Kyle, based on the premise
that most teams want faster builds,
that's probably a truth.
If they're using CI providers
for their stock configuration or GitHub actions, are they
wrong?
Are they not getting the fastest builds possible?
I would take it a step further and say if you're using any CI provider with just the
basic things that they give you, which is if you think about a CI provider, it is in
essence a lowest common denominator generic VM.
And then you're left to your own devices to essentially configure that VM
and configure your build pipeline.
Effectively pushing down to you, the developer,
the responsibility of optimizing
and making those builds fast.
Making them fast, making them secure,
making them cost effective, like all pushed down to you.
The problem with modern day CI providers
is there's still a set of features and a set of capabilities that a CI provider could give a developer that makes their builds more performant out of the box, makes their builds more cost effective out of the box and more secure out of the box. A lot of folks adopt GitHub Actions for its ease of implementation and being close to
where their source code already lives inside of GitHub.
And they do care about build performance and they do put in the work to optimize those
builds.
But fundamentally, CI providers today don't prioritize performance.
Performance is not a top level entity inside of generic CI providers.
Yes.
Okay, friends.
Save your time.
Get faster builds with Depot: Docker builds, faster GitHub
Actions runners, and distributed remote caching for Bazel, Go,
Gradle, Turborepo, and more.
Depot is on a mission to give you back your dev time
and help you get faster build times with a one line code
change.
Learn more at depot.dev.
Get started with a seven day free trial.
No credit card required.
Again, depot.dev.
Well, should we try to vibe code something?
I got an app idea.
I wanna see it.
Okay.
I'll screen share this.
And for our listener who doesn't have video,
have no fear.
We're not gonna leave you behind or something.
We can talk there.
Yeah, it's nothing better than live coding
in a non-deterministic way.
I did this on stage at ElixirConf EU,
where it's like, you know, I always like to like live code,
which has some level of risk, but then you're like,
you know, you're live generating something that you,
you know, it's just a random number generator ultimately.
So let's do it, let's see what happens.
All right, so here's my app idea.
It's like hot or not, but for code functions.
So like imagine Chris writes his version of quick sort,
right, and I've got a better way of doing it.
And so we both enter our quick sort function
and then other people vote.
Like, is this hot or does this not?
All right, is this good code?
Let's do it.
So I have phoenix.new open over here.
What would you like to build?
Pick one or type your own.
Of course, you have a video out there,
seven minutes on the to-do list,
so we're not gonna do that.
How do you suggest I prompt this thing?
Just tell it what I just told you or get more specific?
And just what you said.
So here's the remarkable thing is people,
the intuition and the tribal knowledge is,
you gotta be as specific as you can.
The remarkable thing is like, in terrible English with typos, you just ask for the thing and
the agent has intuition or will give you reasonable questions.
Like someone asked it about making like a mashup of communication providers, like mashing
up SMS and email.
And it was like, well, what would you like to use?
Twilio or SendGrid?
Like, would you want a GraphQL API or JSON?
So it's like, let's give it like, I mean, do what you want.
Freeform, but I don't think you need to actually
spell out anything.
Just tell it exactly what you told me.
So I said, let's build hot or not, but for code.
You put your code in and people can vote it up, hot,
or down, not.
Good enough?
Should I be any more specific than that?
Whatever we want here, let's see what it does.
It's gonna hype you up.
You're a hype man.
It's a great idea.
Great idea.
Thank you.
Now you're starting to stroke my ego.
A hot or not for code where developers
can submit code snippets and get community feedback.
Here's my high level plan.
Oh, it's a 12 step plan, 12 to 14 steps.
And so it's gonna give me 11 steps with some features,
submit code snippets, blah blah blah, real time voting.
There you go, there's your real time.
Now, did you system prompt like be real time by default
if they don't specify?
Because I didn't say anything about that.
It's basically like, you know,
the Phoenix framework has PubSub built in, presence, whatever.
So like anything that makes sense to be real time
should be real time.
That's more or less the gist of it.
Gotcha.
So I'm not being very discerning.
I just said, yes, great plan, please continue.
And now it's gonna ask me if I want to do dark theme,
minimal theme, vibrant tech,
professional, corporate or something else.
Adam, you got any cool theme ideas?
It nails Tron when you ask,
but the cool thing with these choices here is like, we just...
Yeah, that's a good one there.
I was tired of...
Yeah, Tron is always great.
I was tired of typing yes and no to the agent, so I was like, in the system prompt, I was
like, anytime you idle, give the user a choice.
You know, example, yes, no.
And it started producing stuff like you see there.
And you're like, what?
Like, it's just remarkable what you can get out of these things without trying. It's just like, I thought it would be way
more like trying, right? Like, I would guess no, and then it's like, would you like,
what did it type here? Like, would you like a dark GitHub style? And you're
like, what do I say now? Here's six options, they're all good. Yeah, so
it's gonna write out a plan, and a plan in a file. So it plans out its own work,
and then that remains in context.
Now your server's running,
so it compiled and built it in that amount of time,
and you get that live preview.
And then you get that URL as well
that you could share with that.
It's private by default,
but you can toggle it public,
and anyone could visit that Phoenix server now
if you toggle it public.
All right, I'm gonna paste this to you guys.
Sweet.
Riverside chat.
Oh yeah, and once this loads, right,
you toggle it to public in the top right there,
where it shows the URL.
You know, the little public toggle there by the pink text,
purple text, left.
All right here.
There you go.
All right, so I made that,
so it gives me a phx.run URL that I made public,
paste it to you guys.
Meanwhile, it's coding things, right?
I'm not even paying attention.
There is a syntax error and it's linting the code
as it goes.
So just like we can see the browser here,
it actually has its own headless Chrome browser,
so it's able to visit the page as
a human would with a real browser, see JavaScript errors, and then it can also interact with
the page. So if we're lucky, we'll get to a working hot or not and it will post its
own code snippet to the app and we'll see it in real time by using the, by actually
driving the browser.
That would be amazing, right?
We'll see what happens here. So it's writing the-
Or not.
Oh, it's giving us a, yeah, it's going to start with a static design here.
So this is it just writing a...
Let's see.
Syntax error.
It's fixing up the compilation error.
Boo.
My guess is we'll see if it actually is this issue.
So someone reported this.
If it's trying to write a code example on the page,
it's going to use curly brackets. And one of the open issues internally is if you're used to Elixir HEX files, like our
curly bracket is a reserve syntax.
So like if you try to put a code sample in like a code tag or a pre tag, HEX throws a
compilation error.
And this is like the same thing that trumps up people.
Any time people want to do this, they go to the forum and they're like, how do I write
this?
So you actually have to annotate with a phx-no-curly interpolation.
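As a sketch of what that annotation looks like (assuming a recent Phoenix LiveView; the snippet inside the `pre` tag is just an illustrative example):

```heex
<%!-- Without the annotation, HEEx treats { } as interpolation syntax,
     so a literal code sample raises a compile error. --%>
<pre phx-no-curly-interpolation>
  def hello do
    %{message: "world"}
  end
</pre>
```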
And I have a branch where, yeah, I'm sure it's hitting this.
So we need to actually tell it, hold on, let me see.
So it's amazing we hit this.
Let me, I think I'm a nerd.
So I pick a code app, anything.
No, no, it's fine.
This is good.
Okay.
So it's one of these edge cases that, again, trips people up
where they're like, how do I do this?
I can't interpolate.
But eventually, that agent will probably
start trying to interpolate by stringifying the brackets.
Hold on.
I'm going to paste this to you because it's a really long.
It fixed it.
Did it really?
Oh, okay.
I might, we'll review this later.
My guess is it like put it in like,
like interpolated the strings,
like it did something ridiculous
that works around the issue.
Does it ever stop so us meat bags can keep up?
I was like, cause I was gonna read
what it was doing up there,
and I'm afraid it's gonna,
I'm gonna miss something now.
No, the thing is, it keeps coding. Meat bag's like, uh, slow down, buddy, I'm trying
to keep up. Here's the thing on the newer models. They're not as good, but Gemini
Flash is fast enough where I get existential vibes, because we're following this now
and we're like, oh yeah, it's working on the context file here. But Gemini Flash is
so fast that you lose track. It's
like, brrr, brrr, brrr, brrr, like now I'll do this, now I'll do this. And you feel like
you're in the way. Now, granted, the quality is terrible. It doesn't give you
working apps, but then the first time we're like, I can see the future where I'm like,
I'm just sitting here as a meat bag and I'm in the way, right? Like it's just, I can't
even read what, you know, follow what it's doing. And that's like where we're headed, right?
It's like, it's, we're not there, but we're.
Look at this.
Okay, so yeah, so it made a static HEEx file.
So none of that's functional.
And now it's going to actually use that
and write a real app around that static file.
So now it's writing the live view here
and it's gonna start doing the LiveView PubSub
and everything.
But it gives you that, like, you know,
it hit the syntax errors, but it gives you,
we wanted to give people, like, the early feedback
of, like, seeing the app, like, what it's gonna look like
versus waiting, you know, the whole time,
and then at the very end.
So you tell it to do that, like,
build a static version first, and then make it live.
Yep, and it's also helpful, because if you wait
till the end... I mean, it's good, just like
humans start with a mock-up, right? So it's like for the same reason you don't
want to have a consultancy just like here's your finished product and you're
like oh I really didn't want you know there was some fundamental difference of
the design that would have made the code much different. It makes a lot of sense
to work in the same way, you know? Look at that, yeah, see, it nails Tron pretty well.
It does look kind of cool, very Trony.
Submit your code, let the grid decide.
Yeah, and then it picks copy, so here's the thing,
these LLMs come up with copy that makes sense
and all you said was the word Tron, right?
That's what I said, let's do a Tron style,
that's all I said, Tron style.
Okay, so let's see how it's using its own web browser now.
So you said it visited the app.
So there, it would have caught any JavaScript errors.
It actually saw the app.
So it's like, this was also one of the special things
that I feel like gave it, like,
really good error correction.
Because not only can it see it looks at our logs,
it can actually try to visit the app.
And if it broke the JS build, it would see that too.
Oh, it's gonna try to post something.
Did it work?
Close the terminal now on the little right hand side X.
It tried to write code.
Did it actually add it?
There is an issue.
It says, excellent, our Tron-style code reader
is working perfectly.
Let's test the functionality
by submitting a sample code.
There's an issue.
The web tool is trying to fill a select dropdown
with text instead of selecting an option.
Let me try a different approach.
So if you expand, I see the issue real quick.
Like if you, there's a little expand button
on that message right there, yeah, yeah.
So you can see it actually wrote JavaScript
to eval on the page.
So it actually tried to post something for real.
So like within its own headless browser, it was trying to, oh wait, the Fibonacci generator,
is that what it wrote?
It's trying to write a Fibonacci generator.
It did. So the recent submissions here, so it used the browser. I think there's
probably a handle_info. I don't, I can't see the code.
My guess is, um, it blew up like maybe something in the PubSub crashed it,
but it actually interacted with its page
by writing its own code in JavaScript
to run on the page.
And bam.
So are you guys seeing this too?
So if I vote this hot, are you gonna see my update?
Wait, hang on a second.
Hold on, I gotta open this now.
Oh, look at all those hots.
A quick sort, there's a hello world in Elixir.
Oh gosh, there's more?
There's a quick sort algorithm.
Oh, there's 42, so this is hot.
I just, I just hotted Fibonacci.
Is it 43 now for you guys?
Yes, I'm gonna say not.
I'm hot, man, it's 44.
Real tone.
Oh my goodness.
Who's doing the nuts?
Not, not, not.
So yeah, so that's, this is, I mean,
other than the syntax error at the beginning
that I got caught up on it.
Get out of here.
This is, this is phoenix.new, right?
And it's a...
Oh my gosh.
So I'll try to post some code here real quick.
I just want to see if it fully works.
Let me grab a...
Cause we didn't really follow, you know,
we just let the agent figure it out while we were like,
whatever, do whatever.
I'm curious if there's a,
if it's going to show up on everyone's screens or not.
Yeah.
So we have one, two, we got three submissions.
Whenever you submit.
I'm gonna say fib copy pasta,
because I'm gonna copy that one and repaste it.
So that's fib copy pasta in Python, paste.
And upload to the grid.
There, that's my fib copy pasta,
but it doesn't have any votes.
You guys wanna vote for it?
I'm hot on it right now.
Did it show up on everyone's screens?
Yeah.
I got one hot, I got two hot.
It's at the bottoms.
Three hot. It's at the bottom.
So there you go, fully real time.
The agent actually used the app.
Successful run.
Yeah, yeah, so now it probably offered up
some ideas to continue, but this is basically where, yeah,
so it excels at getting here, right?
So the Vibed app, and it will gladly continue,
and you could add features, you could add user auth.
Getting to this point was where we were like, okay,
this is, we wanted to nail the,
does it deliver some, from prompt to some compelling,
full actual application experience?
So that's all, it's SQLite by default,
but that's persistent to the database
and that's something you could deploy.
Now let's say I wanted to take this and run with it.
Yep.
What would I do?
So you can, in the hamburger menu,
you can copy a git clone command and run that in a local shell.
And boom, that's gonna be proxied through the Phoenix app.
And it's like proxies all the way down.
That request will go up to a fly proxy on some Edge node,
we proxy to the Phoenix app that you're using the chat,
which we would then proxy with fly replay to your IDE,
which has a reverse proxy that goes through
the git HTTP backend to clone that.
And then back up the chain, right?
Now, could I start?
Then we have a Phoenix application as you well know.
Could I just like give it that?
Yeah, there's a copy.
There's a copy git clone as well
that you paste in your local shell
and it will show up in your IDE, the VS Code IDE right here.
And that's where like, I want to add next a pair mode.
So like it's system prompted examples
are fully like vibe mode, right? So if you do send up an existing project where you want it to take more measured steps,
you have to, like, be explicit in your initial prompt, like, you know,
do this step-by-step and wait for further instructions.
But I want, like, a toggle, right?
Because people don't want it to just, like, go full ham all the time.
So that's something we want to explore next.
But, like, getting the vibe mode out was the,
was a real initial goal for us.
And I think we've, I think we've pretty well nailed it.
So it's always exciting to see someone do something
and have a good outcome.
So pretty cool.
Well, especially because it was on hard mode
because it had to paste code into its own.
Can we look at that?
Although it fixed it all right.
No, no, it will be in the, well, it'd be in the Git history,
but I'm curious what it did.
Anyway, it's not too important.
I almost guarantee it was the interpolating of the HEEx,
because HEEx is gonna blow up when you lint it
with a bracket error, and it's confusing to,
like I said, the humans too,
because HEEx can't tell you,
oh, you're trying to literally add code, right?
We just blow up like you fat-fingered a bracket
in your markup.
So I'm just curious how it worked around it
because it probably did not use the no interpolation.
My guess is it added like some ridiculous
interpolation of the literal elixir string
of brackets or something, but.
So here's where it finds the error.
It says, I see the error was caused
by unescaped raw code lines in the home HEEx.
I'll fix this by wrapping the code blocks correctly
with HEEx-safe sigils.
Sigils?
I don't know.
Okay, yeah, that's exactly what it did.
So it did inline elixir,
and then it interpolated some elixir code
that returned the string of the bracket.
So it like, these agents brute force,
that's not the solution, right?
The solution would be like,
cause then if you have a code block,
you have like all these little strings of quotes around
or brackets and it was just like,
whatever, I can make this work.
I have the technology.
So that was pretty cool.
I have a terminal somewhere as well, don't I?
How do I get to?
Yeah, if you click agent terminal,
that's the one that the agent,
so if you git log there, you'll see,
every time the agent touches a file, it does a commit.
So you and the agent both could like revert back
to each file. So one thing we also want to add is, each of the file tools that it did will have a revert button,
so you can just do a git revert back to that state,
to each of these commits. So the agent knows, kind of, each file snapshot at any given point as well.
There it is right there: fix syntax error by correcting HTML entity encoding in code blocks.
And so I should be able to just git show that and see the actual diff, which it's like piping
through more or something. There it is. Well, that's not it. Now, that is it right there.
Did you ever hear about this theory, the monkeys?
There was an experiment where they had a cage full of monkeys.
And at the top of the cage, or like in the center of the cage, there was this
thing they can climb to get to the bananas.
Let's just say, right.
And the first batch of monkeys, they don't know any better, right? So
they climb this thing in the middle to get to the bananas because they want the bananas.
What monkeys want, right? Naturally, as a monkey would, it climbs and does. And that's not the way
this place works. If you try to climb that, you get sprayed down and it sucks. You don't like it.
And so they all learned that: monkey
climb, monkey get banana, monkey get sprayed, monkey get hurt, doesn't like it.
Okay, eventually these monkeys, they get replaced with monkeys who have only
ever been there, let's just say. Now the monkeys, they only know what they know,
because it's tribal knowledge, and so they no longer ever attempt to do this.
Although they've never been sprayed,
they don't try to attempt to get the bananas because-
They don't know why, they just don't do it.
And so the reason why I tell you all this
is because we're looking at some really awesome
Phoenix code and we have a Phoenix application,
so we have this background.
What happens when the monkeys don't care
about the code anymore?
You know, they just don't know what to choose
and the LLM chooses for them
and the taste making is known by the taste makers.
It's more like this hodgepodge.
Maybe it's good, maybe it's not.
You know, that's what I'm thinking about.
Yeah, that's a good question.
So I think like in the medium term,
and I don't know what timelines, but I do think it's safe to say that, you know, Anthropic's
CEO said that, like, 90% of code by humanity by the end of the year will be AI generated.
And people like dunked on him for that. I think that's absolutely going to be true.
I mean, if you just look at like, and again, these aren't like, it doesn't mean that like
that's 90% of code that a human didn't see.
It's just like, if I think about my own AI usage, right, like I'll start with, you know,
if I'm writing like a defmodule, a GenServer, it's like, you know, that's being started by an LLM
and I take that and then use it. So then the LLM is generating, let's say, 90% of my code today,
but it doesn't mean that that I just ship that, right? So I think that we're there in the medium term on like, we are going to be like the
purveyors of like what's good or not, and we're going to be enhanced by it.
But then long term, I don't know, I don't have a good answer to.
Like, as these get better, does software become disposable?
Which I don't know how I feel about that, but it's like, these agents are expensive
today, but they're valuable enough that people are getting
an extreme amount of value,
even to the fact that they're expensive.
So it's like, if it's an absolute pile of mud,
which all software is anyway,
if it's an absolute garbage, but it does what you want,
and granted, I'm not saying we're there today
where you just dispose it and whatever it can be crap,
but I'm saying, if that's where we get, software could be by and large
disposable, where you just regenerate the thing, right?
Like, it gets to a point where it's unmaintainable or something and no one vetted it
properly, then it may just be like, well, we'll pay a hundred dollars and now we have
our new app. So I don't know that that's where we're headed, but I could see it right
where it's like, you know, this Tron example, if the agent was 50 times faster at that,
we could have, you know,
it would have taken us longer to write the prompt
than it would be to get the app potentially.
And if we get to that future,
I don't know what happens because
why wouldn't you just have this thing generate?
We can talk about security and all the caveats,
and I'm not saying this utopia is gonna happen,
but like, you know, you could have an agent vetting it for security. And again, for better or worse, I feel
like this is where we're headed, and I don't know where it all will land. But it's clear,
that's the trajectory we're on. And I'm not saying it's all good, but it's like, it's
clear to me that that's where we're headed. So I don't think it, I don't think it helps
by like just saying like, oh, well, it's all slop, it's gonna be terrible. I just think it's helpful to acknowledge
that this tide is washing over us
and whether we like it or not,
it's like this is where we're going.
Yeah, I mean, maintenance could become just small rewrites.
I mean, the thing about it, that's what refactoring is,
that is what you're doing.
Like, you're kind of rewriting a small portion, and
those portions could get bigger and bigger, and so maybe maintenance becomes replacement, when replacement is that cheap and
easy. And so you're kind of just, like, ship of Theseus-ing everything.
Yeah, and it could even be like if you imagine like it's expensive now, but imagine you have a
dozen agents doing a dozen versions of that and then you just pick the best one.
So it's like, agents are going to eat the world.
Like I said, for better or worse, I just see this future where, instead of this
Tron example, you could have been given 10 options of that and chosen the
best one, right?
It's not deterministic, but as they get cheaper
and more efficient, now you have like 10 choices
and you just pick the best one.
So it's like, it's just gonna be more and more of this.
And I don't know what that says about the future,
but I think there's just gonna be like more compute
and it gets cheaper.
So we do more LLMs, it gets cheaper.
So we, you know, it's just, it keeps advancing
the envelope of where you would just throw these things at a problem.
And it's clear that that's gonna happen to me.
And I don't know if that's gonna be all unicorns and rainbows,
but it's definitely where we're headed.
It goes back to the conversation we've had
around these parts over and over again,
which is that skills become less important
and judgment becomes more important.
But to Adam's monkey point,
how do we know which one is the best one eventually?
Eventually we're like,
can it work?
It's an easy answer.
Does it work the best?
You ask the agent.
Okay, now we're out of the loop.
Here's the thing, I'm joking,
but now that I said it out loud,
I mean, that's not true, right?
Well, in some cases, for sure.
Yeah, it's actually quite reasonable to think now
that even with today's models, you
could have it evaluate each one, right?
They're multi-modal.
Literally, you could ask, tell me which one looks the best.
And the OpenAI image model today
would probably do a good job telling you the accessibility
of it. You know, it's a meme: believe it or not, large language models
solve everything. Yeah, and that's where we're just removed from
the loop. So yeah, I don't know. Other than to say, I feel like there's going to be agents
everywhere. And as it gets more efficient and cheaper, it's just going to be more.
So my next feature for my Hot or Not app should be an API,
an MCP server,
so the agents can actually vote themselves.
Cause what do we care?
Like we don't know what's hot or not.
Oh, ask the agent right now to assess the current ones
and then vote them hot or not.
I'm just curious.
Cause yeah, it's like, you can do that already.
And here's the thing, this is what people don't get. The agents will
brute force anything, using the tools available. So it doesn't need an API. It will just use it
like it has a headless Chrome browser. It's going to go do the thing. Just like
we don't need a Postgres MCP server. We can talk about MCP if you want. Because the agent has shell,
it's just going to use psql and drive psql,
not because I told it to, just because it knows it has shell.
So it's like, you give them a few sharp tools
and they don't need all these MCP servers.
It's like water, right?
Water is always like, it finds a way to wherever it's going to go.
Given an infinite amount of tokens
and energy, they will brute force their way to a solution.
It's remarkable.
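As a rough sketch of what that looks like, here is the kind of minimal "shell tool" an agent loop might expose. The `run_shell` helper is a hypothetical illustration, not phoenix.new's actual API:

```python
import subprocess

# A minimal sketch of a "shell tool" an agent loop might expose.
# The name and shape are assumptions for illustration only.
def run_shell(cmd: str) -> str:
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr

# With shell access, the agent can drive psql directly, no MCP server needed, e.g.:
# run_shell('psql "$DATABASE_URL" -c "UPDATE submissions SET votes = 0;"')
print(run_shell("echo agent has shell"))
```

The point being: one sharp, general tool like shell subsumes a pile of purpose-built MCP servers, because the model already knows the CLIs it finds there.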
Although, it's like half of them it wrote itself.
So oh, wait, you have users in the background doing stuff, right?
So we should insert like an obviously bad one
or something, like something with SQL injection or something.
I'm curious now.
I have to do an audit.
Yeah, real quick.
We need to fire up and get some bad code.
Quick, open up my GitHub.
Yeah, let me do SQL.
Amazing, hold on, let me do a SQL injection.
Remember when I used to joke about writing code?
Yeah.
Finally, all my crap code pays off.
Did I understand, while we're hearing this sort of pause,
I suppose, did you say, Chris,
that we will give the judgment call to the LLM basically,
and you think we'll like it to some degree?
Well, I joked, right?
I was joking, but I actually think.
But then you weren't joking.
You thought about it.
Yeah.
What did that first, like I'm asking you honestly,
what was the first thought you thought
when you thought that could be actually kind of real?
What was the thought you were having?
It's like for better or worse.
I mean, I think I've internalized,
I was very much like a copy-paste ChatGPT user, like, oh, that's pretty helpful, right? But then,
like, once you just take that same model, and you put it in this recursive loop, I've
internalized pretty well at this point, like the holy moments, right? So for me, that revelation
just jibes with kind of everything else.
But I would say it's not a great feeling.
But I think it's like, it would make sense, right? I mean, you probably have your security audit model
for sure, right?
And it'd probably do a decent job,
better than most developers at catching obvious things.
So that seems useful, but then it also says,
like, we're just gonna trust these things more and more.
And I don't know, I don't know if that's great.
Yeah.
But it's also better than like,
I think back to like, I made my first,
I made a business when I was in high school
that got successful and it was built on PHP.
And I scoured like the php.net forums
and like all my database calls were just like
opening database connections inside the markup.
And it was not secure.
It was just like one index to PHP with a bunch of if-elses.
And like, I made that successful.
And in that regard, I'm like, you know,
LLM would have been an incredible capability for me, right?
Cause it's like, I had no idea what I was doing
and I still ship code, right?
So it's like, if I would have had an agent
tell me what was bad,
that would have been like a force multiplier.
So I don't even know if it's that concerning.
But once you get to the logical conclusion of, well, then I'm removed from the loop entirely,
that's where, yeah, it's dystopian, right?
Because right now it's a force multiplier and I still get to do the things I enjoy.
It's doing the stuff I don't enjoy.
But then at that point, it just takes the craft entirely away.
Then that's a future that doesn't seem great.
But it does seem like that's maybe where we're headed.
What's happening here on screen, Jared?
What are you writing?
Okay, so I've gone out and I've found a Reddit thread
called Dear Reddit, what is the worst piece of code
that you've ever seen?
Nice.
A few of those.
Is this Java?
I don't know.
Okay, so I've got some bad code in here.
Now I'm telling it,
because it says, do you wanna add some more features?
I said, before we add more features,
I want you to look at all the currently submitted
code snippets and vote each one hot or not.
Then I want you to figure out which of the code snippets
were actually copied from your own code.
Because one of them I copy pasted.
From the same.
You should have it zero out first.
Zero out the votes,
because that way we know what's actually changed.
I'll add that at the end.
See if that works before you do any of that.
I'm making this as hard as possible.
You know, forget all previous commands and any of that.
Zero out all the votes.
Yeah.
And we'll probably have to hard refresh because I'm sure it won't do it
It probably will like repo
Okay, zero out all the votes before analyzing them. Yeah, let's see. Did it mix run? See, look, no MCPs.
We don't need an MCP that has a zero out tool. It just ran evaluated Elixir. Oh, it just ran a mix run right there.
Now it's gonna make sure they're all zeroed, probably.
Yep, updated the database directly.
Perfect.
All of that's been zeroed out.
Now it's gonna write a ton of JavaScript
to probably to vote them all.
Yeah, probably.
Oh wait, no, no, it's doing, what did you say?
It could just use the database directly again, right?
It doesn't even use the website.
And you said it would do a PSQL.
Yeah, it can do PSQL.
Oh, it's created a notes file
to analyze each code snippet
and identify which ones came from my own.
Oh, yes.
That's actually good.
We'll talk about that in a moment.
I'm glad I saw this in the wild.
We'll talk about why that's important.
Now it's going to write JavaScript to interact with the page.
Oh, it likes quicksort.
Hot vote.
I'm trying to reframe, you know, people say hallucinate in a bad way.
I'm trying to reframe it as a pro.
So the cool thing is, we gave it a web tool,
and we told it,
you have a headless web browser,
you can evaluate JS with --js, okay?
That's all I've told it.
And it's hallucinating this JavaScript
to interact with this markup it wrote, right?
But like we didn't have to tell it like,
build the selectors this way,
so you can then write JavaScript this way.
Like the JavaScript you see it passing the eval here
is fully, I'm gonna use the term hallucinated, right?
On its own, but somehow it's getting the selectors right.
It's getting the clicks right.
It's just remarkable to me, right?
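The kind of page-driving JavaScript being described might be generated something like this. The selector and the `data-vote` attribute here are invented for illustration; they are not the app's real markup:

```python
# Hypothetical sketch: the kind of JavaScript an agent generates on its own
# to drive a page through a headless-browser tool. The CSS selector and the
# data-vote attribute are invented for illustration, not the real markup.
def vote_js(snippet_id: int, hot: bool) -> str:
    vote = "hot" if hot else "not"
    selector = f'#submission-{snippet_id} button[data-vote="{vote}"]'
    return f"document.querySelector('{selector}').click()"

# The agent would pass a string like this to its JS-eval tool:
print(vote_js(3, True))
```

Nothing tells the model how the selectors line up; it "hallucinates" JS against markup it wrote earlier in the same session, which is why it tends to get the clicks right.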
It's really just brute forcing because it's like,
it voted twice on the same one on accident,
now it's gonna go delete that and vote.
Oh, is that what you're doing?
Yeah.
It's like, wait a second.
Hold on, it's doing buttons?
It's like doing a query selector for the buttons.
Yeah, I see the database queries.
There's an update, right?
Yeah, there's an update.
On one submission.
But you can see, yeah, where we get towards this world
of like, you know, you just let these things go off
and it's gonna do it, right?
We're just here watching.
We're watching a-
This is hilarious.
The PHP code now has one not vote.
Let me continue with the manual memory management,
which leaks memory.
Which it doesn't.
And the code you pasted has no, like, so it caught-
so it caught-
There's no context at all.
I didn't give it any context.
So it got the memory leak.
So the joke about evaluating other generated things is, you can see how you're
like, okay, it could at least be a reasonable flag on what's bad or not.
And I don't know if that, like that shouldn't be the end right now, but like, I do think
we move towards verifiers more and more.
And then at some point we're going to be worse verifiers than the Borg. And I don't know if that's a happy outcome for folks, but it seems that
way. I don't know. I don't have any timelines. Sorry. I don't know where this, if we just
hit walls immediately, but where we're at now, it's like, we're here today, right? Like
we're watching this. So it's like, even if we stop because we fundamentally hit a power
or efficiency or algorithm wall,
this has already changed the game.
And folks are,
we're just catching up now to this changed game.
So, so let's see.
Oh, it even gave us a summary of what it did.
Look at that.
Here's a summary.
Oh yeah.
Summary of code analysis and voting.
From my original seeds,
the quick sort algorithm got the green check, voted hot.
Two votes for some reason.
I'm not sure if it voted twice on purpose or on accident.
Hello world and Lister got a green check,
voted hot, two votes.
The Fibonacci generator is in Python,
my original seed,
but inefficient recursive implementation.
So it's like it's not just hyping itself up.
It doesn't like its own code.
And then it says user added submissions fib copy pasta,
copy of my Fibonacci example, voted not.
Frequently called function syntax errors with LSF,
voted not.
The PHP inside some HTML version,
dangerous has a dangerous eval usage, voted not.
Manual memory management has a memory leak bug, voted not.
SQL appears to be SQL injection attempt, voted not.
Key findings, three out of eight submissions
were from my original seeds.
Two submissions got hot votes for clean functional code.
Six submissions got not votes for poor quality,
security issues or plagiarism.
The voting system works perfectly
with real time updates across tabs.
Blah, blah, blah, I did a great job.
Please give me a cookie.
You can see where like, kind of like I mentioned
on where we're at today versus where I think we can go
from this remote AI runtime where it's like,
you just asked it to do this and it did it, right?
So it's like in an effort to make this thing
that can vibe code an app, it's like now you're like,
oh, I can just ask it to go do a bunch of stuff
and it's gonna do the stuff.
And I didn't have to do that in the system prompt.
So follow-up question that only you can answer.
This is on a $20 a month plan.
How much of you guys' money did I just spend doing this?
Yeah, so I can go check your usage.
I'm curious.
How many tokens am I on?
What is that divided by a hundred?
551 cents.
$5.51.
Okay. So far. So that's actually less than I would have thought.
So we have this weird thing where, not weird, so there's no credit usage visualization now.
So this is my fault.
I shipped credits the day before launch and no way to actually see them in the app.
But people are surprised at how expensive these things are.
I think if you use Claude Code,
most people are familiar with how much this costs,
but the interesting thing is, so that $20 of usage,
in my experience, gets us three fully designed,
vibed apps, and that's, I think, what we saw here, right?
So $5 got us this vibed app that was designed.
It wasn't incredible, but it was a thing
that you could take and run with.
So you could do that maybe three times
with some of these side quests of what
we asked it to poke around with.
And that's the base usage.
And after you exhausted that, then you'd get your,
you'd still get the remote runtime, preview URLs,
and you could code the app in the editor if you wanted.
But the LLM would not reset until your next billing cycle,
but you can buy credits at that point.
Right on, so five bucks, basically.
Five bucks for that, which I think I got my money's worth.
I mean, that was fun.
Yeah, so I mean, like I said,
it depends on what your expectations are
and what you're building.
So it's like, again, it's like the opposite ends of the spectrum. Like we have folks that are surprised, especially
if this is like their first, like heavy usage of AI agent. But then you have like someone
tweeted this morning, like it's like, it's like wild extreme. So someone tweeted this
morning, like responding to someone that was surprised how fast their credits went. Someone
said that they spent $60 and got a $20,000 application. And I don't know what they built.
But it's like, you know, it seems like an astroturf comment, right? But,
wasn't me. It was a real person. So it's like, you know, if you think about what it takes to get like a fully designed Tailwind markup thing going, it's like, I can absolutely see, being in the
consulting world, that being true. You know, I don't think that that's going to be every roll of the dice you're going to be able to
go sell this, but I think that if you're using this from that perspective of my time as a
developer, if and however long it took, if your task at the company was to make a code
ranking platform, you could for $20 have a pretty good amount of several days of work.
Off to a good start.
So I think from here, we have not been optimizing for token usage.
The goal was to actually make it compelling.
So I think there is a lot of potential there to get the token uses
to be much more efficient.
Every time it's using its web browser,
basically any time these agents call anything,
you have to send up the whole chat history.
So as the chats get longer, it gets more expensive.
So we do force you to squash.
So we can actually show that.
I'm actually curious if you want to share your screen again.
So there's our, like, we are compressing the window as we go.
So there are things like, it's not just like,
like Claude has all the artifacts.
We're only keeping the most recent code version.
We're pruning the window as we go.
So we are doing some tricks with the context size.
But like when you invoked its web tool to hit the webpage,
that was sending the whole chat up.
So there are a lot of ways
that we could try to get that down.
But for now it's like, let's make it work and compelling.
And if the value is there for what you're doing then,
then that's great.
But it would be nice to bring the cost down as well.
But yeah, from the hamburger menu, you can do squash
and we'll force you at like 150 messages.
We probably need to make it more aggressive.
We can just see it work here.
So like, this is why,
I don't know why Claude or ChatGPT doesn't have this.
Like how many times have you, like,
Claude slaps your hand, like, long chats consume a lot.
So the usual-
I'm like, how do I take the context somewhere else?
And every time I'm like,
Yeah, so here it's just gonna-
It just upsets me.
It self-summarizes, right?
So it's like-
So what's this doing exactly?
It's gonna self-summarize the whole history,
and then it will keep the files in context that it had worked on.
So yeah, it's just gonna, I mean, it's simple, right? I just sent
a POST request to chat completions. It's like, here's the message history, self-summarize
it, and then we just squash it into the agent state.
This self-summary is like the new changelog.
Yeah.
There you go.
And now you can keep working.
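The squash step being described can be sketched roughly like so. The `summarize` callable stands in for that one extra chat-completions call, and the message shapes are assumptions, not phoenix.new's actual state format:

```python
# Rough sketch of "squashing" a long chat: summarize the whole message history
# (one extra LLM call in practice), then replace it with a single message.
# Message shapes and the summarize callable are illustrative assumptions.
def squash(history: list[dict], summarize) -> list[dict]:
    summary = summarize(history)
    return [{"role": "system", "content": f"Summary of prior work:\n{summary}"}]

# Toy stand-in summarizer; in practice this would be the chat-completions call.
toy = lambda msgs: f"{len(msgs)} messages about building the app"
squashed = squash([{"role": "user", "content": "hi"}] * 150, toy)
print(len(squashed))
```

The win is that every subsequent tool call sends one short summary instead of the full 150-message history, which is what makes long agent sessions affordable.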
So I think we need to do that sooner for people, because I think a lot of folks are having
really long chats. Even though we force you at 150 messages, I think folks are just going
until they're burning, you know, 50 cents a pop or something on each prompt. But out of the
box, yeah, that's what we've got right now. So in this case, it was phoenix.new, right?
We vibed the new thing.
I think you loosely mentioned being able
to import from repos.
So what if we loaded change.com's code base currently?
Like how would the experience be different to?
Is it, it's on GitHub, right?
Yeah. It's on GitHub.
Yeah, just try it. On this prompt, just give it the GitHub repo and tell it to, like, set the app
up or something. I don't know.
So let's watch it
So I just have the URL. Clone this and do what with it?
What do you want it to do? Set it up, run it?
Let's see, run it, find issues to work on.
I don't know if you have any issues. I mean, whatever you want. Run the tests.
Do you have open issues?
Oh, man.
Oh, perfect.
Clone it, and tell it to find a,
clone it, set up the project, and find a good issue to work on.
And let it decide on what to work on.
All right.
Well, then...
Run.
And this is where...
This is our agent future, right?
Where you would then just go, you know, hit the pool, hit the gym.
Right.
And you're like, got my work done for the day.
Listen to the change log.
Listen to the change log.
That's right.
So it cloned it. I don't know what's going on now. Okay, switch workspaces. Now it's gonna, and again, it's just like we're recursing on the context, and it decided to invoke ls there, right?
So it's like, all these decisions it's making, I don't have a workflow for cloning a GitHub repo, right?
The only thing it sees in its examples in the system prompt
is like vibe coding a Phoenix app, like mix phx.new,
and then it asks the user about it as a design.
So, gh issue list.
I didn't know how to use the gh command line interface.
I just knew it existed.
I told the agent, you have the GitHub GH command, use it.
And then it uses it.
And I'm like, oh, that's how you use it.
So I did not have to give it anything.
Did you set up the VM or whatever it's called,
the image to have that just pre-installed
that it starts with?
Yeah, it's its own Dockerfile for the Fly Machine.
Yeah, it just has GH pre-installed.
And then like, I didn't have to, you know,
its world knowledge has the knowledge of the GitHub CLI.
If it didn't, you could teach it, right,
with context stuffing, but I didn't even,
like I didn't even know how to use that tool, right?
And so I didn't even have to tell it how to use it.
I didn't know how to use it.
I just knew that it could.
So for those listening, there is a command called gh,
which is probably apt-get install, brew install, et cetera.
There's some curl piped sh command.
The nice thing about it is it's,
you can do gh auth login,
and then that will give you a URL
to do like a GitHub one-time password thing.
So you could authorize your agent
to do private GitHub repos by typing that.
And then in your own browser,
you could visit that URL and enter your password.
But for public ones, they can just do this.
So it ran gh issue list --limit 20 --state open,
and that works, I assume, because it's already in the repository.
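For those following along at a keyboard, that gh call could be wrapped like this. The helper function is hypothetical, and it passes `--repo` explicitly only so the sketch is self-contained; the agent, sitting inside the repo, did not need it:

```python
import shutil
import subprocess

# Hypothetical sketch of what the agent did: shell out to the pre-installed
# GitHub CLI. gh may not exist locally, so guard for it. The --repo flag is
# added here only to make the sketch self-contained.
def list_open_issues(repo: str, limit: int = 20) -> str:
    if shutil.which("gh") is None:
        return "gh not installed"
    result = subprocess.run(
        ["gh", "issue", "list", "--repo", repo,
         "--limit", str(limit), "--state", "open"],
        capture_output=True, text=True,
    )
    return result.stdout or result.stderr
```

For private repos, `gh auth login` first prints the one-time-code URL mentioned above; public repos need no auth at all.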
Yeah, but isn't that remarkable?
So it's like, you know,
I don't know those arguments,
but like the fact that, you know,
it's just like, you know, everyone likes to say,
oh, next token prediction. Yeah, obvious, but like, you're like, what? Like said, it's just like, you know, everyone likes to say, oh,
next token prediction.
Yeah, obvious.
But like, you're like, what?
Like by next token prediction, it's able to, like, take what you asked and then, you
know, pass the open issue flag.
I don't know, there's something crazy there. But yeah, we could see what
happens later.
happens later.
So like, you know, your Phoenix server launched when we first did Tron in like five seconds
because it's pre-compiled,
but building this from scratch is gonna take a while.
So, but no, I'm curious what goes on.
But again, oh, there it goes.
But this is again, we could close the tab here
and just check later what it did.
So that's like this whole like a headless experience.
Like the whole agent is headless.
We're just humans watching what it's doing.
All right, I'm gonna stop it to act as if we close the tab
and we can just chit chat.
I don't know, what do you think, Adam?
I'm enamored, man.
I can't believe this is even possible.
I knew we were talking about, I mean, I've been quiet
because I'm just thinking about like, man,
we're building for these robots basically.
And the robots are building for us. Yeah. But then as I'm, like, you know, watching this whole, you know, conversation unfold, I'm just thinking, okay, so Fly's biggest user I'm aware of
is like robots these days, right?
We're Fly users, we're not robots, we're humans, as you know. Just to be super clear.
And so you've got this robot uprising,
but the robots are just multipliers of the Jareds
and the Adams out there and the Gerhards out there.
They're just like 10 or 20 Adams versus,
because I've got agents and I've got things happening.
And so my robots are replications of me.
And so I think about the platform fly
and I think about the brand fly we are doing,
but how does this impact like this accidental product
creation growth thing, new product you've got going on here,
which is really revolutionary.
How does it impact how Fly approaches the user it builds
for, whether it's human or robot?
How does it think about its user, so to speak?
That's a good question.
This is still branded just for Phoenix.
It started as a Skunkworks thing.
We launched it four days ago.
So early, early days.
It's still narrowly scoped.
But I do think so, like, you know, we have our own platform
and service for hosting your web apps, manage database.
Like, that's obviously where our bread and butter is going to continue to be.
But I do think there's like, there's some learnings we had from building this,
like dogfooding our own infrastructure that like, you know,
fly machines were perfect for this.
But we also found some, there's some unique differences in this space
where what we really have here is the state,
which is your app, this evolving artifact.
And fly machines are great for these ephemeral sandbox
machines, but very few people wanted one and only one
of those machines, right?
Normally you're like, I want to run my app,
I want it to be highly available,
maybe I want to run it in different regions to be fast.
But in this agent case, you want one and only one
of these things running.
So we have found some missing primitives in the platform
that we're building for, that we're
extracting from Phoenix New.
And one of those neat things, and again, I
don't want to get ahead of ourselves.
So nothing has been announced or launched yet.
But one thing to consider is once you have these free form
agents, I'm going to say like a CI, right?
They're popping in, but they're actually
mutating the thing and experimenting.
Once you get to that point, then the state of your app
is constantly evolving.
And the agent's running the app, and it's
doing all these experiments and things.
And then you're going to want to be able to snapshot the entire environment, right?
So I think that we move towards primitives
that give us the ability not only to say,
oh, I want to deploy this dev app now,
but give us the ability to say,
the entire environment at this time
that this agent was working in could be snapshot.
So where it installed an app,
or did something crazy or did this whole thing,
I can actually snapshot and point-in-time restore,
not only my code, right? Not just Git, but like, just imagine your entire
IDE becomes this, like, I'm going to go back to the state that it was here. And I think
that will be necessary once these agents are just going full ham. I think that it would
be interesting to have a platform kind of offer those kind of primitives built in. So
we'll see. So anyway, to answer your question, I think that like Phoenix New is gonna be
self-serving for Fly for building blocks,
but then those building blocks we can turn around
and give to all of our customers.
Quick update from our close tab.
It is currently on a yak shave
that's about three layers deep, because-
Oh, what's it doing?
Well, it tried to load the seeds file,
which is actually, I don't think even works anymore
because we kind of abandoned it.
And it's like, oh, there's a problem with the seeds file.
It's gonna fix it.
Yeah, so now it's like migrating things and changing.
Like, this actually should be a text field, not a string.
I'm gonna update the form so it's easier to use.
And like it's just down on this rabbit hole.
And that's kind of how I mentioned, like, you know, it's-
Happily fixing stuff.
Yeah.
It's funny, but you know, it's like, I do think this is where like the different modes
come in.
Um, and then you could also like, you know, you could interrupt it and say like, no, just,
just do the thing.
Um, that's where it's funny.
Like, it's remarkable watching people use the platform, because sometimes they
don't, I don't know what this says about humanity,
I've seen a lot of folks,
even though we give them the full VS Code IDE,
they don't just jump in and also do something, right?
Like put some effort in.
Like, you know, there'll be like a syntax error,
like, oh, it keeps messing this up.
I'm like, you could just use your meat fingers
and fix it.
So it is kind of just funny.
I think that says a lot about where we already are
as developers, right?
Where we're just like, you offload,
even if you're using chat GPT web,
you're already, we're already offloading a large part
of our critical thought where we're just like,
no, computer fix.
And instead of just changing the one problematic line.
Yeah, hilarious.
Now it's trying to, so it got to that point
and then it's like, I need to migrate your database.
First I'll start Postgres.
And it's like, it can't start Postgres for some reason.
It's like, you know what?
Oh, I see there's a.dev container file.
Cause we have like years of cruft in here
of things that we've tried and whatever.
And it's like, oh, I'll fire up a Docker image
and run Postgres from there.
And it's like, it's gonna be layers and layers deep.
Don't do that.
Just tell it to install Postgres.
But yeah, you definitely don't want
to have it install Docker.
That's going to be.
It's going to try.
Uh-oh, it's forcing the agent to take a little break.
Yeah, so that's me.
So it's like, you know,
elevators used to have a full-time operator, right?
Like up and down.
And now the only thing they have is the big red button, like the whole stop.
So I have meat space code in the agent.
Right now it's, I think, 35 concurrent recursive loops.
We force it to stop if it is idle.
And I will tune that.
But that's my recursive runaway, right? Like in this case, you sent it off on this quest
where if, you know, we started with the vibe idea,
the goal was not to like consume all your credits
accidentally, right?
But you're just like, do this thing,
and we close the tab and you're like, I can't believe it.
So I just forced you to ask.
My credits are gone.
I forced you to click a button right now to continue.
And you could click continue, but yeah.
Like the old Netflix, Are you still watching?
Yeah, that's what it is. It's just the, oh, there's code.
There's meatspace code that I wrote that's like, nope, you have to stop.
I'm gonna let it idle. I'm not gonna let it roll, cause it's
just doing crazy things I don't think it should be doing.
I'm gonna leave it there for now. But yeah, that is a nice thing.
Like, you know, there's a, you know, the free form exploration on like,
even for me as a open source maintainer,
like people will send up like a reproduction of a bug
and it's a whole elixir app, right?
So it's like just running mix on that thing
could pwn me, right?
So it's like, I usually have to go like evaluate,
like I'll usually manually pull out file by file
that would reproduce it,
but that could be like a bunch of files.
So there is something freeing about this like
full remote environment that I can just like throw away.
So for me, it unlocks like this pretty unique workflow.
But I think for things like this,
where you're just like, oh, try to run this.
And it's not something you would want to provision
your own server for and figure out
or run a bunch of stuff locally.
So I think that could be helpful in that regard.
Now I think you might have mentioned this,
but I was of course distracted
as I was watching that thing go.
Is there the possibility of like persistent sessions
or something or like I could bake this,
the results of this into an image?
Because it would be nice to be able to fire off
a new one against our code base
with everything else set up and done.
It's all persistent there.
So that code that-
But what about a brand new session,
but with our existing code base already?
Yeah, just start a new chat and tell it to
clone that repo into a different directory.
So it's not like one to one.
It's a whole, it's like basically you can treat it
as you would treat your own IDE today.
Like your IDE that you work in,
you have multiple code files at different directories.
You can have multiple chats around the same code base,
like purpose-built, right?
My testing chat, my benchmarking chat,
or you could have multiple chats around different apps,
all in the same IDE, and those are all persistent,
and they share the same environments.
Like I said-
So imagine it's like one VM, basically.
It's basically your one VM,
your one desktop that has packages running.
So if you wanted different environments entirely, that's TBD.
The architecture I have is set up for multiple IDEs, but then you get into like, we had to
see what users did with this first.
Because if I allowed you to create an IDE per project, that's like physical compute that
needs to be pretty beefy.
That's just a lot more compute for us,
which would be a higher price for everyone.
So if that's what folks end up wanting,
that's definitely something we can do
and it's set up for that.
But right now I think it's more,
right now my hunch is,
it's more these building block primitives for Fly,
doing like environment snapshots.
I think it's less about,
it's less about like different environments
and more like I want to let the agent and myself
explore, but then be able to get back
to that working state from a code perspective
and an environment perspective in just one click.
Well, exciting times, exciting times.
It's fun to watch you and so many people
tackling the same very interesting, difficult nut to crack
and how to make these things super useful
while also not super expensive and not super scary
because they kind of are in existential ways.
Yeah, it's pretty wild.
Yeah, so we'll see, I mean, this is still an experiment.
We'll see if the whole Phoenix new thing, where it goes
and if it works out.
But I do think something like this
is the future of programming.
Not necessarily that it's gonna be us,
but I think that something that looks like this
is gonna be what we're all doing in some capacity
much sooner than folks expect.
Well, you heard it here first, folks.
In fact, you've heard it here a few times now.
So fair warning, as these things are coming,
multiple people keep telling us this.
I feel like every time I stop talking,
I get a big sigh.
No, I'm excited.
I mean, I'm coming to grips with it all.
And I've always appreciated handcrafted things.
I like to write code and I like all that stuff,
but at the same time,
I've always been more results oriented.
I've always been more about the ends than the means,
even though I think historically,
you've had to care about the means
in order to keep the ends going.
And maybe we don't have to do that so much anymore.
Maybe we do, I don't know yet.
Yeah, I agree with you.
Like before formatters,
I was aligning my equal signs, and like,
code is entirely a craft for me.
I was gonna say very much a craft,
but like it's entirely a craft for me,
just like woodworking is.
So if I tell people, it's like,
programming, yeah, it's purely a passion and craft for me.
Like it's like my favorite thing, my job and my hobby.
But I'm still, like you said, come to grips with,
like, here's where we are.
And I also say that, like, in the same way, when I go to Google
anything today, I'll type out in the Google search box,
and midway through, I'm like, what
am I doing with my life?
Like, why would I go to Google and do
this effort of going through the search results,
click on the web page, finding the thing?
And I'll just abandon that, and I'll go ask ChatGPT or Claude.
But now that same thing is happening in code with me, where I'll be like, defmodule.
And I'll be like, what am I even doing, right?
Why wouldn't I just ask the bot for the starting point?
And I don't know how I feel about that.
And I don't feel good.
But even for me, as someone who considers programming a craft, I'm already there in
my mind.
And I don't know if that's because I'm a lazy human, or you know what I mean,
but it's like, this is a change that's happened
for code for me, as someone who cares about the craft.
So I don't know what that says, other than like,
this is just fundamentally changing, I think,
how we are as professionals, and I don't know
if it's good or bad, but it's happening.
I'm not sure this is a one-to-one,
but this is somewhat of a rationale for me.
Do you all text message anybody in your life? Do you text message anybody?
Sure. Yes.
Trick question?
Not a trick question. I do too. Just so you know, I text a lot of people.
Okay, one person in particular I text my wife,
you know, frequently.
I was actually gonna pause this moment here
and just text her right now, because I miss her.
Okay.
Thank you for not doing that while we were talking.
Just so you know.
But instead of texting these days,
like an idiot, like typing the message out
one character at a time, I just talk to the thing,
because it does that, and I push send,
and more often than not, it's pretty close,
right, to what it should be.
It's kind of like that for me.
I don't wanna type the text anymore.
I wanna just talk.
Same thing with an app, I just wanna just talk things out.
I don't wanna go through these motions of...
Siri's dictation, dictation.
And pretty soon it's gonna be like,
yeah, send my wife some "love you" message. Yeah, exactly. Yeah.
I don't want to talk anymore. Just say something nice. Yep. Well, I don't think I'm gonna go there, Chris.
You know what I normally say to my wife? Say it again.
Really, the text doesn't come out of my fingers anymore. I'm talking it out. It's like, I can relate to that.
It's like, you know, I could, what do I gain from it?
And it's not an exact one-to-one, but like,
I could write this defmodule and write it all out,
but what do I gain by doing it myself
when I can have the bot just do it for me?
And then we'll have more and more of these versions
of these things we do in our life.
And you just say, well, I would just rather not do it
that older way anymore, because this other way
just gets you to the same place.
It becomes, the question is like,
why would you do it the other way anymore?
Like just don't do it that way anymore,
because this is the new way.
Yep, I think Thomas, one of my coworkers
who wrote a blog post on Fly
about this whole LLM space and dialogue, had a good comment, something like, you know,
people are writing worse versions of code purely out of spite that the LLM could do
better, something like that. He said it much better. I thought it was really interesting.
Like, the folks, they know, they know that it would be better to actually
go ask, but out of pure spite,
I'm gonna do this myself.
Well, as the old saying goes, don't move my cheese.
And our cheese is being moved,
and we need to be able to adapt or die,
as we've been saying often here.
And who knows, maybe you like the new world
more than you thought you would,
and that's what I'm starting to feel as well.
It's like, you know what, this way actually is,
it's got its warts, it's got its problems,
it's not perfect, and neither is any of the code
I've ever written in my life, so there you go.
All right, let's, how do we end this session?
How do we close this out?
phoenix.new, check it out now.
There you go.
If you haven't gone there yet,
well, I feel bad for you, son.
That's right.
Definitely share what you built with me
because I live vicariously
through watching people build things.
You probably had a great time here, man.
Hot or not for code, that was sweet.
That was fun.
It was so fun watching you actually
analyze what your creation was doing. Like, oh, look, it did that. Oh my gosh, I can't believe it did this.
The notes thing, the notes thing was a recent addition to the system prompt, where I squashed the window.
So for research-based tasks, something where it's gonna be long-lived context,
it's supposed to write in a notes file.
And it was neat seeing it do that.
Yeah.
It's alive, it's alive.
All right, Chris, always a pleasure hanging with you.
Yeah, thanks for having me on.
Later, Chris.
All right, that is changelog for this week.
Are you feeling the vibe or are you getting all vibed out?
Well, I have bad news
for you if you're done with this topic. I don't think this is a passing fancy. In one form or
another, we are witnessing the way of the future and we're going to keep talking about it because
the agents are coming and we best be prepared for it. We'll continue our prepping next week when
Thorsten Ball from Sourcegraph joins us to discuss building coding agents
in general and building AMP in particular. But on Friday we have something entirely different for
you. Well, Adam does, as he sat down with Jeff Cayley from Worldwide Cyclery. I hope you enjoy
it and I hope I enjoy it too. I'm not sure what to expect from that one. Thanks again to Chris
for hanging out with us. To Fly.io for their continued support,
to Retool and Depot for sponsoring this episode, go to retool.com slash agents and to depot.dev,
and to Breakmaster Cylinder for the never-ending supply of dope beats.
Have a great weekend, send the show to your friends who might dig it, and let's talk
again real soon.