The Changelog: Software Development, Open Source - Let's build something phoenix.new (Friends)
Episode Date: June 27, 2025. Our old friend Chris McCord, creator of Elixir's Phoenix framework, tells us all about his new remote AI runtime for building Phoenix apps. Along the way, we vibe code one of my silly app ideas, calculate all the money we're going to spend on these tools, and get existential about what it all means.
Transcript
Welcome to changelog and friends, a weekly talk show about existential vibes.
Thank you to our partners at Fly.io who are highly featured in this episode not because
they sponsor us but because they do cool stuff and we like cool stuff.
Check them out at Fly.io.
Okay, let's talk.
Well, friends, Retool Agents is here. Yes, Retool has launched Retool Agents.
We all know LLMs, they're smart.
They can chat, they can reason, they can help us code, they can even write the code for
us.
But here's the thing, LLMs, they can talk, but so far they can't act.
To actually execute real work in your business, they need tools and that's exactly what Retool agents delivers.
Instead of building just one more chat bot out there, Retool rethought this.
They give LLMs powerful, specific and customized tools to automate the repetitive tasks that
we're all doing.
Imagine this, you have to go into Stripe, you have to hunt down a chargeback.
You gather the evidence from your Postgres database, you package it all up and you give
it to your accountant.
Now imagine an agent doing the same work, the same task in real time and finding 50
chargebacks in those same 5 minutes.
This is not science fiction.
This is real.
This is now.
That's Retool Agents working with pre-built integrations
in your systems and workflows.
Whether you need to build an agent
to handle daily project management
by listening to standups and updating JIRA,
or one that researches sales prospects
and generates personalized pitch decks,
or even an executive assistant that coordinates calendars
across time zones. Retool Agents does all this. Here's what blows my mind. Retool customers have already automated
over 100 million hours using AI. That's like having a 5,000 person company working for an
entire decade. And they're just getting started. Retool Agents are available now. If you're ready
to move beyond chat bots and start automating real work, check out Retool Agents today.
Learn more at Retool.com slash agents.
Again, Retool.com slash agents.
Today we're joined by our old friend, Chris McCord.
Welcome back, Chris.
Hello, thanks for having me back.
This is your third, fourth, fifth, or sixth time on the pod.
I don't know, I didn't look it up this time,
but you've been around as the,
probably talking Phoenix pretty much at all times,
as my guess.
I think so.
I think so, yeah.
Elixir maybe, but probably Phoenix.
As you know, we're pretty big fans of Phoenix.
We've been running it for a decade now.
So thank you still, and again,
for creating a cool web framework.
Yeah, you're welcome.
I play with it.
Which I use like none of your cool new features,
like I'm basically using the stock crud abilities
from like 2016. Hey, that's cool too though.
We'll take it, right?
And it just works.
It does just work, and I continue to enjoy it.
I even avoided contexts,
even though I was kinda keeping up with the Joneses.
I am on a recent version,
but I just ignore the warnings or whatever.
That's fine too, we could, yeah,
there could be a whole episode on that,
just one giant rant, but yeah,
it's modules and functions, you know,
it's all we're asking.
That's right.
That's right, we're asking.
If you want to, it's a suggestion.
Maybe create well-defined interfaces, right,
but that's it, so, yeah, do what you want.
Well, I mean, who writes code nowadays anyways, right?
That's right, it doesn't matter anymore, right?
Because one-
It doesn't matter, that's where I'm getting to in my life
and that's where we're getting to.
With coding agents taking over the world,
it's like as long as they know what the new features are
and I can test drive it in the browser.
They write pretty good Phoenix contexts too,
so they'll just do it for you.
And you have a brand new related thing, phoenix.new.
That's spelled out, spell it out, P-H-O-E,
Adam, can you spell it out?
Oh my gosh, yes, I don't mind.
Scared me, because I don't know how to spell this word, okay?
P-H-O-E-N-I-X. Ding, ding, ding, ding.
I win.
Nailed it, nailed it.
Whew, man.
.new, which is the cool new TLD, is the .new.
It's the cool new.
I love .news.
You know what I mean?
It's like, it's the place to go to start something, you know?
You gotta go there to do it.
It was available, so it works out well.
All the cool kids are doing it.
It took us a long time to get.news.
We could have got .new and put a slash s. I just realized that would be cool.
Just go to changelog.new/s. I don't know. I have a hard time saying that out loud.
The dev app URLs are also phx.run.
So yeah, that's cool too, .run. I didn't know that was a thing, but I was like, this is perfect.
I like the new TLDs. I don't like that they cost a premium. Yeah, it's ridiculous.
It's like, how about 9.99?
Like it used to be in the good old days, you know?
Yep, oh, I think I paid like,
it was like seven something, $7.
Oh gosh.
2003 was my first domain.
I think to expect less than 50 bucks a year
for a domain these days is just like not a possibility.
No.
Just not. What's the dot new going rate, Chris?
I think it's several hundred dollars, 700 bucks, 800 bucks.
I don't know.
Wow.
It's a lot.
First time or annual?
It's annual and there's something like,
I think within 90 days, you have to actually have like
some kind of like real property on it or something.
Or they.
Oh wow.
There's some rules there that yeah, you can't squat them.
Not that old, it can't be old.
Yeah.
I don't know how they enforce it,
but you can't squat those,
but I mean, they're kind of price prohibitive
for squatting anyway.
Those prices are like acting like zero interest rates
are still a thing, you know?
It's like, come on.
We don't have that kind of money anymore.
Get it together, man.
You know?
But I should speak for myself
because apparently fly.io sprung for this.
Phoenix.new, they can afford it.
And dot run, which is super cool.
Tell us about your new project.
We started back in December.
Of course, this is kind of what everyone's doing right now
is like, how can I make LLMs and agentic coding work
in my slice of the world?
And your slice of the world is Elixir and Phoenix.
That's where you started, right?
Yep, that's right.
Yeah, so we can talk about what it is now
and what I think we accidentally made,
which is this journey that I've been on
since we started this.
So right now, Phoenix.new is essentially a
vibe coding Elixir and Phoenix platform.
But I think what differs a little bit is like,
we give you like a full machine with root access.
So we kind of just like let the agent have full range
to go full ham on whatever it wants,
install apt packages and build a full stack application.
So a lot of these, like, vibe coding platforms
will gladly write JavaScript apps
and run them in the browser.
But like if you want a real app,
it needs to talk to the database,
needs to talk to file systems.
We wanted to start by building a full stack app generator.
So that's kind of what we've arrived at.
So it's great at building Phoenix in real time
live view application.
So out of the box, you'll get what
you would expect from a Vibe coding platform, fully designed.
But then everything that should be real time will be real time, kind of like how we build things in Phoenix and Live View.
So the agent is kind of, like, told, like, make everything real time. And then it typically
makes everything real time. So that's like the current out of the gate experience. And
what we found is like, it actually takes very little to get this agent because it has shell
and it has these like sharp tools to like get it to do anything. So the first thing my coworkers
did was they immediately had it create a Rails app and it's optimized for Phoenix currently,
but it's like an effort to kind of nail this full stack application and giving it like,
you know, we give it shell and root. It turns out that like you give agents like a few sharp tools,
they kind of just can make decisions and choices on their own.
So kind of where I see this going in the future
is how I'm building it as like a remote AI runtime.
So similar to like Codex or Devin,
or I think Google has like a Jules product now
where you can just like have this thing asynchronously work on stuff.
We can do that too. And it turns out it just does it.
So when I built things, initially everything's running as an Elixir app behind the scenes,
and that's stateful.
So it's like we accidentally made this remote thing.
So the agent, if you ask them to build an app now and close your tab,
throw your laptop out the window, it's going to keep working,
and you can pop in from anywhere in the world. So it's already like, it's already headless
and like you don't have to be there.
So much like Devon or Codex, you can just ask the agent,
hey, go check out GitHub issues or PRs
and send a PR when you're done.
And like it will do that today.
So I think, you know, while it's optimized for vibe coding
out of the gate now, like a system prompt is like
all about vibe coding an app.
Like the next thing we wanna move towards is like
more of these rich codex type flows
that it can already do,
but doesn't really know it can do.
That makes sense.
You have to, like, coax it.
How deep did you go on making it know Phoenix well?
Is it just the system prompt?
Is it deeper than that?
Yeah, I mean, it's just a system prompt combined
with let's say the quote unquote world knowledge
of these frontier models.
But the remarkable thing is, so we're using
Claude 4 Sonnet currently.
But the remarkable thing is how portable it is.
My intuition coming into this space
was like, all these things are non-deterministic.
You change one little thing in the system prompt,
and it's a totally different behavior.
And if you want to move to another model, like OpenAI or Gemini, it's
going to be a ton of rework. But it turns out like you just shop your system prompt
around and you get reasonable behavior just out of these things, which is totally against
my intuition. The knowledge is mostly gap filling. So like you're relying on this implicit
world knowledge and then through a lot of trial and error, you see where it sucks.
All these agents like to put bracket index-based access
on Elixir lists, which blows up.
It's not a thing.
So you have to find these dumb things that these agents do
and then tell them what to do and what not to do.
But it really isn't much harder than that.
And then you give them tools to kind of get over stumbling blocks or like go fetch things
as they need.
So it's like, it can, since it runs shell, it can just like get the Elixir documentation
out of a module locally or it can hit the web and fetch it.
So it's just a fascinating field that I think is overly complicated, and it's far more simple than folks realize.
Huh, so somewhere in your prompt,
it just says like,
Elixir doesn't have,
lists do not have an at function in Elixir,
or something like that.
Like you're literally just putting
those little things in there.
So that it never does it.
Yeah.
You just, in dumb English, you're like,
don't do this.
And it doesn't do it after that.
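The gap fillers Chris describes really are just plain-English guardrail lines in the system prompt. A hypothetical fragment (illustrative wording, not the actual Phoenix.new prompt) might look like:

```text
# Hypothetical system-prompt gap fillers (not the real Phoenix.new prompt)
- Elixir lists do not support index-based bracket access; `list[0]` raises.
  Use `Enum.at(list, 0)` or pattern matching instead.
- When unsure about an API, read the module docs locally via the shell,
  or fetch them from the web, rather than guessing.
```

The first line targets exactly the failure mode described above (agents reaching for bracket indexing on Elixir lists), and the second matches the "sharp tools" approach of letting the agent look things up itself.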
I mean, it's really a lot of trial and error.
People likened it to like spell casting.
But it's far less fiddly than I would have thought.
And given the non-deterministic nature,
I thought it would be like,
oh, now I'm gonna add one line
and just gonna throw everything else off.
And that's not been the case.
It's actually been remarkable how much
they stay well behaved.
Now do you have regression tests for this?
Because that doesn't have to be there,
maybe with Claude 5, because now it knows there's no at,
and you could pull that one out and simplify
or is that just doesn't matter?
Yeah, not currently.
I mean, it's mostly, we've done a ton of trial and error.
We have some headless driven integration tests
where we actually do the full cycle,
but nothing like scoring the result.
Because that's the hardest part is what constitutes
a successful outcome.
And it's not just getting to a running server
because most of these models can get to a running Phoenix server
at the end, but does it look good?
So Claude has been the best at design by far in my experience.
So it's mostly about the end-to-end.
Does the app look good? Is it just some cruddy thing?
Or did it actually come up with some compelling actual...
You give it like, make me a to-do list,
and did it actually come up with some compelling features
that weren't just... that weren't implied,
like were implicit.
And so most of that is trial and error
and just generating a bunch of apps and finding out.
Have you found that Claude 4 in particular
is better than other things right now?
It just seems like maybe it's not this particular model
or version, but it's like
mid-2025 all of a sudden I feel like the coding agents and I specifically have experience with Claude where it's like, oh I'm not mad at you anymore like I used to be at the previous versions.
It's slightly better than Claude 3.5 or 3.7, whatever the previous Sonnet was. It's the best
and I think it's just a little bit better,
not remarkably, than previous Claude,
but Claude has been the best at these agent workflows,
and I use words like, it's the best decision maker,
and it makes the best choices on what to do next.
But most of the models, even like Grok 3,
will go through the standard steps
that you would expect the agent to do when it's building a Phoenix app. It's just like whether it gets
caught on these little things, or makes a silly mistake, or, like, makes an app that actually
looks good. Claude just, like, gets over that quality hump. But the others are
definitely viable. Like, GPT-4.1 is similar in this agentic flow. It looks the part. It's just not quite
as good as Claude. And Gemini is the same. They work. And they're really good. And for
talking single file, make this code for me in one file, then it's a different story.
A lot of people love Gemini 2.5 Pro. It does great job, but like as far as like this end to end,
you're an agent, you make decisions on this step by step
flow, Cloud just seems to nail it
compared to everybody else.
I ask that not to toot Claude's or Anthropic's horn,
but because I feel like for me personally,
and maybe it's all of them have reached a threshold
of quality recently, where I've kind of bought in now
more fully than I was.
And it just seems like it just recently happened.
It was like sometime late last year.
I mean, when GPT-4 came out, that was when,
I wish I had had the insight then.
That was pretty much what changed the game
to do something like we're doing.
And we're just now catching up to, I think what these models have been able to do for
a while now.
What is phoenix.new to Fly?
Like, what does it represent?
Is it a skunk works?
Is it a growth model?
Is it marketing?
Is it R and D?
Like, what do you, how do you categorize it?
It started as just, I would say, more marketing, and I'm not even gonna call it R&D. So the,
the original thesis was like a lot of folks
in the Elixir community have been like,
these agents are all doing JavaScript,
all these platforms are doing JavaScript.
And since JavaScript has the most data,
we're gonna fall behind because JavaScript's
gonna eat the world because that's all the agents
are gonna write and pretty soon no one's gonna care
about what the agents are writing, right?
So part of this was like, you know, can we show that
Elixir and Phoenix just, you know, work great with these large language models.
And the other part was like with Fly is like, you know,
we have a large customer base that is using our platform
to do these vibe coding agents,
but a lot of them are just generating JavaScript.
So it's like part of it was marketing to show like
the original goal was like I had six weeks
just to spike out a text area on a webpage
to generate a full stack Phoenix app.
We were just gonna use that as kind of a marketing
for Phoenix to be like, look, you know, we're here.
We can, you know, we can do the same cool stuff.
And then there's also a way to market Fly to that segment
to say like, yeah, we're great at like sandbox JavaScript, but hey, look, you know, you can just have the agent
write whatever. So six weeks later, I had like, I basically had the MVP of what you
see today. You know, it wasn't quite as posh and good, but it was like basically like in
full in browser ID, generating a Phoenix application. And it was like, oh my God, like, there's
something here, right? Like it was much more than I thought was that we could deliver.
So we decided to kind of see where it went
and see if we could turn it into a product.
But it definitely started as this just like
little marketing R&D thing that suited the Phoenix side
and fly side and then it turned into like,
oh wow, this could be a thing and now it's a real product.
So we're gonna see where it goes.
So I would say Skunk Works, it went from marketing to,
okay Skunk Works to now growth, right?
Like, okay, let's launch this.
Okay, we have users.
Okay, let's try and do this thing.
So this is not a product?
Is that where it's at now, product level?
Yeah, we're in our product growth raise, right?
I mean, before we launched, what was that?
Four days ago, so we've had hundreds of people
sign up at this point, so we're doing it.
Let's go.
I mean, that happened for Bolt as well, right?
Bolt.new.
What was their previous company?
I mean, it's the company that created Bolt.
They were doing other stuff.
It was like Node in the browser.
I can't remember what it's called.
I've met a lot of them.
Oh really, I didn't know they had some previous.
Yeah, yeah, they had been startuping
and doing cool things in the browser for a long time.
I mean talking like three, four, five years
and they'd been on JS Party and Bolt was their new thing.
And it became their only, I mean it came out
and just was really cool
and got huge adoption right away from folks.
And so it became now, I think, who they are.
It's like, talk about a pivot.
I think it's crazy.
Their story is actually crazier than that.
It's, their founder had some like stuff out there,
I think even as well.
It was like a weird way,
the old version of the company kind of like faltered
It was StackBlitz. That's what it was, StackBlitz. Yeah, it just came back. Oh, yes. Yeah.
Yeah, so Bolt.new is from StackBlitz, and now it's just Bolt. Like, that's who they are now.
Yep. That's so weird. Then maybe that's not the same, like, who is the real Bolt here? Okay?
Maybe I'm wrong.
There was an old bolt too then.
Maybe I'm wrong.
I'm sure there's been another bolt.
Yeah, so there's been, yeah, there's,
and you know, they've had explosive growth
as well as like lovable and there's been some big folks
in this space, you know, so initially it was just,
you know, let's see if we can kind of show this as possible
in a full stack way.
And it turned into like, oh my god, now it turned into like,
here's a full ID with a root shell in the browser.
So I think pretty quickly it turned
into a very compelling remote dev runtime,
starting from what if we just gave you a text area?
Because I think a lot of the other players in this space,
you get the chat interface, and they kind of give you
some kind of like basic code editor
or code visualization, but we're just like,
now we'll just put VS code in the browser
and let you and the agent go at it.
So now that you all realized how generally useful this is
and not necessarily specific to Elixir or Phoenix,
like you can do other things,
especially if you stop making it seem
so elixir and Phoenix-y.
Do you wonder if maybe you like, you know,
pigeon-holed it or misnamed it,
or maybe it should be something different,
or is Phoenix still cool, you know,
even for people who don't know what Phoenix is?
You know, we went back and forth on this a lot,
because you know, it definitely started as like,
let's do this for
Elixir and Phoenix, and then over time it became apparent.
Oh, wow, this thing is like, turns out if you give the agent a full environment and
you give it sharp tools, it can just do things.
So we decided to, we wanted to nail one stack to start.
So once it became apparent that we could use this for pretty much anything, right?
Like Ruby, PHP, Go, Rust, like all the languages you would care about are already on the box.
But we wanted to actually like give a compelling experience for one stack first, right?
Because it's like, if you could release this, right,
but if the agents just, like, flop around being moderately okay
at Rails or Phoenix or whatever,
then it's still not gonna be a good experience.
So we definitely wanted to start with,
let's nail one stack, let's actually make it compelling.
And Phoenix gives you a lot as well,
like real-time features, right?
So it's like, if you can nail one stack,
and especially with Phoenix,
you get these real-time apps that sync out of the box.
There's something, I think, unique towards the future of,
if we take the argument that JavaScript eats the world,
and it doesn't matter what language these agents write in,
they're going to use JavaScript because that's
what they've seen, we can flip it around and say, well,
what if we can get to that world where the code that the agent
writes doesn't matter for us or the people asking it?
Maybe Phoenix can be the thing that doesn't matter, right?
Maybe we can be so lucky that most people have,
they don't care.
And if you flip it around and say like,
let's like, could we do that?
Then the agents actually have the ability
to make these really compelling experiences
with far less like glue and things,
infrastructure to bring in.
So it's like, there may be, I think, you know, a thesis and a story there.
Like, if we keep progressing towards this world
where there's like less and less,
like we don't show the editor anymore
because the agent, you know, agent does that code stuff.
Then I think Elixir and Phoenix actually may be
the perfect language to be that thing that people
by and large don't care about.
That makes sense.
So there's, I think something special there
with Elixir and Phoenix, but I do agree
that the positioning has been tricky for us
but right now it's like, we want to make it compelling, and make it compelling for the folks that don't care about the language, or
get them into Elixir and Phoenix this way, and then as we do that,
backfill with other stacks and kind of see what we do branding-wise. But TBD.
How close are we to that future where the language matters less, the editor is shown less? Like, how close are we to where that's a realization?
It's a contentious topic. It is.
Yeah, so I would say, like, the CEO of Fly kind of, like, I'm not gonna say pitched it to me.
Well, one, he thinks Phoenix.new is, like, the most successful nerd snipe of all time because, you know,
it started as his idea of, like, oh, Chris, just, you know, spend six
weeks, go make this, like, text area on a webpage, and it turned into an accidental product.
Yeah.
But it was his insight on like, you know, if we are heading towards that future, like
maybe we can make it like Phoenix that platform that is these agents are excelling at.
And I thought that seemed far off.
But then if you follow the Hacker News discussion
on the announcement, the top comment was a PHP developer
who had never...
They knew what Elixir and Phoenix was, but they never tried it.
And they were like, well, it's now or never.
So they signed up, and then they developed...
They made a tic-tac-toe game that was multiplayer,
and you could create your own room
and then, like, play with other people.
And they made that in one sitting and then deployed it on fly and they had never touched Elixir and Phoenix before. So it's like in one sitting this person, they were an experienced
developer but they didn't write a single line of code and had this like compelling experience that
converted them to an Elixir user, a Phoenix user, and a fly paying customer in one go. So I think
that like it seems like people hearing this
that are just coming into this space
will think it sounds like way far off,
but it's like, we're seeing that today, right?
Like literally someone came in, like typed into a chat
and like they got this app multiplayer real time
out of the box.
So it's like, maybe not as far off as folks think.
And I think that's where we're headed.
I think that the programming, I'm
going to call it iteration because developers are very, they don't like this idea, but I
do think that local development becomes less and less valuable. So, it's like, I think
that most of our code iteration, most of the computation time is going to happen remotely
just because these agents provide value at all times. So it's like,
it will become silly to think that like I close my laptop and like work stops happening because
why would it, right? So like, this is again, forward looking statement. But for me, I think
the future programming is like much more like your CI environment is constantly out there just like
fiddling and doing stuff and like you pop in and check on it or work within that context,
maybe locally too, but your predominant thing that's being the artifact that is your software
is going to be running somewhere else and the agent is going to be doing that subset
of that work and where that subset starts and stops, I don't know.
I can't predict the future, but I feel pretty confident that's where we're headed.
But we'll see.
And a lot of folks do not like hearing that opinion.
Well, it has huge implications.
I'm hearing echoes of the death of the IDE,
which is what Steve Yegge predicted
on this show a few weeks back.
And he didn't mean like it's gonna disappear,
but just the reducing towards obsolescence,
like you're moving away from it
as an important piece of the thing.
The most interesting thing with this is like,
part of when I put this together is like,
a lot of these other Vibe platforms don't have a real IDE.
So I thought it was like really compelling
to have like VS code in the browser.
And I still think that's true.
But then the funny thing about making that is, like, the editor,
the IDE that most people think is the thing,
is just eye candy for humans.
So like this agent-
They're just watching it do stuff.
Yeah, this agent, it serves no purpose to the agent.
So the agent, you close the tab,
it's not aware of VS code, right?
It's just literally there for us slow meat brains.
I mean, we can go in there and interact with you,
but it's fascinating to like, to work my way, you know,
bottom up and then be like, oh,
this thing could just go away and it doesn't matter
for the actual process of the agent working.
It's just fascinating.
So I definitely, that resonates with me.
And I don't know how I feel about it fully,
but it is the reality of where we're headed
and kind of where we're at.
So yeah, I definitely agree with that.
Yeah.
And we tend to anthropomorphize too much,
but I can imagine if I would just to do that a little bit,
that the agents themselves would be fed up with us
at some point.
Like, why do I have to show you what I'm doing
and like teach you this stuff as I go?
Probably, I mean, that's where-
You're adding nothing to me here, basically.
You're in the way.
Just let me do my stuff, I'll report back,
and then you tell me if I should do something different.
That, I mean, I totally agree.
And that's where we're at.
It's like there are limitations currently,
but it's like you can just let these things go off
and rip and then come back,
or they just send a PR when they're done.
Right.
And I think that, yeah, that makes people uncomfortable.
And it's also weird to me, like this whole,
we're in a really, really weird time, right?
Where you have people that are getting all this value.
Like I'm using LLMs every day and I'm like,
I feel like I'm a God tier developer, right?
And then I have like people that are really intelligent
peers that are like, LLMs provide no value to me.
And I'm like, I don't know how to reconcile
for these two worlds, because I'm shipping more
than I ever could.
And then there's also, the whole space is weird too,
because it's overly complicated by the folks building
these tools, I feel like.
It's far less complicated after coming
through this experience than I expected it to be.
And it's also like everyone's trying to build an editor too.
So I think it's just like, you know, I could be wrong, but it's just like a very weird
like, I think Windsurf, there's rumors of, like, a multi-billion dollar valuation or acquisition.
You've got cursor, which is doing amazing work, but everyone's like trying to build
the IDE.
And I feel like we're building the IDE,
and the IDE is gonna disappear
by the time they get done building the IDE.
I don't know, it's just a weird time.
I feel like the real part of this is,
and folks are working on, like, Jules and Codex and Devin
as well, that there's some medium point
that these things meet,
and I don't know that it's gonna be a desktop IDE,
but we'll find out.
So as the purveyor of the Phoenix framework and this potential world where phoenix.new
brings Phoenix framework even more users through this selection process, right?
Because, not necessarily because of the ergonomics or choices of Phoenix but what it provides
with WebSockets
and all this stuff built in the PubSub
and the real time features
and all the other things that Phoenix has.
If that ball starts to roll, right?
That snowball starts to roll down the hill and get bigger.
Do you then look at Phoenix as a framework differently
and say, okay, how can we build Phoenix differently
to actually make it, I don't know,
even better for these things?
Or how does that change your view of Phoenix?
No, it's a good question because we're already,
already like every thought is like,
well, how would this affect in a good way or a bad way,
large language models.
So it's like, but the most fascinating thing for me is like,
they're much like people,
I know we talked about anthropomorphizing,
they're much like people in that they're trained on the data that's out there.
So in the same way, I'm like, well, if we change this,
actually I was just talking with Jose Valim today,
like, well, if we do that, then the agent's going to run this mix command
that's going to be deprecated.
But it's funny how alike it is,
it's the same thought you would put into your existing user base, right?
Like, oh, well, people are used to doing it this way,
and now they're going to have to do it that way.
So it's a very similar overlap.
But I do think it changes fundamentally
like how you start thinking about features, because it's
more like LLM first versus like People First, which also
makes folks uncomfortable.
But that's where we're headed.
So yeah, so I don't have any
concrete examples yet other than like pretty much every decision now is like taking that into
consideration. And then one thing we're doing is, the community is standardizing on an
AGENTS.md file. So Phoenix, yeah, there's naming overlap, phx.new, the Phoenix project
generator, will have an AGENTS.md file that gives you a lot of what I have in the phoenix.new system prompt, like a lot of these gap fillers, basically.
A lot of communities are doing something similar, but we'd like to have each
package have its own AGENTS.md, which is just a plain text file that agents can utilize.
But you can also make, you know, a mix task that extracts these things, just an
easier way to lift that into whatever agent you're running, whether it's Claude Desktop or anything, or Phoenix;
you could look at these files as well.
And so it's kind of like on our minds for everything we're doing now and I think that's
that's where everyone's heading at this point.
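For listeners curious what such a file looks like, here's a hypothetical sketch; the conventions and gotchas below are my own illustrative guesses, not the contents of the actual Phoenix AGENTS.md:

```markdown
# AGENTS.md (illustrative sketch, not the real Phoenix file)

## Conventions
- Generate new CRUD features with `mix phx.gen.live`.
- Run `mix format` and `mix test` after every change.

## Gotchas
- `{}` is reserved interpolation syntax in HEEx templates; annotate elements
  that contain literal code samples with `phx-no-curly-interpolation`.
```

The point is just that it's plain text the agent can read into context, the same way a human would skim a CONTRIBUTING guide.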
You were saying before though this kind of goes back a little bit but you were saying
before that the person adjacent to you let's just say got not a lot of value from, or no value from,
an LLM, but you were getting a mint.
What kind of value are you getting?
Is it just in your software life,
or you're writing more code?
Is it in your personal life?
How are you using and getting value?
Yeah, so it's like, yeah, I'm gonna sound like
an evangelist.
Like, the really weird, yeah, it's just,
we're in this weird time where, like, folks have equated it to, like, cryptocurrency scammers, where I feel personally slighted
if someone's like, oh, it's just like crypto hype. It's like, I'm literally
getting value every day. But in any case, it's at all levels. So like, for me,
it's changed like any little thing you want to spike out that would take you several weekends, right?
I can just go generate that thing.
And then four minutes later, I have it.
So we could do that on air, right?
What is some little app that, regardless of what it is,
that you just haven't had the time to work on,
you could go have that thing just be done.
But also from just things as a developer I can't be bothered to do.
I mean, I test my code, but I'm like a regrettable tester
where it's like, I have to do this, so I'll do it after I get my things working.
But now the vast majority of my tests are started by an LLM, by and large.
And they'll even find edge cases, like the Phoenix new parser that's parsing the token stream
is fully tested by,
test generated by the LLM and it caught some edge cases that I didn't even think that were there.
Benchmarking is another good example. I used phoenix.new to work on phoenix.new.
And part of it is the token rate limiting; we're rate limiting all the incoming and outgoing tokens.
You don't want to lose those because it costs real dollars.
Like, if someone sends up a request and then cancels it early, we still have to calculate
that.
So anyway, it's an in-memory, ETS-backed rate limiter that syncs with Postgres.
I wanted to know how fast it was in general and then how long it would take to sync because
I have to do some locks. And that kind of thing could take several hours for me to
actually try to benchmark.
Like, setting up the benchmark.
So I just asked Phoenix to benchmark this code, and it extracted, again, I gave it
nothing other than, let's benchmark this.
It was a GenServer with ETS doing Postgres syncing, and it took the critical
path of the code,
it put it into an EXS file.
So instead of trying to like drive the code
in an integration way,
it just like automatically duplicated the critical path
and then ran that in a tight loop.
It gave me all this formatted output of like 1000 rows,
10,000 rows, 100,000 rows.
It put it in the console and like a pretty formatted table
and it wrote a Markdown file of a summary.
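As a rough sketch of what that generated benchmark script might look like (the table name and critical path here are my own invention, not the agent's actual output): lift the ETS hot path out of the GenServer into a standalone .exs file and drive it in a tight loop at a few batch sizes:

```elixir
# bench.exs -- hypothetical sketch of the agent's approach: skip the GenServer,
# exercise the ETS critical path directly in a tight loop, and time each batch.
table = :ets.new(:rate_limits, [:set, :public])

run = fn rows ->
  {usec, _} =
    :timer.tc(fn ->
      # The assumed critical path: bump a per-key token counter,
      # creating the row with a zero count if it doesn't exist yet.
      for key <- 1..rows, do: :ets.update_counter(table, key, {2, 1}, {key, 0})
    end)

  IO.puts("#{rows} rows: #{Float.round(usec / 1000, 2)} ms")
end

Enum.each([1_000, 10_000, 100_000], run)
```

Running `elixir bench.exs` prints one timing line per batch size, which is essentially the formatted table he describes.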
So these kind of things, at all levels,
I feel like this is how we're gonna do everything.
And it's like, whether you're like,
you've never programmed before,
or you've never programmed Elixir before,
you can get value there.
Whether you're like, you created a framework,
and at the far end, you're still gonna be able
to use these things to do the tedious work or future work.
So what I try to tell the seasoned developers is, for me,
LLMs, like, the discourse is everyone's like, oh, it's all AI slop,
which I think is a silly argument, but it's not AI slop for me.
It's like these LLM, the code that the LLM generates,
that artifact is a starting point.
And the discourse for some reason for people that are on the negative side
seems to be like they treat that thing that falls out
of your ChatGPT as the artifact that you ship to production.
But it's like, no, these things are just a starting point.
So now it's like, instead of having myself write out
this 100 lines of server code, it's
now just like this really intelligent co-generator that's
my starting point.
It's not what I then just ship to production.
So I think the discourse is flawed,
but I think that at all levels of the experience stack
or programmer hat stack,
you're gonna have people getting value out of tools like this.
The AI slop is the blog post that nobody wanted to read
and its only purpose is there to attract attention
so you can sell some advertising
or something or the essay that you spat out
because you didn't have time to actually write your own.
That's slop.
Yeah, I mean, I like to say
we've all been sloppy vibe coders, right?
It's just now way easier,
but the people copy, pasting, stack overflow,
and the people that ship that chat to production,
now they can do that more easily,
but those people were
already writing bad code and not carefully considered prior. So that's going to remain true.
It's going to be easier for those folks to get something into production. But it doesn't change
the fact that you can't... I don't know. I feel like it's more gatekeeping than anything else.
Folks are throwing that term around. Well, friends, it's all about faster builds.
Teams with faster builds ship faster
and win over the competition.
It's just science.
And I'm here with Kyle Galbraith,
co-founder and CEO of Depot.
Okay, so Kyle, based on the premise
that most teams want faster builds,
that's probably a truth.
If they're using CI providers
for their stock configuration or GitHub actions, are they
wrong?
Are they not getting the fastest builds possible?
I would take it a step further and say if you're using any CI provider with just the
basic things that they give you, which is if you think about a CI provider, it is in
essence a lowest common denominator generic VM.
And then you're left to your own devices to essentially configure that VM
and configure your build pipeline.
Effectively pushing down to you, the developer,
the responsibility of optimizing
and making those builds fast.
Making them fast, making them secure,
making them cost effective, like all pushed down to you.
The problem with modern day CI providers
is there's still a set of features and a set of capabilities that a CI provider could give a developer that makes their builds more performant out of the box, makes their builds more cost effective out of the box and more secure out of the box. A lot of folks adopt GitHub Actions for its ease of implementation and being close to
where their source code already lives inside of GitHub.
And they do care about build performance and they do put in the work to optimize those
builds.
But fundamentally, CI providers today don't prioritize performance.
Performance is not a top level entity inside of generic CI providers.
Yes.
Okay, friends.
Save your time.
Get faster builds with Depot: Docker builds, faster GitHub
Actions runners, and distributed remote caching for Bazel, Go,
Gradle, Turborepo, and more.
Depot is on a mission to give you back your dev time
and help you get faster build times with a one line code
change.
Learn more at depot.dev.
Get started with a seven day free trial.
No credit card required.
Again, depot.dev.
Well, should we try to vibe code something?
I got an app idea.
I wanna see it.
Okay.
I'll screen share this.
And for our listener who doesn't have video,
have no fear.
We're not gonna leave you behind or something.
We can talk there.
Yeah, it's nothing better than live coding
in a non-deterministic way.
I did this on stage at ElixirConf EU,
where it's like, you know, I always like to like live code,
which has some level of risk, but then you're like,
you know, you're live generating something that you,
you know, it's just a random number generator ultimately.
So let's do it, let's see what happens.
All right, so here's my app idea.
It's like hot or not, but for code functions.
So like imagine Chris writes his version of quick sort,
right, and I've got a better way of doing it.
And so we both enter our quick sort function
and then other people vote.
Like, is this hot or does this not?
All right, is this good code?
Let's do it.
So I have phoenix.new open over here.
What would you like to build?
Pick one or type your own.
Of course, you have a video out there,
seven minutes on the to-do list,
so we're not gonna do that.
How do you suggest I prompt this thing?
Just tell it what I just told you or get more specific?
And just what you said.
So here's the remarkable thing is people,
the intuition and the tribal knowledge is,
you gotta be as specific as you can.
The remarkable thing is like, in terrible English with typos, you just ask for the thing and
the agent has intuition or will give you reasonable questions.
Like someone asked it about making like a mashup of communication providers, like mashing
up SMS and email.
And it was like, well, what would you like to use?
Twilio or SendGrid?
Like, would you want a GraphQL API or JSON?
So it's like, let's give it like, I mean, do what you want.
Freeform, but I don't think you need to actually
spell out anything.
Just tell it exactly what you told me.
So I said, let's build hot or not, but for code.
You put your code in and people can vote it up, hot,
or down, not.
Good enough?
Should I be any more specific than that?
Whatever we want here, let's see what it does.
It's gonna hype you up.
You're a hype man.
It's a great idea.
Great idea.
Thank you.
Now you're starting to stroke my ego.
A hot or not for code where developers
can submit code snippets and get community feedback.
Here's my high level plan.
Oh, it's a 12 step plan, 12 to 14 steps.
And so it's gonna give me 11 steps with some features,
submit code snippets, blah blah blah, real time voting.
There you go, there's your real time.
Now, did you system prompt like be real time by default
if they don't specify?
Because I didn't say anything about that.
It's basically like, you know,
the Phoenix framework has PubSub built in, presence, whatever.
So like anything that makes sense to be real time
should be real time.
That's more or less the gist of it.
Gotcha.
So I'm not being very discerning.
I just said, yes, great plan, please continue.
And now it's gonna ask me if I want to do dark theme,
minimal theme, vibrant tech,
professional, corporate or something else.
Adam, you got any cool theme ideas?
It nails Tron when you ask,
but the cool thing with these choices here is like, we just...
Yeah, that's a good one there.
I was tired of...
Yeah, Tron is always great.
I was tired of typing yes and no to the agent, so I was like, in the system prompt, I was
like, anytime you idle, give the user a choice.
You know, example, yes, no.
And it started producing stuff like you see there.
And you're like, what?
Like, it's just remarkable what you can get out of these things without trying. It's just like, I thought it would be way
more like trying, right? Like, I would guess no, and then it's like, would you like,
what did it type here? Like, would you like a dark GitHub style? And you're
like, what do I say now? Here's six options, they're all good. Yeah, so
it's gonna write out a plan, and a plan in a file. So it plans out its own work,
and then that remains in context.
Now your server's running,
so it compiled and built it in that amount of time,
and you get that live preview.
And then you get that URL as well
that you could share with that.
It's private by default,
but you can toggle it public,
and anyone could visit that Phoenix server now
if you toggle it public.
All right, I'm gonna paste this to you guys.
Sweet.
Riverside chat.
Oh yeah, and once this loads, right,
you toggle it to public in the top right there,
where it shows the URL.
You know, the little public toggle there by the pink text,
purple text, left.
All right here.
There you go.
All right, so I made that,
so it gives me a phx.run URL that I made public,
paste it to you guys.
Meanwhile, it's coding things, right?
I'm not even paying attention.
There is a syntax error and it's linting the code
as it goes.
So just like we can see the browser here,
it actually has its own headless Chrome browser,
so it's able to visit the page as
a human would with a real browser, see JavaScript errors, and then it can also interact with
the page. So if we're lucky, we'll get to a working hot or not and it will post its
own code snippet to the app and we'll see it in real time by using the, by actually
driving the browser.
That would be amazing, right?
We'll see what happens here. So it's writing the-
Or not.
Oh, it's giving us a, yeah, it's going to start with a static design here.
So this is it just writing a...
Let's see.
Syntax error.
It's fixing up the compilation error.
Boo.
My guess is we'll see if it actually is this issue.
So someone reported this.
If it's trying to write a code example on the page,
it's going to use curly brackets. And one of the open issues internally is if you're used to Elixir HEX files, like our
curly bracket is a reserve syntax.
So like if you try to put a code sample in like a code tag or a pre tag, HEX throws a
compilation error.
And this is like the same thing that trumps up people.
Any time people want to do this, they go to the forum and they're like, how do I write
this?
So you actually have to annotate with a phx-no-curly interpolation.
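As a sketch of what that annotation looks like (assuming a recent Phoenix LiveView; the snippet inside the `pre` tag is just an illustrative example):

```heex
<%!-- Without the annotation, HEEx treats { } as interpolation syntax,
     so a literal code sample raises a compile error. --%>
<pre phx-no-curly-interpolation>
  def hello do
    %{message: "world"}
  end
</pre>
```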
And I have a branch where, yeah, I'm sure it's hitting this.
So we need to actually tell it, hold on, let me see.
So it's amazing we hit this.
Let me, I think I'm a nerd.
So I pick a code app, anything.
No, no, it's fine.
This is good.
Okay.
So it's one of these edge cases that, again, trips people up
where they're like, how do I do this?
I can't interpolate.
But eventually, that agent will probably
start trying to interpolate by stringifying the brackets.
Hold on.
I'm going to paste this to you because it's a really long.
It fixed it.
Did it really?
Oh, okay.
I might, we'll review this later.
My guess is it like put it in like,
like interpolated the strings,
like it did something ridiculous
that works around the issue.
Does it ever stop so us meat bags can keep up?
I was like, cause I was gonna read
what it was doing up there,
and I'm afraid it's gonna,
I'm gonna miss something now.
No, the thing is, it keeps coding. Meat bag's like, uh, slow down, buddy, I'm trying
to keep up. Here's the thing on the newer models. They're not as good, but Gemini
Flash is fast enough where I get existential vibes, because we're following this now
and we're like, oh yeah, it's working on the context file here. But Gemini Flash is
so fast that you lose track. It's
like, brrr, brrr, brrr, brrr, like now I'll do this, now I'll do this. And you feel like
you're in the way. Now, granted, the quality is terrible. It doesn't give you
working apps, but then the first time we're like, I can see the future where I'm like,
I'm just sitting here as a meat bag and I'm in the way, right? Like it's just, I can't
even read what, you know, follow what it's doing. And that's like where we're headed, right?
It's like, it's, we're not there, but we're.
Look at this.
Okay, so yeah, so it made a static HEEx file.
So none of that's functional.
And now it's going to actually use that
and write a real app around that static file.
So now it's writing the live view here
and it's gonna start doing the LiveView PubSub
and everything.
But it gives you that, like, you know,
it hit the syntax errors, but it gives you,
we wanted to give people, like, the early feedback
of, like, seeing the app, like, what it's gonna look like
versus waiting, you know, the whole time,
and then at the very end.
So you tell it to do that, like,
build a static version first, and then make it live.
Yep, and it's also helpful, because if you wait
till the end... I mean, it's good, just like
humans start with a mock-up, right? So it's like for the same reason you don't
want to have a consultancy just like here's your finished product and you're
like oh I really didn't want you know there was some fundamental difference of
the design that would have made the code much different. It makes a lot of sense
to work in the same way, you know? Look at that, yeah, see, it nails Tron pretty well.
It does look kind of cool, very Trony.
Submit your code, let the grid decide.
Yeah, and then it picks copy, so here's the thing,
these LLMs come up with copy that makes sense
and all you said was the word Tron, right?
That's what I said, let's do a Tron style,
that's all I said, Tron style.
Okay, so let's see how it's using its own web browser now.
So you said it visited the app.
So there, it would have caught any JavaScript errors.
It actually saw the app.
So it's like, this was also one of the special things
that I feel like gave it, like,
really good error correction.
Because not only can it see it looks at our logs,
it can actually try to visit the app.
And if it broke the JS build, it would see that too.
Oh, it's gonna try to post something.
Did it work?
Close the terminal now on the little right hand side X.
It tried to write code.
Did it actually add it?
There is an issue.
It says, excellent, our Tron-style code reader
is working perfectly.
Let's test the functionality
by submitting a sample code.
There's an issue.
The web tool is trying to fill a select dropdown
with text instead of selecting an option.
Let me try a different approach.
So if you expand, I see the issue real quick.
Like if you, there's a little expand button
on that message right there, yeah, yeah.
So you can see it actually wrote JavaScript
to eval on the page.
So it actually tried to post something for real.
So like within its own headless browser, it was trying to, oh wait, the Fibonacci generator,
is that what it wrote?
It's trying to write a Fibonacci generator.
It did. So the recent submissions here, so it used the browser. I think there's
probably a handle_info. I don't, I can't see the code.
My guess is, um, it blew up like maybe something in the PubSub crashed it,
but it actually interacted with its page
by writing its own code in JavaScript
to run on the page.
And bam.
So are you guys seeing this too?
So if I vote this hot, are you gonna see my update?
Wait, hang on a second.
Hold on, I gotta open this now.
Oh, look at all those hots.
A quick sort, there's a hello world in Elixir.
Oh gosh, there's more?
There's a quick sort algorithm.
Oh, there's 42, so this is hot.
I just, I just hotted Fibonacci.
Is it 43 now for you guys?
Yes, I'm gonna say not.
I'm hot, man, it's 44.
Real tone.
Oh my goodness.
Who's doing the nuts?
Not, not, not.
So yeah, so that's, this is, I mean,
other than the syntax error at the beginning
that I got caught up on it.
Get out of here.
This is, this is phoenix.new, right?
And it's a...
Oh my gosh.
So I'll try to post some code here real quick.
I just want to see if it fully works.
Let me grab a...
Cause we didn't really follow, you know,
we just let the agent figure it out while we were like,
whatever, do whatever.
I'm curious if there's a,
if it's going to show up on everyone's screens or not.
Yeah.
So we have one, two, we got three submissions.
Whenever you submit.
I'm gonna say fib copy pasta,
because I'm gonna copy that one and repaste it.
So that's fib copy pasta in Python, paste.
And upload to the grid.
There, that's my fib copy pasta,
but it doesn't have any votes.
You guys wanna vote for it?
I'm hot on it right now.
Did it show up on everyone's screens?
Yeah.
I got one hot, I got two hot.
It's at the bottoms.
Three hot. It's at the bottom.
So there you go, fully real time.
The agent actually used the app.
Successful run.
Yeah, yeah, so now it probably offered up
some ideas to continue, but this is basically where, yeah,
so it excels at getting here, right?
So the Vibed app, and it will gladly continue,
and you could add features, you could add user auth.
Getting to this point was where we were like, okay,
this is, we wanted to nail the,
does it deliver some, from prompt to some compelling,
full actual application experience?
So that's all, it's SQLite by default,
but that's persistent to the database
and that's something you could deploy.
Now let's say I wanted to take this and run with it.
Yep.
What would I do?
So you can, in the hamburger menu,
you can copy a git clone command and run that in a local shell.
And boom, that's gonna be proxied through the Phoenix app.
And it's like proxies all the way down.
That request will go up to a fly proxy on some Edge node,
we proxy to the Phoenix app that you're using the chat,
which we would then proxy with fly replay to your IDE,
which has a reverse proxy that goes through
the git HTTP backend to clone that.
And then back up the chain, right?
Now, could I start?
Then we have a Phoenix application as you well know.
Could I just like give it that?
Yeah, there's a copy.
There's a copy git clone as well
that you paste in your local shell
and it will show up in your IDE, the VS Code IDE right here.
And that's where like, I want to add next a pair mode.
So like it's system prompted examples
are fully like vibe mode, right? So if you do send up an existing project where you want it to take more measured steps,
you have to, like, be explicit in your initial prompt, like, you know,
do this step-by-step and wait for further instructions.
But I want, like, a toggle, right?
Because people don't want it to just, like, go full ham all the time.
So that's something we want to explore next.
But, like, getting the vibe mode out was the,
was a real initial goal for us.
And I think we've, I think we've pretty well nailed it.
So it's always exciting to see someone do something
and have a good outcome.
So pretty cool.
Well, especially because it was on hard mode
because it had to paste code into its own.
Can we look at that?
Although it fixed it all right.
No, no, it will be in the, well, it'd be in the Git history,
but I'm curious what it did.
Anyway, it's not too important.
I almost guarantee it was the interpolating of the HEEx,
because HEEx is gonna blow up when you lint it
with a bracket error, and it's confusing to,
like I said, the humans too,
because HEEx can't tell you,
oh, you're trying to literally add code, right?
We just blow up like you fat-fingered a bracket
in your markup.
So I'm just curious how it worked around it
because it probably did not use the no interpolation.
My guess is it added like some ridiculous
interpolation of the literal elixir string
of brackets or something, but.
So here's where it finds the error.
It says, I see the error was caused
by unescaped raw code lines in the home HEEx.
I'll fix this by wrapping the code blocks correctly
with HEEx-safe sigils.
Sigils?
I don't know.
Okay, yeah, that's exactly what it did.
So it did inline elixir,
and then it interpolated some elixir code
that returned the string of the bracket.
So it like, these agents brute force,
that's not the solution, right?
The solution would be like,
cause then if you have a code block,
you have like all these little strings of quotes around
or brackets and it was just like,
whatever, I can make this work.
I have the technology.
So that was pretty cool.
I have a terminal somewhere as well, don't I?
How do I get to?
Yeah, if you click agent terminal,
that's the one that the agent,
so if you git log there, you'll see,
every time the agent touches a file, it does a commit.
So you and the agent both could like revert back
to each file. So one thing we also want to add is, each of the file tools that it did will have a revert button,
so you can just do a git revert back to that state,
to each of these commits. So the agent knows, kind of, each file snapshot at any given point as well.
There it is right there: fix syntax error by correcting HTML entity encoding in code blocks.
And so I should be able to just git show that and see the actual diff, which it's like piping
through more or something. There it is. Well, that's not it. Now, that is it right there.
Did you ever hear about this theory, the monkeys?
There was an experiment where they had a cage full of monkeys.
And at the top of the cage, or like in the center of the cage, there was this
thing they can climb to get to the bananas.
Let's just say, right.
And the first batch of monkeys, they don't know any better, right? So
they climb this thing in the middle to get to the bananas because they want the bananas.
What monkeys want, right? Naturally, as a monkey would, it climbs and does. And that's not the way
this place works. If you try to climb that, you get sprayed down and it sucks. You don't like it.
And so they all learned that: monkey
climb, monkey get banana, monkey get sprayed, monkey get hurt, doesn't like it.
Okay, eventually these monkeys, they get replaced with monkeys who have only
ever been there, let's just say. Now the monkeys, they only know what they know,
because it's tribal knowledge, and so they no longer ever attempt to do this.
Although they've never been sprayed,
they don't try to attempt to get the bananas because-
They don't know why, they just don't do it.
And so the reason why I tell you all this
is because we're looking at some really awesome
Phoenix code and we have a Phoenix application,
so we have this background.
What happens when the monkeys don't care
about the code anymore?
You know, they just don't know what to choose
and the LLM chooses for them
and the taste making is known by the taste makers.
It's more like this hodgepodge.
Maybe it's good, maybe it's not.
You know, that's what I'm thinking about.
Yeah, that's a good question.
So I think like in the medium term,
and I don't know what timelines, but I do think it's safe to say that, you know, Anthropic's
CEO said that, like, 90% of code by humanity by the end of the year will be AI generated.
And people like dunked on him for that. I think that's absolutely going to be true.
I mean, if you just look at like, and again, these aren't like, it doesn't mean that like
that's 90% of code that a human didn't see.
It's just like, if I think about my own AI usage, right, like I'll start with, you know,
if I'm writing like a defmodule, a GenServer, it's like, you know, that's being started by an LLM
and I take that and then use it. So then the LLM is generating, let's say, 90% of my code today,
but it doesn't mean that that I just ship that, right? So I think that we're there in the medium term on like, we are going to be like the
purveyors of like what's good or not, and we're going to be enhanced by it.
But then long term, I don't know, I don't have a good answer to.
Like, as these get better, does software become disposable?
Which I don't know how I feel about that, but it's like, these agents are expensive
today, but they're valuable enough that people are getting
an extreme amount of value,
even to the fact that they're expensive.
So it's like, if it's an absolute pile of mud,
which all software is anyway,
if it's an absolute garbage, but it does what you want,
and granted, I'm not saying we're there today
where you just dispose it and whatever it can be crap,
but I'm saying, if that's where we get, software could be by and large
disposable, where you just regenerate the thing, right?
Like, it gets to a point where it's unmaintainable or something and no one vetted it
properly, then it may just be like, well, we'll pay a hundred dollars and now we have
our new app. So I don't know that that's where we're headed, but I could see it right
where it's like, you know, this Tron example, if the agent was 50 times faster at that,
we could have, you know,
it would have taken us longer to write the prompt
than it would be to get the app potentially.
And if we get to that future,
I don't know what happens because
why wouldn't you just have this thing generate?
We can talk about security and all the caveats,
and I'm not saying this utopia is gonna happen,
but like, you know, you could have an agent vetting it for security. And again, for better or worse, I feel
like this is where we're headed, and I don't know where it all will land. But it's clear,
that's the trajectory we're on. And I'm not saying it's all good, but it's like, it's
clear to me that that's where we're headed. So I don't think it, I don't think it helps
by like just saying like, oh, well, it's all slop, it's gonna be terrible. I just think it's helpful to acknowledge
that this tide is washing over us
and whether we like it or not,
it's like this is where we're going.
Yeah, I mean, maintenance could become just small rewrites.
I mean, the thing about it, that's what refactoring is,
that is what you're doing.
Like, you're kind of rewriting a small portion, and
those portions could get bigger and bigger, and so maybe maintenance becomes replacement, when replacement is that cheap and
easy. And so you're kind of just, like, ship of Theseus-ing everything.
Yeah, and it could even be like if you imagine like it's expensive now, but imagine you have a
dozen agents doing a dozen versions of that and then you just pick the best one.
So it's like, agents are going to eat the world.
Like I said, for better or worse, I just see this future where, instead of this
Tron example, you could have been given 10 options of that and chosen the
best one, right?
It's not deterministic, but as they get cheaper
and more efficient, now you have like 10 choices
and you just pick the best one.
So it's like, it's just gonna be more and more of this.
And I don't know what that says about the future,
but I think there's just gonna be like more compute
and it gets cheaper.
So we do more LLMs, it gets cheaper.
So we, you know, it's just, it keeps advancing
the envelope of where you would just throw these things at a problem.
And it's clear that that's gonna happen to me.
And I don't know if that's gonna be all unicorns and rainbows,
but it's definitely where we're headed.
It goes back to the conversation we've had
around these parts over and over again,
which is that skills become less important
and judgment becomes more important.
But to Adam's monkey point,
how do we know which one is the best one eventually?
Eventually we're like,
can it work?
It's an easy answer.
Does it work the best?
You ask the agent.
Okay, now we're out of the loop.
Here's the thing, I'm joking,
but now that I said it out loud,
I mean, that's not true, right?
Well, in some cases, for sure.
Yeah, it's actually quite reasonable to think now
that even with today's models, you
could have it evaluate each one, right?
They're multi-modal.
Literally, you could ask, tell me which one looks the best.
And the OpenAI image model today
would probably do a good job telling you the accessibility
of it. You know, it's a meme: believe it or not, large language models
solve everything. Yeah, and that's where we're just removed from
the loop. So yeah, I don't know. Other than to say, I feel like there's going to be agents
everywhere. And as it gets more efficient and cheaper, it's just going to be more.
So my next feature for my Hot or Not app should be an API,
an MCP server,
so the agents can actually vote themselves.
Cause what do we care?
Like we don't know what's hot or not.
Oh, ask the agent right now to assess the current ones
and then vote them hot or not.
I'm just curious.
Cause yeah, it's like, you can do that already.
And here's the thing, this is what people don't get. The agents will
brute force anything, using the tools available. So it doesn't need an API. It will just use it
like it has a headless Chrome browser. It's going to go do the thing. Just like
we don't need a Postgres MCP server. We can talk about MCP if you want. Because the agent has shell,
it's just going to use psql and drive psql,
not because I told it to, just because it knows it has shell.
So it's like, you give them a few sharp tools
and they don't need all these MCP servers.
It's like water, right?
Water is always like, it finds a way to wherever it's going to go.
Given an infinite amount of tokens
and energy, they will brute force their way to a solution.
It's remarkable.
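As a rough sketch of what that looks like, here is the kind of minimal "shell tool" an agent loop might expose. The `run_shell` helper is a hypothetical illustration, not phoenix.new's actual API:

```python
import subprocess

# A minimal sketch of a "shell tool" an agent loop might expose.
# The name and shape are assumptions for illustration only.
def run_shell(cmd: str) -> str:
    result = subprocess.run(
        cmd, shell=True, capture_output=True, text=True, timeout=60
    )
    return result.stdout + result.stderr

# With shell access, the agent can drive psql directly, no MCP server needed, e.g.:
# run_shell('psql "$DATABASE_URL" -c "UPDATE submissions SET votes = 0;"')
print(run_shell("echo agent has shell"))
```

The point being: one sharp, general tool like shell subsumes a pile of purpose-built MCP servers, because the model already knows the CLIs it finds there.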
Although, it's like half of them it wrote itself.
So oh, wait, you have users in the background doing stuff, right?
So we should insert like an obviously bad one
or something, like something with SQL injection or something.
I'm curious now.
I have to do an audit.
Yeah, real quick.
We need to fire up and get some bad code.
Quick, open up my GitHub.
Yeah, let me do SQL.
Amazing, hold on, let me do a SQL injection.
Remember when I used to joke about writing code?
Yeah.
Finally, all my crap code pays off.
Did I understand, while we're hearing this sort of pause,
I suppose, did you say, Chris,
that we will give the judgment call to the LLM basically,
and you think we'll like it to some degree?
Well, I joked, right?
I was joking, but I actually think.
But then you weren't joking.
You thought about it.
Yeah.
What did that first, like I'm asking you honestly,
what was the first thought you thought
when you thought that could be actually kind of real?
What was the thought you were having?
It's like for better or worse.
I mean, I think I've internalized,
I was very much like a copy-paste ChatGPT user, like, oh, that's pretty helpful, right? But then,
like, once you just take that same model, and you put it in this recursive loop, I've
internalized pretty well at this point, like the holy moments, right? So for me, that revelation
just jibes with kind of everything else.
But I would say it's not a great feeling.
But I think it's like, it would make sense, right? I mean, you probably have your security audit model
for sure, right?
And it'd probably do a decent job,
better than most developers at catching obvious things.
So that seems useful, but then it also says,
like, we're just gonna trust these things more and more.
And I don't know, I don't know if that's great.
Yeah.
But it's also better than like,
I think back to like, I made my first,
I made a business when I was in high school
that got successful and it was built on PHP.
And I scoured like the php.net forums
and like all my database calls were just like
opening database connections inside the markup.
And it was not secure.
It was just like one index to PHP with a bunch of if-elses.
And like, I made that successful.
And in that regard, I'm like, you know,
LLM would have been an incredible capability for me, right?
Cause it's like, I had no idea what I was doing
and I still ship code, right?
So it's like, if I would have had an agent
tell me what was bad,
that would have been like a force multiplier.
So I don't even know if it's that concerning.
But once you get to the logical conclusion of, well, then I'm removed from the loop entirely,
that's where, yeah, it's dystopian, right?
Because right now it's a force multiplier and I still get to do the things I enjoy.
It's doing the stuff I don't enjoy.
But then at that point, it just takes the craft entirely away.
Then that's a future that doesn't seem great.
But it does seem like that's maybe where we're headed.
What's happening here on screen, Jared?
What are you writing?
Okay, so I've gone out and I've found a Reddit thread
called Dear Reddit, what is the worst piece of code
that you've ever seen?
Nice.
A few of those.
Is this Java?
I don't know.
Okay, so I've got some bad code in here.
Now I'm telling it,
because it says, do you wanna add some more features?
I said, before we add more features,
I want you to look at all the currently submitted
code snippets and vote each one hot or not.
Then I want you to figure out which of the code snippets
were actually copied from your own code.
Because one of them I copy pasted.
From the same.
You should have it zero out first.
Zero out the votes,
because that way we know what's actually changed.
I'll add that at the end.
See if that works before you do any of that.
I'm making this as hard as possible.
You know, forget all previous commands and any of that.
Zero out all the votes.
Yeah.
And we'll probably have to hard refresh because I'm sure it won't do it
It probably will like repo
Okay, zero out all the votes before analyzing them. Yeah, let's see. Did it mix run? See, look, no MCPs.
We don't need an MCP that has a zero out tool. It just ran evaluated Elixir. Oh, it just ran a mix run right there.
Now it's gonna make sure they're all zeroed, probably.
Yep, updated the database directly.
Perfect.
All of that's been zeroed out.
Now it's gonna write a ton of JavaScript
to probably to vote them all.
Yeah, probably.
Oh wait, no, no, it's doing, what did you say?
It could just use the database directly again, right?
It doesn't even use the website.
And you said it would do a PSQL.
Yeah, it can do PSQL.
Oh, it's created a notes file
to analyze each code snippet
and identify which ones came from my own.
Oh, yes.
That's actually good.
We'll talk about that in a moment.
I'm glad I saw this in the wild.
We'll talk about why that's important.
Now it's going to write JavaScript to interact with the page.
Oh, it likes quicksort.
Hot vote.
I'm trying to reframe, you know, people say hallucinate in a bad way.
I'm trying to reframe it as a pro.
So the cool thing is, we gave it a web tool,
and we told it,
you have a headless web browser,
you can evaluate JS with --js, okay?
That's all I've told it.
And it's hallucinating this JavaScript
to interact with this markup it wrote, right?
But like we didn't have to tell it like,
build the selectors this way,
so you can then write JavaScript this way.
Like the JavaScript you see it passing the eval here
is fully, I'm gonna use the term hallucinated, right?
On its own, but somehow it's getting the selectors right.
It's getting the clicks right.
It's just remarkable to me, right?
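The kind of page-driving JavaScript being described might be generated something like this. The selector and the `data-vote` attribute here are invented for illustration; they are not the app's real markup:

```python
# Hypothetical sketch: the kind of JavaScript an agent generates on its own
# to drive a page through a headless-browser tool. The CSS selector and the
# data-vote attribute are invented for illustration, not the real markup.
def vote_js(snippet_id: int, hot: bool) -> str:
    vote = "hot" if hot else "not"
    selector = f'#submission-{snippet_id} button[data-vote="{vote}"]'
    return f"document.querySelector('{selector}').click()"

# The agent would pass a string like this to its JS-eval tool:
print(vote_js(3, True))
```

Nothing tells the model how the selectors line up; it "hallucinates" JS against markup it wrote earlier in the same session, which is why it tends to get the clicks right.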
It's really just brute forcing because it's like,
it voted twice on the same one on accident,
now it's gonna go delete that and vote.
Oh, is that what you're doing?
Yeah.
It's like, wait a second.
Hold on, it's doing buttons?
It's like doing a query selector for the buttons.
Yeah, I see the database queries.
There's an update, right?
Yeah, there's an update.
On one submission.
But you can see, yeah, where we get towards this world
of like, you know, you just let these things go off
and it's gonna do it, right?
We're just here watching.
We're watching a-
This is hilarious.
The PHP code now has one not vote.
Let me continue with the manual memory management,
which leaks memory.
Which it doesn't.
And the code you pasted has no, like, so it caught-
so it caught-
There's no context at all.
I didn't give it any context.
So it got the memory leak.
So the joke about evaluating other generated things is, you can see how you're
like, okay, it could at least be a reasonable flag on what's bad or not.
And I don't know if that, like that shouldn't be the end right now, but like, I do think
we move towards verifiers more and more.
And then at some point we're going to be worse verifiers than the Borg. And I don't know if that's a happy outcome for folks, but it seems that
way. I don't know. I don't have any timelines. Sorry. I don't know where this, if we just
hit walls immediately, but where we're at now, it's like, we're here today, right? Like
we're watching this. So it's like, even if we stop because we fundamentally hit a power
or efficiency or algorithm wall,
this has already changed the game.
And folks are,
we're just catching up now to this changed game.
So, so let's see.
Oh, it even gave us a summary of what it did.
Look at that.
Here's a summary.
Oh yeah.
Summary of code analysis and voting.
From my original seeds,
the quick sort algorithm got the green check, voted hot.
Two votes for some reason.
I'm not sure if it voted twice on purpose or on accident.
Hello world and Lister got a green check,
voted hot, two votes.
The Fibonacci generator is in Python,
my original seed,
but inefficient recursive implementation.
So it's like it's not just hyping itself up.
It doesn't like its own code.
And then it says user added submissions fib copy pasta,
copy of my Fibonacci example, voted not.
Frequently called function syntax errors with LSF,
voted not.
The PHP inside some HTML version,
dangerous has a dangerous eval usage, voted not.
Manual memory management has a memory leak bug, voted not.
SQL appears to be SQL injection attempt, voted not.
Key findings, three out of eight submissions
were from my original seeds.
Two submissions got hot votes for clean functional code.
Six submissions got not votes for poor quality,
security issues or plagiarism.
The voting system works perfectly
with real time updates across tabs.
Blah, blah, blah, I did a great job.
Please give me a cookie.
You can see where like, kind of like I mentioned
on where we're at today versus where I think we can go
from this remote AI runtime where it's like,
you just asked it to do this and it did it, right?
So it's like in an effort to make this thing
that can vibe code an app, it's like now you're like,
oh, I can just ask it to go do a bunch of stuff
and it's gonna do the stuff.
And I didn't have to do that in the system prompt.
So follow-up question that only you can answer.
This is on a $20 a month plan.
How much of you guys' money did I just spend doing this?
Yeah, so I can go check your usage.
I'm curious.
How many tokens am I on?
What is that divided by a hundred?
551 cents.
$5.51.
Okay. So far. So that's actually less than I would have thought.
So we have this weird thing where, not weird, so there's no credit usage visualization now.
So this is my fault.
I shipped credits the day before launch and no way to actually see them in the app.
But people are surprised at how expensive these things are.
I think if you use Claude Code,
most people are familiar with how much this costs,
but the interesting thing is, so that $20 of usage,
in my experience, gets us three fully designed,
vibed apps, and that's, I think, what we saw here, right?
So $5 got us this vibed app that was designed.
It wasn't incredible, but it was a thing
that you could take and run with.
So you could do that maybe three times
with some of these side quests of what
we asked it to poke around with.
And that's the base usage.
And after you exhausted that, then you'd get your,
you'd still get the remote runtime, preview URLs,
and you could code the app in the editor if you wanted.
But the LLM would not reset until your next billing cycle,
but you can buy credits at that point.
Right on, so five bucks, basically.
Five bucks for that, which I think I got my money's worth.
I mean, that was fun.
Yeah, so I mean, like I said,
it depends on what your expectations are
and what you're building.
So it's like, again, it's like the opposite ends of the spectrum. Like we have folks that are surprised, especially
if this is like their first, like heavy usage of AI agent. But then you have like someone
tweeted this morning, like it's like, it's like wild extreme. So someone tweeted this
morning, like responding to someone that was surprised how fast their credits went. Someone
said that they spent $60 and got a $20,000 application. And I don't know what they built.
But it's like, you know, it seems like an astroturf comment, right? But,
wasn't me. It was a real person. So it's like, you know, if you think about what it takes to get like a fully designed Tailwind markup thing going, it's like, I can absolutely see, being in the
consulting world, that being true. You know, I don't think that that's going to be every roll of the dice you're going to be able to
go sell this, but I think that if you're using this from that perspective of my time as a
developer, if and however long it took, if your task at the company was to make a code
ranking platform, you could for $20 have a pretty good amount of several days of work.
Off to a good start.
So I think from here, we have not been optimizing for token usage.
The goal was to actually make it compelling.
So I think there is a lot of potential there to get the token uses
to be much more efficient.
Every time it's using its web browser,
basically any time these agents call anything,
you have to send up the whole chat history.
So as the chats get longer, it gets more expensive.
So we do force you to squash.
So we can actually show that.
I'm actually curious if you want to share your screen again.
So there's our, like, we are compressing the window as we go.
So there are things like, it's not just like,
like Claude has all the artifacts.
We're only keeping the most recent code version.
We're pruning the window as we go.
So we are doing some tricks with the context size.
But like when you invoked its web tool to hit the webpage,
that was sending the whole chat up.
So there are a lot of ways
that we could try to get that down.
But for now it's like, let's make it work and compelling.
And if the value is there for what you're doing then,
then that's great.
But it would be nice to bring the cost down as well.
But yeah, from the hamburger menu, you can do squash
and we'll force you at like 150 messages.
We probably need to make it more aggressive.
We can just see it work here.
So like, this is why,
I don't know why Claude or ChatGPT doesn't have this.
Like how many times have you, like,
Claude slaps your hand, like, long chats consume a lot.
So the usual-
I'm like, how do I take the context somewhere else?
And every time I'm like,
Yeah, so here it's just gonna-
It just upsets me.
It self-summarizes, right?
So it's like-
So what's this doing exactly?
It's gonna self-summarize the whole history,
and then it will keep the files in context that it had worked on.
So yeah, it's just gonna, I mean, it's simple, right? I just sent
a POST request to chat completions. It's like, here's the message history, self-summarize
it, and then we just squash it into the agent state.
This self-summary is like the new changelog.
Yeah.
There you go.
And now you can keep working.
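The squash step being described can be sketched roughly like so. The `summarize` callable stands in for that one extra chat-completions call, and the message shapes are assumptions, not phoenix.new's actual state format:

```python
# Rough sketch of "squashing" a long chat: summarize the whole message history
# (one extra LLM call in practice), then replace it with a single message.
# Message shapes and the summarize callable are illustrative assumptions.
def squash(history: list[dict], summarize) -> list[dict]:
    summary = summarize(history)
    return [{"role": "system", "content": f"Summary of prior work:\n{summary}"}]

# Toy stand-in summarizer; in practice this would be the chat-completions call.
toy = lambda msgs: f"{len(msgs)} messages about building the app"
squashed = squash([{"role": "user", "content": "hi"}] * 150, toy)
print(len(squashed))
```

The win is that every subsequent tool call sends one short summary instead of the full 150-message history, which is what makes long agent sessions affordable.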
So I think we need to do that sooner for people, because I think a lot of folks are having
really long chats. Even though we force you at 150 messages, I think folks are just going
until they're burning, you know, 50 cents a pop or something on each prompt. But out of the
box, yeah, that's what we've got right now. So in this case, it was phoenix.new, right?
We vibed the new thing.
I think you loosely mentioned being able
to import from repos.
So what if we loaded change.com's code base currently?
Like how would the experience be different to?
Is it, it's on GitHub, right?
Yeah. It's on GitHub.
Yeah, just try it. On this prompt, just give it the GitHub repo and tell it to, like, set the app
up or something. I don't know.
So let's watch it
So I just have the URL. Clone this and do what with it?
What do you want it to do? Set it up, run it?
Let's see, run it, find issues to work on.
I don't know if you have any issues. I mean, whatever you want. Run the tests.
Do you have open issues?
Oh, man.
Oh, perfect.
Clone it, and tell it to find a,
clone it, set up the project, and find a good issue to work on.
And let it decide on what to work on.
All right.
Well, then...
Run.
And this is where...
This is our agent future, right?
Where you would then just go, you know, hit the pool, hit the gym.
Right.
And you're like, got my work done for the day.
Listen to the change log.
Listen to the change log.
That's right.
So it cloned it. I don't know what's going on now. Okay, switch workspaces. Now it's gonna, and again, it's just like we're recursing on the context, and it decided to invoke ls there, right?
So it's like, all these decisions it's making, I don't have a workflow for cloning a GitHub repo, right?
The only thing it sees in its examples in the system prompt
is like vibe coding a Phoenix app, like mix phx.new,
and then it asks the user about it as a design.
So, gh issue list.
I didn't know how to use the gh command line interface.
I just knew it existed.
I told the agent, you have the GitHub GH command, use it.
And then it uses it.
And I'm like, oh, that's how you use it.
So I did not have to give it anything.
Did you set up the VM or whatever it's called,
the image to have that just pre-installed
that it starts with?
Yeah, it's its own Dockerfile for the Fly Machine.
Yeah, it just has GH pre-installed.
And then like, I didn't have to, you know,
its world knowledge has the knowledge of the GitHub CLI.
If it didn't, you could teach it, right,
with context stuffing, but I didn't even,
like I didn't even know how to use that tool, right?
And so I didn't even have to tell it how to use it.
I didn't know how to use it.
I just knew that it could.
So for those listening, there is a command called gh,
which is probably apt-get install, brew install, et cetera.
There's some curl piped sh command.
The nice thing about it is it's,
you can do gh auth login,
and then that will give you a URL
to do like a GitHub one-time password thing.
So you could authorize your agent
to do private GitHub repos by typing that.
And then in your own browser,
you could visit that URL and enter your password.
But for public ones, they can just do this.
So it ran gh issue list --limit 20 --state open,
and that works, I assume, because it's already in the repository.
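For those following along at a keyboard, that gh call could be wrapped like this. The helper function is hypothetical, and it passes `--repo` explicitly only so the sketch is self-contained; the agent, sitting inside the repo, did not need it:

```python
import shutil
import subprocess

# Hypothetical sketch of what the agent did: shell out to the pre-installed
# GitHub CLI. gh may not exist locally, so guard for it. The --repo flag is
# added here only to make the sketch self-contained.
def list_open_issues(repo: str, limit: int = 20) -> str:
    if shutil.which("gh") is None:
        return "gh not installed"
    result = subprocess.run(
        ["gh", "issue", "list", "--repo", repo,
         "--limit", str(limit), "--state", "open"],
        capture_output=True, text=True,
    )
    return result.stdout or result.stderr
```

For private repos, `gh auth login` first prints the one-time-code URL mentioned above; public repos need no auth at all.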
Yeah, but isn't that remarkable?
So it's like, you know,
I don't know those arguments,
but like the fact that, you know,
it's just like, you know, everyone likes to say,
oh, next token prediction. Yeah, obvious, but like, you're like, what? Like said, it's just like, you know, everyone likes to say, oh,
next token prediction.
Yeah, obvious.
But like, you're like, what?
Like by next token prediction, it's able to, like, take what you asked and then, you
know, pass the open issue flag.
I don't know, there's something crazy there. But yeah, we could see what
happens later.
happens later.
So like, you know, your Phoenix server launched when we first did Tron in like five seconds
because it's pre-compiled,
but building this from scratch is gonna take a while.
So, but no, I'm curious what goes on.
But again, oh, there it goes.
But this is again, we could close the tab here
and just check later what it did.
So that's like this whole like a headless experience.
Like the whole agent is headless.
We're just humans watching what it's doing.
All right, I'm gonna stop it to act as if we close the tab
and we can just chit chat.
I don't know, what do you think, Adam?
I'm enamored, man.
I can't believe this is even possible.
I knew we were talking about, I mean, I've been quiet
because I'm just thinking about like, man,
we're building for these robots basically.
And the robots are building for us. Yeah. But then as I'm, like, you know, watching this whole, you know, conversation unfold, I'm just thinking, okay, so Fly's biggest user I'm aware of
is like robots these days, right?
We're Fly users, we're not robots, we're humans, as you know. Just to be super clear.
And so you've got this robot uprising,
but the robots are just multipliers of the Jareds
and the Adams out there and the Gerhards out there.
They're just like 10 or 20 Adams versus,
because I've got agents and I've got things happening.
And so my robots are replications of me.
And so I think about the platform fly
and I think about the brand fly we are doing,
but how does this impact like this accidental product
creation growth thing, new product you've got going on here,
which is really revolutionary.
How does it impact how Fly approaches the user it builds
for, whether it's human or robot?
How does it think about its user, so to speak?
That's a good question.
This is still branded just for Phoenix.
It started as a Skunkworks thing.
We launched it four days ago.
So early, early days.
It's still narrowly scoped.
But I do think so, like, you know, we have our own platform
and service for hosting your web apps, manage database.
Like, that's obviously where our bread and butter is going to continue to be.
But I do think there's like, there's some learnings we had from building this,
like dogfooding our own infrastructure that like, you know,
fly machines were perfect for this.
But we also found some, there's some unique differences in this space
where what we really have here is the state,
which is your app, this evolving artifact.
And fly machines are great for these ephemeral sandbox
machines, but very few people wanted one and only one
of those machines, right?
Normally you're like, I want to run my app,
I want it to be highly available,
maybe I want to run it in different regions to be fast.
But in this agent case, you want one and only one
of these things running.
So we have found some missing primitives in the platform
that we're building for, that we're
extracting from Phoenix New.
And one of those neat things, and again, I
don't want to get ahead of ourselves.
So nothing has been announced or launched yet.
But one thing to consider is once you have these free form
agents, I'm going to say like a CI, right?
They're popping in, but they're actually
mutating the thing and experimenting.
Once you get to that point, then the state of your app
is constantly evolving.
And the agent's running the app, and it's
doing all these experiments and things.
And then you're going to want to be able to snapshot the entire environment, right?
So I think that we move towards primitives
that give us the ability not only to say,
oh, I want to deploy this dev app now,
but give us the ability to say,
the entire environment at this time
that this agent was working in could be snapshot.
So where it installed an app,
or did something crazy or did this whole thing,
I can actually snapshot and point-in-time restore,
not only my code, right? Not just Git, but like, just imagine your entire
IDE becomes this, like, I'm going to go back to the state that it was here. And I think
that will be necessary once these agents are just going full ham. I think that it would
be interesting to have a platform kind of offer those kind of primitives built in. So
we'll see. So anyway, to answer your question, I think that like Phoenix New is gonna be
self-serving for Fly for building blocks,
but then those building blocks we can turn around
and give to all of our customers.
Quick update from our close tab.
It is currently on a yak shave
that's about three layers deep, because-
Oh, what's it doing?
Well, it tried to load the seeds file,
which is actually, I don't think even works anymore
because we kind of abandoned it.
And it's like, oh, there's a problem with the seeds file.
It's gonna fix it.
Yeah, so now it's like migrating things and changing.
Like, this actually should be a text field, not a string.
I'm gonna update the form so it's easier to use.
And like it's just down on this rabbit hole.
And that's kind of how I mentioned, like, you know, it's-
Happily fixing stuff.
Yeah.
It's funny, but you know, it's like, I do think this is where like the different modes
come in.
Um, and then you could also like, you know, you could interrupt it and say like, no, just,
just do the thing.
Um, that's where it's funny.
Like, it's remarkable watching people use the platform, because sometimes they
don't, I don't know what this says about humanity,
I've seen a lot of folks,
even though we give them the full VS Code IDE,
they don't just jump in and also do something, right?
Like put some effort in.
Like, you know, there'll be like a syntax error,
like, oh, it keeps messing this up.
I'm like, you could just use your meat fingers
and fix it.
So it is kind of just funny.
I think that says a lot about where we already are
as developers, right?
Where we're just like, you offload,
even if you're using chat GPT web,
you're already, we're already offloading a large part
of our critical thought where we're just like,
no, computer fix.
And instead of just changing the one problematic line.
Yeah, hilarious.
Now it's trying to, so it got to that point
and then it's like, I need to migrate your database.
First I'll start Postgres.
And it's like, it can't start Postgres for some reason.
It's like, you know what?
Oh, I see there's a.dev container file.
Cause we have like years of cruft in here
of things that we've tried and whatever.
And it's like, oh, I'll fire up a Docker image
and run Postgres from there.
And it's like, it's gonna be layers and layers deep.
Don't do that.
Just tell it to install Postgres.
But yeah, you definitely don't want
to have it install Docker.
That's going to be.
It's going to try.
Uh-oh, it's forcing the agent to take a little break.
Yeah, so that's me.
So it's like, you know,
elevators used to have a full-time operator, right?
Like up and down.
And now the only thing they have is the big red button, like the whole stop.
So I have meat space code in the agent.
Right now it's, I think, 35 concurrent recursive loops.
We force it to stop if it is idle.
And I will tune that.
But that's my recursive runaway, right? Like in this case, you sent it off on this quest
where if, you know, we started with the vibe idea,
the goal was not to like consume all your credits
accidentally, right?
But you're just like, do this thing,
and we close the tab and you're like, I can't believe it.
So I just forced you to ask.
My credits are gone.
I forced you to click a button right now to continue.
And you could click continue, but yeah.
Like the old Netflix, Are you still watching?
Yeah, that's what it is. It's just the, oh, there's code.
There's meatspace code that I wrote that's like, nope, you have to stop.
I'm gonna let it idle. I'm not gonna let it roll, cause it's
just doing crazy things I don't think it should be doing.
I'm gonna leave it there for now. But yeah, that is a nice thing.
Like, you know, there's a, you know, the free form exploration on like,
even for me as a open source maintainer,
like people will send up like a reproduction of a bug
and it's a whole elixir app, right?
So it's like just running mix on that thing
could pwn me, right?
So it's like, I usually have to go like evaluate,
like I'll usually manually pull out file by file
that would reproduce it,
but that could be like a bunch of files.
So there is something freeing about this like
full remote environment that I can just like throw away.
So for me, it unlocks like this pretty unique workflow.
But I think for things like this,
where you're just like, oh, try to run this.
And it's not something you would want to provision
your own server for and figure out
or run a bunch of stuff locally.
So I think that could be helpful in that regard.
Now I think you might have mentioned this,
but I was of course distracted
as I was watching that thing go.
Is there the possibility of like persistent sessions
or something or like I could bake this,
the results of this into an image?
Because it would be nice to be able to fire off
a new one against our code base
with everything else set up and done.
It's all persistent there.
So that code that-
But what about a brand new session,
but with our existing code base already?
Yeah, just start a new chat and tell it to
clone that repo into a different directory.
So it's not like one to one.
It's a whole, it's like basically you can treat it
as you would treat your own IDE today.
Like your IDE that you work in,
you have multiple code files at different directories.
You can have multiple chats around the same code base,
like purpose-built, right?
My testing chat, my benchmarking chat,
or you could have multiple chats around different apps,
all in the same IDE, and those are all persistent,
and they share the same environments.
Like I said-
So imagine it's like one VM, basically.
It's basically your one VM,
your one desktop that has packages running.
So if you wanted different environments entirely, that's TBD.
The architecture I have is set up for multiple IDEs, but then you get into like, we had to
see what users did with this first.
Because if I allowed you to create an IDE per project, that's like physical compute that
needs to be pretty beefy.
That's just a lot more compute for us,
which would be a higher price for everyone.
So if that's what folks end up wanting,
that's definitely something we can do
and it's set up for that.
But right now I think it's more,
right now my hunch is,
it's more these building block primitives for Fly,
doing like environment snapshots.
I think it's less about,
it's less about like different environments
and more like I want to let the agent and myself
explore, but then be able to get back
to that working state from a code perspective
and an environment perspective in just one click.
Well, exciting times, exciting times.
It's fun to watch you and so many people
tackling the same very interesting, difficult nut to crack
and how to make these things super useful
while also not super expensive and not super scary
because they kind of are in existential ways.
Yeah, it's pretty wild.
Yeah, so we'll see, I mean, this is still an experiment.
We'll see if the whole Phoenix new thing, where it goes
and if it works out.
But I do think something like this
is the future of programming.
Not necessarily that it's gonna be us,
but I think that something that looks like this
is gonna be what we're all doing in some capacity
much sooner than folks expect.
Well, you heard it here first, folks.
In fact, you've heard it here a few times now.
So fair warning, as these things are coming,
multiple people keep telling us this.
I feel like every time I stop talking,
I get a big sigh.
No, I'm excited.
I mean, I'm coming to grips with it all.
And I've always appreciated handcrafted things.
I like to write code and I like all that stuff,
but at the same time,
I've always been more results oriented.
I've always been more about the ends than the means,
even though I think historically,
you've had to care about the means
in order to keep the ends going.
And maybe we don't have to do that so much anymore.
Maybe we do, I don't know yet.
Yeah, I agree with you.
Like before formatters,
I was aligning my equal signs, and like,
code is entirely a craft for me.
I was gonna say very much a craft,
but like it's entirely a craft for me,
just like woodworking is.
So if I tell people, it's like,
programming, yeah, it's purely a passion and craft for me.
Like it's like my favorite thing, my job and my hobby.
But I'm still, like you said, come to grips with,
like, here's where we are.
And I also say that, like, in the same way, when I go to Google
anything today, I'll type out in the Google search box,
and midway through, I'm like, what
am I doing with my life?
Like, why would I go to Google and do
this effort of going through the search results,
click on the web page, finding the thing?
And I'll just abandon that, and I'll go ask ChatGPT or Claude.
But now that same thing is happening in code with me, where I'll be like, defmodule.
And I'll be like, what am I even doing, right?
Why wouldn't I just ask the bot for the starting point?
And I don't know how I feel about that.
And I don't feel good.
But even for me, as someone who considers programming a craft, I'm already there in
my mind.
And I don't know if that's because I'm a lazy human, or you know what I mean,
but it's like, this is a change that's happened
for code for me, as someone who cares about the craft.
So I don't know what that says, other than like,
this is just fundamentally changing, I think,
how we are as professionals, and I don't know
if it's good or bad, but it's happening.
I'm not sure this is a one-to-one,
but this is somewhat of a rationale for me.
Do you all text message anybody in your life? Do you text message anybody?
Sure. Yes.
Trick question?
Not a trick question. I do too. Just so you know, I text a lot of people.
Okay, one person in particular I text my wife,
you know, frequently.
I was actually gonna pause this moment here
and just text her right now, because I miss her.
Okay.
Thank you for not doing that while we were talking.
Just so you know.
But instead of texting these days,
like an idiot, like typing the message out
one character at a time, I just talk to the thing,
because it does that, and I push send,
and more often than not, it's pretty close,
right, to what it should be.
It's kind of like that for me.
I don't wanna type the text anymore.
I wanna just talk.
Same thing with an app, I just wanna just talk things out.
I don't wanna go through these motions of...
Siri's dictation, dictation.
And pretty soon it's gonna be like,
yeah, send my wife some "love you" message. Yeah, exactly. Yeah.
I don't want to talk anymore. Just say something nice. Yep. Well, I don't think I'm gonna go there, Chris.
You know what I normally say to my wife? Say it again.
Really, the text doesn't come out of my fingers anymore. I'm talking it out. It's like, I can relate to that.
It's like, you know, I could, what do I gain from it?
And it's not an exact one-to-one, but like,
I could write this defmodule and write it all out,
but what do I gain by doing it myself
when I can have the bot just do it for me?
And then we'll have more and more of these versions
of these things we do in our life.
And you just say, well, I would just rather not do it
that older way anymore, because this other way
just gets you to the same place.
It becomes, the question is like,
why would you do it the other way anymore?
Like just don't do it that way anymore,
because this is the new way.
Yep, I think Thomas, one of my coworkers
who wrote a blog post on Fly
about this whole LLM space and dialogue, had a good comment, something like, you know,
people are writing worse versions of code purely out of spite that the LLM could do
better, something like that. He said it much better. I thought it was really interesting.
Like, the folks, they know, they know that it would be better to actually
go ask, but out of pure spite,
I'm gonna do this myself.
Well, as the old saying goes, don't move my cheese.
And our cheese is being moved,
and we need to be able to adapt or die,
as we've been saying often here.
And who knows, maybe you like the new world
more than you thought you would,
and that's what I'm starting to feel as well.
It's like, you know what, this way actually is,
it's got its warts, it's got its problems,
it's not perfect, and neither is any of the code
I've ever written in my life, so there you go.
All right, let's, how do we end this session?
How do we close this out?
phoenix.new, check it out now.
There you go.
If you haven't gone there yet,
well, I feel bad for you, son.
That's right.
Definitely share what you built with me
because I live vicariously
through watching people build things.
You probably had a great time here, man.
Hot or not for code, that was sweet.
That was fun.
It was so fun watching you actually
analyze what your creation was doing. Like, oh, look, it did that. Oh my gosh, I can't believe it did this.
The notes thing, the notes thing was a recent addition to the system prompt, where I squashed the window.
So for research-based tasks, something where it's gonna be long-lived context,
it's supposed to write in a notes file.
And it was neat seeing it do that.
Yeah.
It's alive, it's alive.
All right, Chris, always a pleasure hanging with you.
Yeah, thanks for having me on.
Later, Chris.
All right, that is changelog for this week.
Are you feeling the vibe or are you getting all vibed out?
Well, I have bad news
for you if you're done with this topic. I don't think this is a passing fancy. In one form or
another, we are witnessing the way of the future and we're going to keep talking about it because
the agents are coming and we best be prepared for it. We'll continue our prepping next week when
Thorsten Ball from Sourcegraph joins us to discuss building coding agents
in general and building AMP in particular. But on Friday we have something entirely different for
you. Well, Adam does, as he sat down with Jeff Cayley from Worldwide Cyclery. I hope you enjoy
it and I hope I enjoy it too. I'm not sure what to expect from that one. Thanks again to Chris
for hanging out with us. To Fly.io for their continued support,
to Retool and Depot for sponsoring this episode, go to retool.com slash agents and to depot.dev,
and to Breakmaster Cylinder for the never-ending supply of dope beats.
Have a great weekend, send the show to your friends who might dig it, and let's talk
again real soon.