The AI Daily Brief: Artificial Intelligence News and Analysis - AI, Agents and Software 3.0

Starting point is 00:00:00 Today on the AI Daily Brief, software in the era of AI, or Software 3.0. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Hello, friends, quick announcements. First of all, thank you to our sponsors for today's show, KPMG, Blitzy, Plumb, and Vanta. To get an ad-free version of the show, you can go to patreon.com slash AI Daily Brief. Now, today is one of our weekend episodes, which are classically Long Reads episodes, although in some ways they aren't about long reads as much as big ideas. And we have a very big idea to explore today.

Starting point is 00:00:39 This one comes courtesy of former OpenAI co-founder Andrej Karpathy. A couple of weeks ago, Andrej gave the keynote at the YC Startup School. Subsequent to that speech, it was published to YouTube. You should watch the whole thing. I will include a link. We're going to discuss it and try to contextualize it, so this is not meant to just repeat what Andrej said. But the funny thing about it is that the video wasn't posted right away. And people were going so crazy for it, and there were so many tweets and X posts about it, that the fine folks at Latent Space actually were able to hack together the slide deck out of clips and pictures from X. The speech is in many ways about redesigning the world of software for LLM-native operations

Starting point is 00:01:17 and LLMs being a new type of computing. And one of the really interesting things that Andrej notes is that while software stayed largely the same, at least from a paradigm perspective, for about 70 years, we have now had two big shifts in a very short period of time. We'll get into his articulation of those shifts in a moment, but you can also see this just in the discourse around the engineering field. A couple of years ago, Latent Space published an incredibly important post called The Rise of the AI Engineer.

Starting point is 00:01:46 And the distinction that swyx was trying to draw here was that when we talk about AI engineers now, we're no longer just talking about machine learning researchers and data scientists. We aren't just talking about people who are dealing with training and evaluation and inference and data. We're talking about people who are building on top of this new ecosystem, focused on product and taking advantage of foundation models, agents, new tooling, and infrastructure to redesign how people interact with software. That piece actually quoted Andrej Karpathy back then as well,

Starting point is 00:02:15 when he said, in numbers, there's probably going to be significantly more AI engineers than there are ML engineers. And at the time, swyx was trying to put some context around what this actually meant. He said that there are no end of challenges in successfully evaluating, applying, and productizing AI. He talked about model selection, tool selection, and just keeping on top of research, progress, and new opportunities. The conclusion, which seems so obvious now, is that this was a full-time job. Quote, I think software engineering will spawn a new sub-discipline, specializing in applications of AI, and wielding the emerging stack effectively, just as site reliability engineer, DevOps engineer, data engineer, and analytics engineer emerged. The emerging and least

Starting point is 00:02:55 cringe version of this role seems to be AI engineer. Now, even what AI engineering means has continued to evolve over the last couple of years. If you were listening earlier this week, we talked all about context engineering. The definition that LangChain's Harrison Chase gives is this. Context engineering is building dynamic systems to provide the right information and tools in the right format, such that the LLM can plausibly accomplish the task. In other words, it's about giving AI models the context they need to accomplish their goals, something that's become even more important in the architecture of agents, which are dealing with much more context and much more complexity. The point is that the very field of software and engineering is continuing to evolve, and that's basically the context

Starting point is 00:03:35 for Andrej's speech. In Software 1.0, it was computer code being written by humans to program a computer. Software 2.0, which Andrej wrote about a number of years ago, shifted from computer code to neural network weights learned from data, with the output being the neural net itself. In Software 3.0, large language models can themselves be programmed with natural language prompts. To quote Andrej himself from back in January 2023, the hottest new programming language is English. Discussing the transition from Software 1.0 to Software 2.0, Karpathy drew on his time at Tesla.
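To make the three paradigms concrete, here is a minimal sketch of the same task, sentiment classification, written three ways. It is an illustration under stated assumptions, not code from the talk: the word lists, the tiny perceptron, the prompt wording, and the `llm` callable are all hypothetical, and the Software 3.0 version takes the model as a plain function so that no particular vendor API is implied.

```python
from typing import Callable

# Software 1.0: a human writes the logic explicitly.
POSITIVE = {"great", "love", "excellent"}   # hypothetical hand-picked word lists
NEGATIVE = {"bad", "hate", "terrible"}

def classify_v1(text: str) -> str:
    words = text.lower().split()
    score = sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)
    return "positive" if score >= 0 else "negative"

# Software 2.0: the "program" is a set of weights learned from labeled data.
def train_v2(examples: list[tuple[str, int]], epochs: int = 10) -> dict[str, float]:
    weights: dict[str, float] = {}
    for _ in range(epochs):
        for text, label in examples:          # label is +1 or -1
            words = text.lower().split()
            score = sum(weights.get(w, 0.0) for w in words)
            predicted = 1 if score >= 0 else -1
            if predicted != label:            # perceptron rule: update only on mistakes
                for w in words:
                    weights[w] = weights.get(w, 0.0) + label
    return weights

def classify_v2(weights: dict[str, float], text: str) -> str:
    score = sum(weights.get(w, 0.0) for w in text.lower().split())
    return "positive" if score >= 0 else "negative"

# Software 3.0: the program is an English prompt, and an LLM executes it.
PROMPT = (
    "Classify the sentiment of the following review as exactly one word, "
    "'positive' or 'negative'.\n\nReview: {review}"
)

def classify_v3(llm: Callable[[str], str], text: str) -> str:
    # `llm` is an assumed stand-in for whatever model call you actually use.
    return llm(PROMPT.format(review=text)).strip().lower()
```

In the first version a person edits Python to change behavior, in the second a person edits the training data, and in the third a person edits the English in `PROMPT`.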

Starting point is 00:04:08 As the company built out Autopilot, the codebase was largely written in C++, but most of the visual data was handled by the neural network. Over time, as Autopilot improved, the neural network component grew while C++ code was deleted. Karpathy said the Software 2.0 stack quite literally ate through the software stack of the Autopilot. He believes we're seeing the same thing again with the proliferation of LLMs. Karpathy described LLMs as functionally a type of programmable neural network. Rather than following a set path, the user can program the LLM to produce a variety of different outcomes. Now, this is not

Starting point is 00:04:38 about vibe coding or getting an LLM to spit out lines of traditional code. This is about shifting our thinking to consider the use of LLMs themselves as an entirely new type of software. By way of example, if you're prompting an LLM to produce a deep research report, that's akin to writing a Python script that could search the web and summarize data. Of course, there are a huge number of differences, but the key point is that we're talking about using an LLM to achieve a particular outcome in the same way you would use a traditional program. And because of all this, he argued that we need to think about LLMs in a slightly different way. He argued effectively that AI is the new electricity, and pointed out that LLMs feel like they have the properties of utilities right now. Karpathy drew links

Starting point is 00:05:16 to how infrastructure is built, how tokens are metered, and even how brownouts in AI, when a major service goes down, can be similar to the electricity shutting off. He also argued that LLMs are like computer chip fabs, in that they require massive capex and have deeply held secrets in their construction, naturally trending towards a small number of powerful players. Finally, though, he settled on the analogy of LLMs as operating systems. Rather than thinking of LLMs as similar to electricity, where every electron is the same, he argued that LLMs are now complex ecosystems, where there's differentiated functionality, tool use, and performance. Giving a direct example, he noted that Cursor can be run using models from OpenAI, Google, or Anthropic, each with different outcomes.

Starting point is 00:05:55 Looking towards the future, he noted that we're still in the 1970s era for the LLM computer, with large centralized players serving a very finite amount of compute. But Karpathy anticipates something similar to the PC revolution coming to LLMs, with users able to run them on their own hardware eventually. Taking the analogy further, he suggested that using current LLMs is still very similar to using an operating system directly through the terminal, arguing, I think a GUI hasn't yet been invented in a general way. Shouldn't ChatGPT have a graphical user interface different to the text bubbles? Today's episode is brought to you by KPMG. In today's fiercely competitive market,

Starting point is 00:06:29 unlocking AI's potential could help give you a competitive edge, foster growth, and drive new value. But here's the key. You don't need an AI strategy. You need to embed AI into your overall business strategy to truly power it up. KPMG can show you how to integrate AI and AI agents into your business strategy in a way that truly works and is built on trusted AI principles and platforms. Check out real stories from KPMG to hear how AI is driving success with its clients at www.kpmg.us slash AI. Again, that's www.kpmg.us slash AI. This episode is brought to you by Blitzy. Now, I talk to a lot of technical and business leaders who are eager to implement cutting-edge AI, but instead of building competitive moats,

Starting point is 00:07:13 their best engineers are stuck modernizing ancient codebases or updating frameworks just to keep the lights on. These projects, like migrating Java 17 to Java 21, often mean staffing a team for a year or more. And sure, copilots help, but we all know they hit context limits fast, especially on large legacy systems. Blitzy flips the script. Instead of engineers doing 80% of the work, Blitzy's autonomous platform handles the heavy lifting, processing millions of lines of code and making 80% of the required changes automatically. One major financial firm used Blitzy to modernize a 20 million line Java codebase in just three and a half months, cutting 30,000 engineering hours and accelerating their entire roadmap. Email Jack at blitzy.com with Modernize in the subject line for prioritized onboarding.

Starting point is 00:07:56 Visit blitzy.com today before your competitors do. Today's episode is brought to you by Plumb. You put in the hours, testing the prompts, refining JSON, and wrangling nodes on the canvas. Now, it's time to get paid for it. Plumb is the only platform designed for technical creators who want to productize their AI workflows. With Plumb, you can build, share, and monetize your flows without giving away your prompts or configuration. When you're ready to make improvements, you can push updates to your subscribers with a single click.

Starting point is 00:08:26 Launch your first paid workflow at useplumb.com. That's Plumb with a B, and start scaling your impact. Today's episode is brought to you by Vanta. In today's business landscape, businesses can't just claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices. The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources.

Starting point is 00:08:57 Vanta makes it easy and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieve $535,000 per year in benefits, and the platform pays for itself in just three months. The proof is in the numbers. More than 10,000 global companies trust Vanta. For a limited time, listeners get $1,000 off at vanta.com slash NLW. That's V-A-N-T-A dot com slash NLW for $1,000 off. Now, when it comes to a different era of software, what's most interesting is how it differs from previous eras. One example Karpathy points out is that during Software 1.0, the early adopters

Starting point is 00:09:41 were governments and massive corporations because they were the only ones that could afford to operate mainframes. A similar thing was true in Software 2.0, with neural networks largely the domain of research labs and tech companies. This time, however, the everyday user was the first adopter of LLMs and able to access this powerful new way to program a computer. He said, it's really fascinating to me that we have a new magical computer, and it's helping me boil an egg rather than helping the government with military ballistics. Indeed, corporations and governments are lagging behind the adoption of all of us. His point is that this is completely unprecedented. He continued, we all have a computer, it's all just software, and ChatGPT was beamed down to billions

Starting point is 00:10:16 of people instantly and overnight. It's kind of insane to me that this is the case and now it's our time to program these computers. Which is not to say that they are perfect. Indeed, with a new era of software, we're finding new problems as well. There are, of course, the problems of hallucination and, more generally, jagged intelligence. In other words, while LLMs have perfect knowledge in some areas, they can also fail to count how many R's are in the word strawberry. Less discussed, though, is the idea that LLMs don't natively learn new things. While a human working in an organization will learn how to perform specific tasks, an LLM will forget everything as soon as the context window is closed.

Starting point is 00:10:50 This presents some very real limitations and breaks the analogy of human thought. Karpathy said, you have to simultaneously think through this superhuman thing that has a bunch of cognitive deficits and issues. Yet Karpathy also believes that there's an entire category of computing tasks unlocked by LLMs that we're only starting to scratch the surface of. One of these ideas he called partial autonomy apps, or Copilot or Cursor for X. The idea is an app like Cursor, which acts as an overlay to LLMs and allows users to move faster. Rather than talking to the LLM operating system directly, Cursor can orchestrate many actions with the human overseeing the process. He noted that these kinds of apps often have a feature he

Starting point is 00:11:25 referred to as an autonomy slider, where the user can select how much autonomy the LLM has to take actions and make changes depending on how sensitive the task is. Karpathy, in fact, suggested that most software will become partially autonomous, with some big implications for the software industry, which needs to figure out how to integrate the new modality. He said, traditional software right now has all these switches designed for humans, but that has to change to be accessible to LLMs. One of the conclusions is that software should seek to make the feedback loop between LLM generation and human verification as tight as possible. Karpathy is apparently interested in MCU references, as he used the Iron Man suit as a way to explain this autonomy slider idea. On one end of the

Starting point is 00:12:02 spectrum, there is Tony Stark wearing the suit, versus, a little bit down the line, when he actually built autonomous versions of the suit that could operate themselves. Karpathy said, we can build augmentations or we can build agents, but we kind of want to do a bit of both. At this stage, working with fallible LLMs, it's less about building flashy demonstrations of autonomous agents and more about building partial autonomy products. And in one more example of the need for interfaces that connect the dots more fluidly between what semi-autonomous software is producing and humans, he gave the example of vibe coding. As it stands at the moment, Karpathy said, vibe coding is super great when you want to build something custom that doesn't appear to exist and

Starting point is 00:12:35 you just want to wing it, but he also walked through an app he has in production that transforms restaurant menus into pictures for easy selection. He said, the code was actually the easy part. Most of it was actually adding authentication and payments and a domain name. All of this was really hard. It was me in a browser clicking stuff. I had the app working in a few hours, and then it took me a week because I was trying to make it real. Bringing it all together, Karpathy argued that there's a new category of consumer that needs infrastructure, saying, it used to be just humans through GUIs or computers through APIs.

Starting point is 00:13:02 Agents are computers, but they're human-like. They're people spirits on the internet and they need to interact with our software infrastructure. One example he gave of what it's going to look like to design for this audience is Vercel and Stripe, who allow LLMs to access their documentation via Markdown. Karpathy said if we can make docs accessible to LLMs, it's going to unlock a huge amount of use. And while accessibility is a big deal, the docs also need to fundamentally change to reflect how an LLM will take actions. Vercel, for example, has already done this, replacing the word click with agent-friendly API commands. Anthropic's MCP is built on a similar concept. Karpathy said,

Starting point is 00:13:35 anytime your docs say click, this is bad, as an LLM won't be able to natively take this action right now. The big takeaway is that there is still an absolute ton of code to be written to re-architect the world of software for agents. The revolution in practice is about slowly and incrementally moving the slider from augmentation to full automation, but the architecture buildout, which Karpathy views as at least a decade long, has only just begun. And so that is Long Reads for this week. Like I said, guys, I have barely scratched the surface on this and would highly encourage you to go watch the whole video. For now, though, that is going to do it for today's AI Daily Brief. I appreciate you listening or watching as always, and until next time,

Starting point is 00:14:12 peace.
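As a footnote to the autonomy slider discussed in the episode, here is a minimal sketch of how such a slider might gate what an LLM is allowed to do on its own. Everything in it is an assumption made for illustration: the `Autonomy` levels, the integer `risk` score on each proposed change, and the `approve` callback standing in for a human reviewer are hypothetical and are not taken from Cursor or from Karpathy's talk.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable

class Autonomy(IntEnum):
    SUGGEST = 0    # nothing is applied without a human
    CAUTIOUS = 1   # only trivial changes are auto-applied
    TRUSTING = 2   # trivial and moderate changes are auto-applied
    FULL = 3       # everything is auto-applied

@dataclass
class ProposedChange:
    description: str
    risk: int  # hypothetical score: 0 = trivial, 1 = moderate, 2 = sensitive

def apply_changes(
    changes: list[ProposedChange],
    level: Autonomy,
    approve: Callable[[ProposedChange], bool],
) -> tuple[list[str], list[str]]:
    """Return (applied, reviewed) descriptions for one batch of LLM proposals."""
    applied: list[str] = []
    reviewed: list[str] = []
    for change in changes:
        if change.risk < level:
            # Below the slider setting: the LLM acts without asking.
            applied.append(change.description)
        else:
            # At or above the slider setting: a human verifies first.
            reviewed.append(change.description)
            if approve(change):
                applied.append(change.description)
    return applied, reviewed
```

Moving `level` up shrinks the `reviewed` list and moving it down grows it, which is one way to keep the generation and verification loop tight for sensitive tasks while letting routine edits through.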
