The AI Daily Brief: Artificial Intelligence News and Analysis - Code AGI is Functional AGI (And It's Here)

Starting point is 00:00:00 Today on the AI Daily Brief, why code AGI is functional AGI and why functional AGI is here. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. First of all, thank you to today's sponsors, Zencoder, Robots and Pencils, Section, and Superintelligent. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief. And if you are interested in sponsoring the show, send us a note at sponsors at aidailybrief.ai. So we are back now with another long read slash big think episode. And this week, we're getting into a topic that I have been kind of obsessing about for the last several weeks.

Starting point is 00:00:46 It feels to me quite clear that something dramatic has shifted. Obviously, I don't mean some new model that changes everything, but more, it feels as though we've digested what the latest round of models is actually capable of. We've had enough time with them for them to start to shift our behaviors. And the implication of all of that is, fundamentally speaking, some different new era in the story of AI and more broadly in the story of work. It is a shift which I am still trying to figure out how to put words around, but one that I am convinced has profound implications for how companies do what they do.

Starting point is 00:01:18 To some extent, the shift is starting to come home to roost in a concerted conversation around whether we are finally at AGI. I will argue that we are, with some nuance. But what I'm going to do first is read some excerpts from a recent piece by Sequoia's Pat Grady called "2026: This is AGI," follow it up with a more skeptical piece by Every's Dan Shipper called "Toward a Definition of AGI," and then I'm going to add my own thoughts,

Starting point is 00:01:41 steelmanning both perspectives, and trying to end with where I think is the most useful place to be. Let's start with Pat's piece. It's actually by Pat Grady and Sonya Huang, and begins: Years ago, some leading researchers told us that their objective was AGI. Eager to hear a coherent definition, we naively asked, how do you define AGI?

Starting point is 00:01:59 They paused, looked at each other tentatively, and then offered up what's become something of a mantra in the field of AI. Well, we each kind of have our own definitions, but we'll know it when we see it. The vignette typifies our quest for a concrete definition of AGI. It has proven elusive. While the definition is elusive, the reality is not. AGI is here now. Coding agents are the first example.

Starting point is 00:02:22 There are more on the way. Long-horizon agents are functionally AGI, and 2026 will be their year. Now in the next section, Pat and Sonya make sure to qualify that they do not have any sort of scientific authority to propose this definition. And yet, with that said, they offer what they call a functional definition of AGI. AGI, they write, is the ability to figure things out. That's it. A human who can figure things out has some baseline knowledge,

Starting point is 00:02:46 the ability to reason over that knowledge, and the ability to iterate their way to the answer. An AI that can figure things out has some baseline knowledge (pre-training), the ability to reason over that knowledge (inference-time compute), and the ability to iterate its way to the answer (long-horizon agents). The first ingredient, knowledge and pre-training, is what fueled the original ChatGPT moment in 2022. The second, reasoning and inference-time compute, came with the release of o1 in late 2024. The third, iteration and long-horizon agents, came in the last few weeks with Claude Code

Starting point is 00:03:14 and other coding agents crossing a capability threshold. Generally intelligent people can work autonomously for hours at a time, making and fixing their mistakes and figuring out what to do next without being told. Generally intelligent agents can do the same thing. This is new. So what's an example of this new capability that they're talking about? They provide an example of a founder telling his agent that he needs a developer relations lead. He gives a set of qualifications, including the fact that this person needs to enjoy being on Twitter.

Starting point is 00:03:40 The agent starts in an obvious place: LinkedIn searches for developer advocate, for example. Unfortunately, it finds hundreds of examples, so it has to iterate. It pivots, they write, to signal over credentials. It searches YouTube for conference talks. From there, it finds 50 plus speakers and filters for those with talks that have strong engagement. Next, because of that Twitter qualification, it cross-references those speakers with Twitter. The total number is now whittled down to a dozen with real followings and posting real opinions.

Starting point is 00:04:06 Honing in even further for who's been most engaged in the last few months, that total list, which was hundreds and then 50 and then a dozen, is now down to three. Now it can hone in on those three. One just announced a new role. One is the founder of a company that just raised funding. The third was a senior DevRel at a Series D company that just did layoffs in marketing. The agent, they write, drafts an email acknowledging her recent talk, the overlap with the startup's ICP, and a specific note about the creative freedom a smaller team offers. It suggests a casual

Starting point is 00:04:31 conversation, not a pitch. Total time, 31 minutes. The founder has a short list of one instead of a JD posted to a job board. This, Pat and Sonya write, is what it means to figure things out: navigating ambiguity to accomplish a goal, forming hypotheses, testing them, hitting dead ends, and pivoting until something clicks. The agent didn't follow a script. It ran the same loop a great recruiter runs in their head, except it did it tirelessly in 31 minutes without being told how. To be clear, agents still fail. They hallucinate, lose context, and sometimes charge confidently down exactly the wrong path. But the trajectory is unmistakable and the failures are increasingly fixable.
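As a rough illustration of the narrowing loop in that recruiting example, here is a minimal Python sketch. It is not how the agent in the Sequoia piece works internally; the `Candidate` fields, the thresholds, and the `narrow` helper are all hypothetical names chosen for this example. It only shows the shape of the idea: apply progressively stricter signals until the pool is small enough to act on.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    # All fields are hypothetical stand-ins for the signals in the example.
    name: str
    talk_views: int          # engagement on conference talks
    followers: int           # size of Twitter following
    posts_last_90_days: int  # recent activity


def narrow(candidates, stages, target=3):
    """Apply filters in order, stopping once the pool is small enough.

    Returns the remaining pool and the pool size after each applied stage.
    """
    pool = list(candidates)
    history = [len(pool)]
    for keep in stages:
        if len(pool) <= target:
            break
        pool = [c for c in pool if keep(c)]
        history.append(len(pool))
    return pool, history


# Assumed thresholds, purely illustrative.
stages = [
    lambda c: c.talk_views >= 5_000,         # signal over credentials
    lambda c: c.followers >= 2_000,          # real following
    lambda c: c.posts_last_90_days >= 20,    # recently engaged
]

candidates = [
    Candidate("A", 12_000, 8_000, 45),
    Candidate("B", 9_000, 3_500, 4),
    Candidate("C", 700, 15_000, 60),
    Candidate("D", 6_000, 500, 30),
    Candidate("E", 20_000, 2_500, 25),
]

shortlist, history = narrow(candidates, stages, target=2)
print([c.name for c in shortlist], history)
```

A real agent would choose and revise its filters as it goes rather than run a fixed list, which is the part the essay calls figuring things out; the sketch fixes the stages only to keep the funnel visible.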

Starting point is 00:05:07 So what? Well, soon, they say, you'll be able to hire an agent, which, with a hat tip to Sarah Guo, they call one litmus test for AGI. You can hire GPT-5.2 or Claude or Grok or Gemini today. More examples are on the way. In medicine, OpenEvidence's Deep Consult functions as a specialist. In law, Harvey's agents function as an associate. They go through examples in cybersecurity, DevOps, go-to-market, recruiting, math, semiconductor design, and AI research.

Starting point is 00:05:31 All of this, they say, has profound implications for founders. The AI applications of 23 and 24 were talkers. Some were very sophisticated conversationalists, but their impact was limited. The AI applications of 26 and 27 will be doers. They will feel like colleagues. Usage will go from a few times a day to all day every day, with multiple instances running in parallel. Users won't save a few hours here and there. They'll go from working as an IC

Starting point is 00:05:54 to managing a team of agents. Remember all that talk of selling work? Now it's possible. What work can you accomplish? The capabilities of a long-horizon agent are drastically different than a single forward pass of a model. What new capabilities do long-horizon agents

Starting point is 00:06:08 unlock in your domain? What tasks require persistence where sustained attention is the bottleneck? Saddle up, they say, it's time to ride the long-horizon agent exponential. Today, your agents can probably work reliably for around 30 minutes, but they'll be able to perform

Starting point is 00:06:20 a day's worth of work very soon, and a century's worth of work eventually. Ultimately, they write, the ambitious version of your roadmap just became the realistic one. Let's move over to Dan Shipper's "Toward a Definition of AGI." Dan writes, when an infant is born, they are completely dependent on their caregivers to survive. They can't eat, move, or play on their own. As they grow, they learn to tolerate increasingly longer separations. Gradually, the caregiver occasionally and intentionally fails to meet their needs. The baby cries in their crib at night, but the parent waits to see if they'll self-soothe.

Starting point is 00:06:51 The toddler wants attention, but the parent is on the phone. These small, manageable disappointments, what the psychologist D. W. Winnicott called good-enough parenting, teach the child that they can survive brief periods of independence. Over months and years, these periods extend from seconds to minutes to hours until eventually the child is able to function independently. AI is following the same pattern. Today we treat AI like a static tool we pick up when needed and set aside when done. We turn it on for specific tasks, writing an email, analyzing data, answering questions, then close the tab. But as these systems become more capable, we'll find ourselves returning to them more frequently, keeping sessions open longer

Starting point is 00:07:27 and trusting them with more continuous workflows. We already are. So here's my definition of AGI. Artificial general intelligence is achieved when it makes economic sense to keep your agent running continuously. In other words, we'll have AGI when we have persistent agents that continue thinking, learning, and acting autonomously between your interactions with them, like a human being does. I like this definition because it's empirically observable. Either people decide it's better to never turn off their agents or they don't. It avoids the philosophical rigmarole inherent to trying to define what true general intelligence is. And it avoids the problems of the Turing test and OpenAI's definition of AGI. In the Turing test, a system is AGI when it can fool a

Starting point is 00:08:04 human judge into thinking it's human. The problem with the Turing test is that it sets up movable goalposts. If I interacted with GPT-4 10 years ago, I would have thought it was human. Today, I'd simply ask it to build a website for me from scratch and I'd instantly know it was not human. OpenAI's definition of AGI, which is AI that can outperform humans at most economically valuable work, suffers from the same problem. What constitutes economically valuable work constantly changes. We will invent new economically valuable work that we can perform in conjunction with AI. These hybrid roles then become the new benchmark that AI will need to learn to do before it counts as

Starting point is 00:08:35 AGI. So the definition is an ever-receding target. By contrast, the definition I proposed, AGI is achieved when it makes economic sense to keep your agent running continuously, is a binary, irreversible, and immovable threshold. I like this definition because in order to meet it, we will need to develop a lot of necessary but harder to define components of AGI. 1. Continuous learning. The agent must learn from experience without explicit user prompting. 2. Memory management. The agent needs sophisticated ways to store, retrieve, and forget information

Starting point is 00:09:03 efficiently over extended periods. 3. Generating, exploring, and achieving goals. The agent requires the open-ended ability to define new, useful goals and maintain them across days, weeks, or months, while adapting to changing circumstances. 4. Proactive communication. The agent should reach out when it has updates, questions, or requires input, rather than only responding when summoned. It must also be able to be interrupted and redirected by the user. 5. Trust and reliability. The agent must be safe and reliable. Users will not keep agents running unless they are confident the system will not cause harm or make costly errors autonomously. While I've described these capabilities, I'm deliberately avoiding the trap of trying to specify

Starting point is 00:09:40 exact technical criteria for each one. What precisely constitutes continuous learning or trust is difficult to pin down. Instead, my AGI definition entails that all of these capabilities are present to some extent. And these capabilities already are present in limited ways. ChatGPT, for example, has rudimentary forms of memory and proactive communication. The length of time during which an AI can run on its own is increasing, gradually and consistently. When GPT-3 first came out, the primary use case for AI was GitHub Copilot. The best it could do was complete the line of code you were already writing. ChatGPT lengthened the amount of time the AI could run from the amount required for you to

Starting point is 00:10:14 press tab to complete a line of code to the time required to deliver a full response in a chat conversation. Now, agentic tools like Claude Code, Deep Research, and Codex can run for between five and 20 minutes at a stretch. The trajectory is clear: from seconds to minutes to hours, and to days and beyond. Eventually the cognitive and economic costs of starting fresh each time will outweigh the benefits of turning AI off. If you're using AI to code, ask yourself, are you building software or are you just playing prompt roulette? We know that unstructured prompting works at first, but eventually it leads to AI slop and technical debt. Enter Zenflow. Zenflow takes you from vibe coding to AI-first engineering. It's the first

Starting point is 00:10:58 AI orchestration layer that brings discipline to the chaos. It transforms freeform prompting into spec-driven workflows and multi-agent verification, where agents actually cross-check each other to prevent drift. You can even command a fleet of parallel agents to implement features and fix bugs simultaneously. We've seen teams accelerate delivery 2x to 10x. Stop gambling with prompts. Start orchestrating your AI. Turn raw speed into reliable production-grade output at Zenflow.free. Today's episode is brought to you by Robots and Pencils, a company that is growing fast. Their work as a high-growth AWS and Databricks partner means that they're looking for elite talent ready to create real impact at velocity. Their teams are made up of AI-native

Starting point is 00:11:41 engineers, strategists, and designers who love solving hard problems and pushing how AI shows up in real products. They move quickly using RoboWorks, their agentic acceleration platform, so teams can deliver meaningful outcomes in weeks, not months. They don't build big teams. They build high-impact nimble ones. The people there are wicked smart with patents, published research, and work that's helped shape entire categories. They work in velocity pods and studios that stay focused and move with intent. If you're ready for career-defining work with peers who challenge you and have your back, Robots and Pencils is the place. Explore open roles at robotsandpencils.com slash careers. That's robotsandpencils.com slash careers.

Starting point is 00:12:21 Here's a harsh truth. Your company is probably spending thousands or millions of dollars on AI tools that are being massively underutilized. Half of companies have AI tools, but only 12% use them for business value. Most employees are still using AI to summarize meeting notes. If you're the one responsible for AI adoption at your company, you need Section. Section is a platform that helps you manage AI transformation across your entire organization. It coaches employees on real use cases, tracks who's using AI

Starting point is 00:12:47 for business impact and shows you exactly where AI is and isn't creating value. The result: you go from rolling out tools to driving measurable AI value. Your employees move from meeting summaries to solving actual business problems, and you can prove the ROI. Stop guessing if your AI investment is working. Check out Section at sectionai.com. That's S-E-C-T-I-O-N-A-I dot com. Today's episode is brought to you by Superintelligent. Superintelligent is a platform that, very simply put, is all about helping your company figure out how to use AI better. We deploy voice agents to interview people across your company, combine that with proprietary intelligence about what's working for other companies, and give you a set of recommendations around use cases and

Starting point is 00:13:27 change management initiatives that add up to an AI roadmap that can help you get value out of AI for your company. But now we want to empower the folks inside your team who are responsible for that transformation with an even more direct platform. Our forthcoming AI strategy compass tool is ready to start to be tested. This is a power tool for anyone who is responsible for AI adoption or AI transformation inside their companies. It's going to allow you to do a lot of the things that we do at Superintelligent, but in a much more automated, self-managed way, and with a totally different cost structure. If you are interested in checking it out, go to aidailybrief.ai slash compass, fill out the form and we will be in touch soon. So both good

Starting point is 00:14:08 entries into the canon of what is AGI. But as I indicated at the beginning, what I think is actually most relevant about them is the fact that we are having this conversation right now. We are having this conversation because people have a sense that something big has shifted. But something big does not necessarily mean AGI. Indeed, one of the best steelman arguments against what we have now being AGI is the need to separate wow-level competence from general autonomy. The argument would go along the lines of AGI isn't just about being able to generate impressive outputs across domains.

Starting point is 00:14:41 It's about robust, self-directed competence under real-world constraints. AGI could be dropped into novel situations, define success criteria, manage long-horizon execution, and reliably converge without a human acting as the external executive function. And as much as things have changed, what both of the pieces we just read have in common is that more than anything else, they're disagreeing about which point we're at on an agreed-upon trajectory. The funny thing, in fact, about that "This is AGI" piece is that when you actually read it closely, it's not so much saying that this is AGI.

Starting point is 00:15:11 It's saying that we're really, really close, that what is AGI, i.e. these long-horizon agents, is available-ish now and just getting better, and that because we're now within, call it, months rather than years of AGI, you better start preparing. Dan isn't really disagreeing with that, although he doesn't get into timelines. Instead, he's pointing out all of these things that need to happen to get to a certain point of indispensability, which he is arguing is the key thing. But what about what we've seen over the last couple of weeks, the sense among some of the most enfranchised and powerful users of AI

Starting point is 00:15:42 that we really are in a fundamentally different moment? To take one example of a type of testimony we've seen lots of, Midjourney founder David Holz tweeted on January 3rd, I've done more personal coding projects over Christmas break than I have in the last 10 years. It's crazy. I can sense the limitations, but I know nothing is going to be the same anymore. And honestly, this brings up a more interesting and nuanced take on

Starting point is 00:16:04 it's not AGI yet. That argument would go something like, yes, Claude Code and similar tools have crossed the threshold for coding specifically, but generality is the whole point of general intelligence. There's still so much that current AI fails at, like novel reasoning, multi-step planning, and unfamiliar domains. These new big breakthroughs that everyone is sensing happened in a domain that's really well-suited to LLMs: well-documented, pattern-rich, verifiable outputs. That's not, the argument would go, evidence of general intelligence. It's evidence of domain fit. This would in some ways be an argument about the

Starting point is 00:16:37 jaggedness of AI, the idea that it can be superhuman in one area and infantile in another. And indeed, it is the case that this sense of what has shifted is about AI's capacity to code. But I keep coming back to this essay from Shawn Wang, aka swyx, when he decided to join Cognition. This line absolutely wins the award for the couple of sentences that have lived most rent-free in my head since they were written. Shawn wrote, the central realization I had was this. Code AGI will be achieved in 20% of the time of full AGI and capture 80% of the value of AGI. Now for him, this was an argument to simply do code AI now rather than later, but I think

Starting point is 00:17:15 what I would argue is that code AGI doesn't quote unquote capture 80% of the value of AGI. I think code AGI is more or less just functional AGI. The argument here is that coding is effectively a universal lever in the modern world. Most economically valuable work, to reference OpenAI's terminology, has been computer shaped for a long time. If your job touches a screen, an API, a database, a spreadsheet, a ticketing system, a CRM, a repo, a dashboard, or a docs tool, then in principle it's addressable by software. So if an AI can understand intent, translate intent into procedures, write and modify code, run tools, inspect outputs, and iterate until it meets acceptance criteria, then it has

Starting point is 00:17:57 a meta skill that can simulate competence in many domains by building the missing tool. And in that framing, coding isn't one domain. It instead is closer to instrumental generality. Want data analysis? Write SQL or Python notebooks, run them, interpret the results, generate charts, and build pipelines. Want operations? Automate workflows across systems, tickets, approvals, audits, alerts. Want finance? Pull data, reconcile, generate variance analysis, draft narratives. Want product? Spin up prototypes, instrumentation, A/B analysis, telemetry pipelines. Basically, the idea is that if you can program, you can create capabilities. And if you can create capabilities on demand, you're not narrow, you're general in a way that matters for real

Starting point is 00:18:39 work. You could take this argument even farther. Coding doesn't just help you build general capabilities. It is also, in some ways, a test of general reasoning. Non-trivial coding forces abstraction, decomposition, causal reasoning, adversarial thinking, and iterative debugging. Those are indicators, ultimately, of general intelligence. And I would argue that a lot of what feels different about building with AI coding tools now, as opposed to six months ago, is in that set of general reasoning capabilities rather than just how good it is at knowing a bunch of different coding languages. We recently had an issue come up where a company that we were producing an AI and agent readiness audit for found, contained in one of the recommendations, a tool or platform

Starting point is 00:19:20 that was at odds with the tech stack that they currently have. Now, this is a problem that we are extremely conscientious of. It would be very easy to recommend a bunch of platforms that an enterprise is never going to use. It's much more difficult and much more valuable to let them know how to work with the band of tools they have or the things that they would consider to actually solve the problems that are clear and present for them. And so we spent a lot of time making sure that that type of recommendation doesn't make it through, but it did. And so as the team was talking about new processes and procedures for making sure this didn't happen anymore, I was following the conversation on Slack from a haircut.

Starting point is 00:19:53 It struck me that this might not actually be all that difficult, at least if you removed it from the domain of the human. And so as I was sitting there, I fired up my mobile web browser version of Lovable, and by the time the haircut was done, I had a checker to run final reports through that would compare the tech stack of the company to all the recommendations in the report and, in literally a matter of seconds, make sure, before the final deliverable was sent, that that sort of thing didn't happen. Now, of course, there are even more sophisticated ways to do this with code. Basically, we could just build that capability that I built as a standalone into the overall processing pipeline. But the point that I'm trying to make is that this wasn't

Starting point is 00:20:26 coding to solve a technical problem. It was coding to solve a business problem. Increasingly, for the people who are most adept at working with AI, that's what they're doing. The question is decreasingly, what's the best AI tool for this? And increasingly, can I build some custom software to solve or enable this? So what's happening right now at all these different startups is not some cute little example of the non-technical folks being able to vibe code and prototype features. What's happening is a complete collapse in the distance between idea and execution for everyone. And that's amazing for those startups. We are going to see complete and utter re-evaluations of how to do everything.
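To make the report checker from that haircut story concrete, here is a minimal Python sketch of the same idea. It is not the tool that was built in Lovable; the function name `find_stack_conflicts`, the list of known tools, and the sample recommendations are all assumptions made up for illustration. It flags any recommendation that names a tool outside the company's approved tech stack.

```python
import re


def find_stack_conflicts(recommendations, approved_stack, known_tools):
    """Return (recommendation_number, tool) pairs for tools that are
    mentioned in a recommendation but are not in the approved stack.

    known_tools is a hypothetical catalog of tool names to look for.
    """
    approved = {tool.lower() for tool in approved_stack}
    conflicts = []
    for number, text in enumerate(recommendations, start=1):
        for tool in known_tools:
            # Whole-word, case-insensitive match on the tool name.
            pattern = r"\b" + re.escape(tool) + r"\b"
            mentioned = re.search(pattern, text, flags=re.IGNORECASE)
            if mentioned and tool.lower() not in approved:
                conflicts.append((number, tool))
    return conflicts


# Illustrative data only; the tool names are examples, not from the audit.
approved_stack = ["Salesforce", "Snowflake", "Slack"]
known_tools = ["Salesforce", "HubSpot", "Snowflake", "Databricks", "Slack"]
recommendations = [
    "Route intake requests through Slack and log them in Salesforce.",
    "Move lead scoring into HubSpot workflows.",
    "Train the forecasting model on Databricks, reading from Snowflake.",
]

conflicts = find_stack_conflicts(recommendations, approved_stack, known_tools)
for number, tool in conflicts:
    print(f"Recommendation {number} mentions {tool}, which is not in the approved stack")
```

A keyword match like this only catches tools that appear in the catalog by name. A fuller version would presumably use a model to read each recommendation, which is closer to what a vibe-coded checker would do, but the gate before the final deliverable is sent works the same way.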

Starting point is 00:21:06 And startups and small companies are going to be the incubators of that change. The thing that would worry me right now, if I was an enterprise leader, is that this Rubicon that we've crossed starts to feel more like a shift in kind than a shift in scale. What I mean by that is that for the three years since ChatGPT was launched, obviously enterprises were behind more nimble companies and AI users, relatively speaking. There's all sorts of systems inertia, there's compliance issues, governance issues, etc. But the patterns of what they were doing were still similar to the patterns of what other people were doing, just maybe with a little bit slower adoption and a little bit more process along the way.

Starting point is 00:21:40 Still, they were running on a parallel track. I now believe that the tracks have diverged. The frontier of what's possible and the median of what's deployed in enterprises have decoupled in a way in which I believe that they are increasingly pointed in different directions. The standard enterprise invocations at this point to audit and automate your workflows, experiment with AI, will ultimately contain the transformation possible within existing power structures, more or less keeping the org chart intact. The reality is, in a world of code AGI, a world of functional AGI, the org chart is broken. Bottlenecks shift from who can code to who has good ideas,

Starting point is 00:22:17 the role of management shifts from resource allocation to taste and judgment, competitive advantage shifts from execution capability to speed of iteration, and the gap increasingly isn't linear. It's compounding. Every month that someone is building in the new paradigm, they are getting comparatively farther ahead of those who aren't. So is the answer as simple as letting everyone on your team vibe code? Honestly, I think you could do worse. I think we are at a moment where, increasingly, the modality by which things are produced in this world looks and is different to the way that it was just a few years ago, even just a few months ago. The message is less upskill your workforce and audit your AI use cases, and much closer to your entire organizational model is built for a

Starting point is 00:22:58 world where execution was the bottleneck, and that world is over. I worry for enterprises because this new set of shifts involves accepting a loss of control, a restructuring of incentives, and a total transformation of process that is even harder than the AI transformation that has come so far. And to the extent that you are in one of those enterprises and looking for a bright spot in this, it's that at least most of you are in this together, and that it is unlikely that many enterprises are going to get comfortable fast with the types of change they really should be making. But the change, I believe, has happened.

Starting point is 00:23:29 And I think that the rewards for the companies, not just startups, but enterprises too, who can lean in to this new capability set and can live on the other side of this inflection will be immense. So, like I said at the beginning, I think Code AGI is functional AGI, and I think it's here. This is something that I will be exploring a lot more in the weeks to come.

Starting point is 00:23:50 For now, though, that is going to do it for today's AI Daily Brief. Appreciate you listening or watching as always and until next time, peace.
