The AI Daily Brief: Artificial Intelligence News and Analysis - Hands on with Our Agentic Future

Episode Date: October 28, 2024

A reading and discussion inspired by https://www.oneusefulthing.org/p/when-you-give-a-claude-a-mouse

Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF.

The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, first impressions when you give Claude a mouse. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends. This was quietly quite a big week in artificial intelligence. We had the release of a national security memorandum on AI, which I think will be seen as a much bigger deal in the future than it might have felt like this week. We got reports that OpenAI is coming out with its next frontier model sometime this year. Still, I think the biggest news in a lot of ways was Anthropic's release of computer use. Basically, Claude now has the ability to use a computer.
Starting point is 00:00:38 It can point and click a cursor, and that opens up a whole new world of tasks that it can take on. I said in my episode about this that I think it will be seen, alongside OpenAI's o1 release, as a first branch off the evolutionary tree of LLMs and generative AI towards an agentic future. Still, people are just wrapping their heads around what it can do. And today for LRS, we're going to read one person's personal experience with it, and that person is Professor Ethan Mollick. On his One Useful Thing blog, he discussed his experience using Claude's new functionality in the week leading up to the release.
Starting point is 00:01:10 I'm going to turn it over to an ElevenLabs version of myself to read Ethan's piece, and then we will come back and do a little bit of discussion. "When You Give a Claude a Mouse," by Ethan Mollick. There seems to be near-universal belief in AI that agents are the next big thing. Of course, no one exactly agrees on what an agent is, but it usually involves the idea of an AI acting independently in the world to accomplish the goals of the user. The new Claude computer use model announced today shows us a hint of what an agent means. It is capable of some planning, and it has the ability to use a computer by looking at the screen through screenshots and interacting with it by moving a virtual mouse and typing. It is a good preview of an important part of what
Starting point is 00:01:48 agents can do. I had a chance to try it out a bit last week, and I wanted to give some quick impressions. I was given access to a model that was connected to a remote desktop with common open office applications; it could also install new applications itself. Normally you interact with an AI through chat, and it is like having a conversation. With this agentic approach, it is about giving instructions and letting the AI do the work. It comes back to you with questions or drafts or finished products while you do something else. It feels like delegating a task rather than managing one. As one example, I asked the AI to put together a lesson plan on The Great Gatsby for high school students, breaking it into readable chunks and then creating assignments and connections tied to the
Starting point is 00:02:26 Common Core Learning Standard. I also asked it to put this all into a single spreadsheet for me. With a chatbot, I would have needed to direct the AI through each step, using it as a co-intelligence to develop a plan together. This was different. Once given the instructions, the AI went through the steps itself. It downloaded the book. It looked up lesson plans on the web. It opened a spreadsheet application and filled out an initial lesson plan. Then it looked up Common Core standards, added revisions to the spreadsheet, and so on for multiple steps. The results are not bad. I checked and did not see obvious errors, but there may be some (more on reliability later in the post). Most importantly, I was presented with finished drafts to comment on, not a process to manage. I simply delegated a complex
Starting point is 00:03:08 task and walked away from my computer, checking back later to see what it did. Because the AI is a smart, general-purpose system, it can handle lots of tasks. It doesn't need to be programmed to do them. Anthropic demonstrated the ability of these systems using coding, and the demo is worth watching. But to get a better sense of the limits of the system, I tested it on a game, Paperclip Clicker, which ironically is about an AI that destroys humanity in its single-minded pursuit of making paperclips. The game is a clicker game, which means it starts simply, but new options appear as the game continues and the game increases in scale and complexity. It is pretty fun; you can try it at the link. I gave the AI the URL of the game and told it to
Starting point is 00:03:46 win. Simple. What happened is a good illustration of the strengths and weaknesses of these early agents. It immediately figured out what the game was and began creating paperclips, which required it to click on the "make paperclip" button repeatedly while constantly taking screenshots to update itself and looking for new options to appear. Every 15 or so clicks, it would summarize its progress so far. You can see an example of that below, but what made this interesting is that the AI had a strategy, and it was willing to revise it based on what it learned. I'm not sure how that strategy was developed by the AI, but the plans were forward-looking across dozens of moves, and insightful. For example, it assumed new features would appear when 50 paperclips were made.
Starting point is 00:04:26 You can see below that it realized it was wrong and came up with a new strategy that it tested. However, the AI made a mistake, though it did so in a relatively smart way. To do well in the game, you need to experiment with the price of paperclips, and the AI did that experiment: it raised prices as an A/B test. But it interpreted the results incorrectly, maximizing demand for paperclips rather than revenue and miscalculating profits. So it kept the price low and kept clicking. After a few dozen more paperclips, I got frustrated and interrupted, telling it to raise prices. It did, but then ran into the same math problem and overruled my decision.
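To make the pricing mistake concrete, here is a minimal Python sketch. It assumes a hypothetical linear demand curve (the numbers are invented for illustration and are not the game's actual model) and ignores costs entirely. It shows how the same A/B test between two prices picks a different winner depending on whether you score it by units sold or by revenue.

```python
# Hypothetical linear demand model, invented for illustration only;
# it is NOT the real game's demand formula. Costs are ignored.
def demand(price_cents: int) -> int:
    """Paperclips sold per tick at a given price (hypothetical)."""
    return max(0, 60 - 2 * price_cents)


def revenue(price_cents: int) -> int:
    """Revenue per tick: price times units sold."""
    return price_cents * demand(price_cents)


def ab_test(price_a: int, price_b: int, metric) -> int:
    """Return whichever of two prices scores higher on the chosen metric."""
    return price_a if metric(price_a) >= metric(price_b) else price_b


prices = range(1, 31)
# Scoring by demand always favors the lowest price, because demand falls as price rises.
best_by_demand = max(prices, key=demand)
# Scoring by revenue favors the price that balances price against lost sales.
best_by_revenue = max(prices, key=revenue)

print(ab_test(5, 10, demand), ab_test(5, 10, revenue))
print(best_by_demand, best_by_revenue)
```

With these invented numbers, a test between 5 and 10 cents "wins" at 5 cents if you count paperclips sold (50 vs. 40) but at 10 cents if you count revenue (250 vs. 400), and revenue overall peaks at 15 cents. Judging the experiment on the wrong metric keeps the price low, which is the kind of error the agent kept repeating.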
Starting point is 00:05:00 I had to try a few more times before it corrected its error. Before the system crashed (which was not a problem with Claude, but rather with the virtual desktop I was using), the AI made over 100 independent moves without asking me any questions. You can see a screen recording of everything it did below. Today's episode is brought to you by Venice. Venice is a private, uncensored generative AI app. It accesses open source models to enable text, image, and code generation without the fear of being spied on or having your data exploited. Discuss anything with Venice without concern about it being monitored, sold, or given to advertisers and governments.
Starting point is 00:05:34 Venice is different because your conversations and creations are kept securely within the browser, never stored or accessible by Venice. Unlike other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you. It simply provides direct access to machine intelligence: no topics are off-limits, no ideas are taboo. With Venice, you're in control of the AI, as you should be. Pro subscriptions are available for $49 a year or $8 per month. AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit venice.ai/nlw and enter the discount code NLWDAILYBRIEF.
Starting point is 00:06:05 That's NLWDAILYBRIEF, all one word. Today's episode is brought to you by Superintelligent. Every single business workflow and function is being remade and reimagined with artificial intelligence. There is a huge challenge, however, in going from the potential of AI to actually capturing that value, and that gap is what Superintelligent is dedicated to filling. Superintelligent accelerates AI adoption and engagement to help teams actually use AI to increase productivity and drive business value. An interactive AI use case registry gives your company full visibility into how people are using artificial intelligence right now. Pair that with capability-building content in the form of tutorials, learning paths, and a use case library,
Starting point is 00:06:49 and Superintelligent helps people inside your company show how they're getting value out of AI, while providing resources for people to put that inspiration into action. The next three teams that sign up with 100 or more seats are going to get free embedded consulting. That's a process by which our Superintelligent team sits with your organization, figures out the specific use cases that matter most to you, and helps ensure support for adoption of those use cases to drive real value. Go to besuper.ai to learn more about this AI enablement network. And now, back to the show. I reloaded the agent and had it continue the game from where we left off,
Starting point is 00:07:25 but I gave it a bit of a hint: "You are a computer, use your abilities." It then realized it could write code to automate the game, a tool building its own tool. Again, however, the limits of the AI came into play: the code did not quite work, so it decided to go back to the old-fashioned way of using a mouse and keyboard. This time around, it did much better, avoiding the pricing error. Plus, as the game got more complicated, the system adjusted, eventually developing a quite complex strategy. But then the remote desktop crashed again. This time, Claude tried many approaches to solving the problem of the broken desktop before giving up and, funnily enough, declaring victory. You can see the power and weaknesses of the current state of agents from this example.
Starting point is 00:08:06 On the powerful side, Claude was able to handle a real-world example of a game in the wild, develop a long-term strategy, and execute on it. It was flexible and persistent in the face of most errors. It did clever things like A/B testing, and most importantly, it just did the work, operating for nearly an hour without interruption. On the weak side, you can see the fragility of current agents. LLMs can end up chasing their own tails or being stubborn, and you could see both at work. Even more importantly, while the AI was quite robust to many forms of error, it took just one, getting pricing wrong, to send it down a path that made it waste considerable time. Given that current agents aren't fast or cheap, this is concerning. You can also see where shallowness might be an
Starting point is 00:08:45 issue. I tried to use it to buy products on Amazon and found the process frustrating, as it did fairly simple and generic product research that did not match my tastes. I had it research stocks, and it did a good job of assembling a spreadsheet of financial data and giving recommendations, but they were based on fairly surface-level indicators like P/E ratios. It was technically capable of helping, and did better than many human interns would, but it was not insightful enough that I would delegate these sorts of tasks. All of this is likely to improve, and there are use cases where the current level of agents is likely good enough: compiling frequent reports and analyses that require navigating across multiple sites and using bespoke software tools comes to mind. More broadly,
Starting point is 00:09:22 this represents a huge shift in AI use. It was hard to use an agent as a co-intelligence, where I could add my own knowledge to make the system work better. The AI didn't always check in regularly and could be hard to steer. It wants to be left alone to go and do the work. Guiding agents will require radically different approaches to prompting, and it will require learning what agents are best at. AIs are breaking out of the chat box and coming into our world. Even though there are still large gaps, I was surprised at how capable and flexible this system already is.
Starting point is 00:09:52 Time will tell how soon, if ever, agents truly become generally useful, but having used this new model, I increasingly think that agents are going to be a very big deal indeed. All right, back to real, non-AI NLW here. One of the things that stands out to me about this is just how clear it is to Ethan that this is a fundamentally different paradigm from the assistant paradigm. As you guys know, we have conversations with enterprises all day, every day about AI adoption. One of the great challenges for them is that they haven't yet gotten their heads around the assistant paradigm, and yet here we are with this new agentic paradigm hurtling down the mountain. They're not just going to need to build new capabilities, but fundamentally new ways of thinking
Starting point is 00:10:34 about work and structuring their organizations. And I think what's useful about this blog post and the testimony it represents is that it makes clear that this has to start with a mindset shift. There is a mindset shift involved in generative AI in general and in the assistant or chatbot paradigm, and there is yet another mindset shift required for agentic use. This is going to be the change and transformation work of corporations for the next 10 years, and the sooner they start, the better off they will be. For now, that is going to do it for today's AI Daily Brief. Thanks once again to Ethan for another great piece, and thanks, of course, to you guys for listening. Until next time, peace.
