The AI Daily Brief: Artificial Intelligence News and Analysis - From Cointelligence to Agent

Starting point is 00:00:00 From co-intelligence to agent, today we are reading an essay on OpenAI's just-released o1 model. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Hello, friends. Happy weekend. It, of course, being the weekend means it is time for a long reads episode.

Starting point is 00:00:25 And today we are doing sort of the third in our troika of OpenAI o1-related shows. This essay comes once again from Professor Ethan Mollick's blog, One Useful Thing, which is at oneusefulthing.org, and which you should absolutely subscribe to, and it's called Something New: On OpenAI's "Strawberry" and Reasoning. Ethan writes, I've had access to the much-rumored OpenAI Strawberry

Starting point is 00:00:46 enhanced reasoning system for a while, and now that it is public, I can finally share some thoughts. It is amazing, still limited, and perhaps most importantly, a signal of where things are headed. The new AI model, called o1-preview (as an aside, why are the AI companies so bad at names?), lets the AI, quote-unquote, think through a problem before solving it.

Starting point is 00:01:05 This lets it address very hard problems that require planning and iteration, like novel math or science questions. In fact, it can now beat human PhD experts in solving extremely hard physics problems. To be clear, o1-preview doesn't do everything better. It is not a better writer than GPT-4o, for example, but for tasks that require planning, the changes are quite large. For example, I gave o1-preview the instruction: figure out how to build a teaching simulator using multiple agents in generative AI, inspired by the paper attached and considering the views of teachers and students, write the code and be detailed in your approach. I then pasted in the full text of a paper. The only other prompt I gave it was: build the full code. Ethan then shares a video of what the system produced. Section: Strawberry

Starting point is 00:01:46 in action. But it is hard to evaluate all of this complex output, so perhaps the easiest way to show the gains of Strawberry and some limitations is with a game: a crossword puzzle. I took the eight clues from the upper left-hand corner of a very hard crossword puzzle and translated that into text because o1-preview can't see images yet. The clues include "Galaxy cluster," "Part of a midway skill crane," "Beat it," "Caught," "Bits of dishonesty," "Fall report," "Much of Carmen's premiere audience," and "Preferred place of athletic contact." Ethan continues, crossword puzzles are especially hard for LLMs because they require iterative solving, trying and rejecting many answers that all affect each other. This is something LLMs can't do, since they can only add a token or word at a time

Starting point is 00:02:27 to their answer. When I gave the prompt to Claude, for example, it first comes up with an answer for one down (it guesses star, which is wrong) and then is stuck trying to figure out the rest of the puzzle with that answer, ultimately failing to come even close. Without a planning process, it just has to charge ahead. But what happens when I give this to Strawberry? The AI, quote-unquote, thinks about the problem first for a full 108 seconds. Most problems are solved in much shorter times. You can see its thoughts, a sample of which are below, and which are super illuminating. You can see its thoughts include tackling the clues, noticing patterns, investigating word patterns, piecing together clues, assessing clues together, breaking down clues, examining letter alignment,

Starting point is 00:03:02 breaking down the puzzle, charting the course, weighing options, evaluating word choices, evaluating potential fits, assessing choices, weighing options, assessing options, etc., etc. Ethan continues, the LLM iterates repeatedly, creating and rejecting ideas. The results are pretty impressive and it does well, but o1-preview is still seemingly based on GPT-4o, and it is a little too literal to solve this rather unfair puzzle. The answer to one down, "Galaxy cluster," is not a reference to real galaxies, but rather a reference to the Samsung Galaxy phone, and so the answer is apps. Stuck on real galaxies, the AI instead kept trying out the names of actual galactic clusters before deciding one down is Coma, which is a real galactic cluster, I had no idea.

Starting point is 00:03:43 Thus, the rest of the results are not correct and do not fit the rules exactly, but are pretty creative. To see if we could get further, I decided to give it a clue. One down is apps. The AI takes another minute. Again, in a sample of its thinking, you can see how it iterates ideas. For example, for the clue bits of dishonesty, four letters, which it now knows starts with A: possible answers starting with an A. Lies, but starts with an L. Fibs, but starts with an F. A bit doesn't fit. Cons doesn't start with an A. Aces doesn't fit the clue. Alternative answer, arts, doesn't fit the clue. Wait, perhaps it's aces, as bits in card games, but doesn't fit dishonesty. Another possibility is acts. Acts can mean deeds, but bits of dishonesty could be

Starting point is 00:04:23 acts, as in deceptive acts. Tentatively, acts. The final answer is completely correct, solving all the hard references, though it does hallucinate a new clue, 23 across, which is not in the puzzle I gave it. So, o1-preview does things that would have been impossible without Strawberry, but it still isn't flawless. Errors and hallucinations still happen, and it is still limited by the intelligence of GPT-4o as the underlying model. Since getting the new model, I haven't stopped using Claude to critique my posts. Claude is still better at style, but I did stop using it for anything involving complex planning or problem solving. It represents a huge leap in those areas. Closing: From co-intelligence to... Using o1-preview means confronting a paradigm change in AI. Planning is a form of agency, where the AI arrives

Starting point is 00:05:04 at conclusions about how to solve a problem on its own without our help. You can see from the examples above that the AI does so much thinking and heavy lifting, churning out complete results, that my role as a human partner feels diminished. It just does its thing and hands me an answer. Sure, I can sift through its pages of reasoning to spot mistakes, but I no longer feel as connected to the AI output, or that I am playing as large a role in shaping where the solution is going. This isn't necessarily bad, but it is different. As these systems level up and inch towards true autonomous agents, we're going to need to figure out how to stay in the loop, both to catch errors and to keep our fingers on the pulse of the problems we're trying to crack.

Starting point is 00:05:37 o1-preview is pulling back the curtain on AI capabilities we might not have seen coming, even with its current limitations. This leaves us with a crucial question. How do we evolve our collaboration with AI as it evolves? That is a problem that o1-preview cannot yet solve. Today's episode is brought to you by Venice. Venice is a private, uncensored generative AI app. It accesses open-source models to enable text, image, and code generation without the fear of being spied on or having your data exploited. Discuss anything with Venice without concern about it being monitored, sold, or given to advertisers and governments. Venice is different because your conversations and creations are kept securely within the browser, never stored or accessible by Venice. Unlike other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you.

Starting point is 00:06:19 It simply provides direct access to machine intelligence. No topics are off limits, no ideas are taboo. With Venice, you're in control of the AI, as you should be. Pro subscriptions are available for $49 a year or $8 per month. AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit venice.ai/NLW and enter the discount code NLW Daily Brief. That's NLW Daily Brief, all one word. Today's episode is brought to you by Superintelligent, which is of course our

Starting point is 00:06:49 platform that helps you learn how to use AI tools and, perhaps even more importantly, gives you ideas on the best use cases that are actually going to help you achieve whatever it is you want to achieve. To recognize the end of summer and back to school slash back to work, we are running our best promotion ever. When you sign up for Superintelligent between now and the end of August using code so back, your first month will be 100% free. The platform features over 600 fun, highly practical AI tutorials that get you using AI fast and with an eye to actually transforming how you get things done. We've just launched Super for Teams. So if you have a group of people at your company that want to figure out how to use AI together, I highly suggest you check it out. But for those

Starting point is 00:07:33 of you who are using Superintelligent as an individual, once again, if you sign up for Superintelligent between now and the end of the month using code so back, you will get your first month 100% free. Go to besuper.ai and check it out today. Another thought-provoking piece from Ethan here. A couple things that I think are worth honing in on. First, this idea that although still nascent, and although this is not agentic, there are the first glimpses of how we move from the assistant paradigm to the agent paradigm. Again, Ethan writes, planning is a form of agency, where the AI arrives at conclusions about how to solve a problem on its own without our help. The key to agents is exactly that process. And also, I believe that

Starting point is 00:08:11 the feeling that Ethan has of feeling more out of the loop is real. And there are potentially real consequences. The more we learn to trust AI, the less able to catch errors and hallucinations we're going to be. At the same time, at least from the limited sort of business context that I tend to think about AI from initially, it's hard not to view this as pretty much just exclusively an upgrade and quite exciting. One of the things I discussed in a recent video about how to get the most out of o1 is that when it comes to business problems, business strategy in particular, where o1 is good is when there is an objective right answer. You might remember we had Allie Miller experimenting with some optimization problems. Staffing scheduling optimization is one example,

Starting point is 00:08:51 office warehouse planning and arrangement is another. These are areas where, yes, there is strategy, but depending on the variable that you pick, whether you're trying to maximize, for example, revenue or revenue opportunity in the form of warehouse space, there could actually be an objective right answer. Where things get blurrier is when there's not an objective answer, when it's still subjective. I noted when I was exploring how o1 did versus GPT-4o on creating a sales strategy for Superintelligent that they had pretty similar outputs, with o1 just being slightly more comprehensive. So when I think then about the way that this reasoning paradigm improves ability to make business decisions, it's two different types of benefit. For things that have a

Starting point is 00:09:31 genuine right answer, the AI can get there faster, which is useful because who wants to spend a bunch of time thinking about staffing optimization, for example. And when there's not an exact right answer, AI has just become an even better brainstorming partner. I think ultimately that a huge amount of business, perhaps the vast majority of it, does not have an easy or obvious right answer. Some discrete problems and tasks within it do. But in terms of what to do next, it's usually all about weighing different options, which have different benefits and risks, comparing them to the capabilities that you have or that your team has, and making decisions from there. The number of variables makes it basically impossible for there to ever be a strictly right answer, at least without the

Starting point is 00:10:10 benefit of hindsight. And so even with more advanced reasoning AI, there's still a need at the end of the day to make a call. Now, of course, it's not inconceivable that there's a point in the future where people decide that AI is better on average at weighing those different possibilities, but I remain somewhat skeptical of this. I think that humans as orchestrators of vast armies of AI that radically increase our productivity as a whole and just our ability to do things is a more likely scenario than one where humans are cut out entirely.

Starting point is 00:10:37 But of course we can't know the future until we get there. And so for now, I appreciate you hanging out and trying to get this little preview of it with me. Until next time, peace.
