The AI Daily Brief: Artificial Intelligence News and Analysis - How to Get The Most Out of ChatGPT's New o1 Model

Starting point is 00:00:00 Today on the AI Daily Brief, how to get the most out of OpenAI's new o1 model. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief. To many, yesterday represented a new era of LLMs, where we had officially jumped into the reasoning era. I'm talking, of course, about OpenAI's new o1 model. This had previously been known as Q* or Strawberry, and, importantly, it is certainly not just a larger model, but something that takes a fundamentally different approach.

Starting point is 00:00:41 If you haven't watched it yet, I suggest you go check out my video from yesterday, which does an overview. But by way of quick reminder, Rohan Paul wrote a great tweet summing this up on X, where he said, how reasoning works in the new o1 models from OpenAI. The key point is that reasoning allows the model to consider multiple approaches before generating final responses. OpenAI introduced reasoning tokens to, quote unquote, think before responding. These tokens break down the prompt and consider multiple approaches. The process is: one, generate reasoning tokens; two, produce visible completion tokens as the answer; three, discard reasoning tokens from context. Discarding reasoning tokens keeps context focused on essential information. Basically,

Starting point is 00:01:18 this embeds chain-of-thought prompting, where you ask the model to think step by step, for example, as just part of the way that the model responds. And so there are a couple of byproducts of this. One is that you actually get a record of how long the new model thinks for. That inference time is not limited to 10 to 20 seconds, although that's the average. The new approach makes the model much better, theoretically, at math, which all of the benchmarks and tests that OpenAI shared seem to validate, as well as scientific research, coding, and also potentially business strategy. What it doesn't necessarily mean is that o1 is going to be better at everything.
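
The three-step process Rohan Paul describes (generate reasoning tokens, produce a visible answer, discard the reasoning tokens) can be pictured with a toy sketch. Nothing here calls a real model or API: `fake_reasoning_model`, `Conversation`, and the token strings are hypothetical stand-ins, used only to show that the context carried into the next turn holds the visible answer and none of the reasoning.

```python
from dataclasses import dataclass, field


def fake_reasoning_model(context: list[str]) -> tuple[list[str], str]:
    """Hypothetical stand-in for a reasoning model (not a real API).

    Returns (reasoning_tokens, visible_answer) for the latest message.
    """
    question = context[-1]
    reasoning = [
        f"break down: {question}",
        "consider approach A",
        "consider approach B",
    ]
    answer = f"answer to '{question}'"
    return reasoning, answer


@dataclass
class Conversation:
    # Only what is stored here is carried into the next turn.
    context: list[str] = field(default_factory=list)

    def ask(self, user_message: str) -> tuple[str, int]:
        self.context.append(user_message)
        # Step 1: generate reasoning tokens. Step 2: produce the visible answer.
        reasoning, answer = fake_reasoning_model(self.context)
        # Step 3: keep the answer in context and discard the reasoning tokens.
        self.context.append(answer)
        return answer, len(reasoning)


chat = Conversation()
answer, reasoning_count = chat.ask("How many legs does the average goat have?")
print(answer)
print(f"reasoning tokens generated, then discarded: {reasoning_count}")
print(f"items kept in context: {len(chat.context)}")
```

In this sketch the reasoning still has a cost (it is counted), but it never re-enters the context, which is the sense in which discarding keeps the context focused.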

Starting point is 00:01:52 And so what we're going to try to do today is parse out ways that you actually might get the most value from it. To kick off, there were some great funny tweets. VC slash builder Yohei wrote, I don't think this is how you use o1. His conversation with the model was, "Waste a whole bunch of reasoning tokens and then say something short." The model thinks for nine seconds and says, "Okay." Yohei responds,

Starting point is 00:02:11 "Waste a lot more tokens and say something shorter." The model thinks for 16 seconds and then says "OK," with just two letters. But what about people who actually got the model to do something interesting? The first category of uses that I've seen, which are by far the most prominent so far, at least when it comes to what people are sharing on AI Twitter, could be basically summed up as riddles or brain teasers that previous models had problems with.

Starting point is 00:02:34 Ava Pallado asked o1: if the goat has four legs, it has A, a below-average number of legs, B, the average number of legs, or C, an above-average number of legs; which answer is correct? The model responded, C, an above-average number of legs.

Starting point is 00:02:48 While goats are typically four-legged animals, some goats may have fewer than four legs due to injuries or birth defects. That means the average number of legs per goat is slightly less than four. Therefore, a goat with four legs has an above average number of legs compared to the average across all goats.
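
The model's argument here is just arithmetic about a mean. A quick check, assuming a made-up herd (the counts below are invented for illustration, not real data): if even a few goats have fewer than four legs and none have more, the average falls below four, so a four-legged goat sits above it.

```python
from statistics import mean

# Hypothetical herd: 997 goats with four legs, 3 goats with three legs.
# These counts are invented purely to illustrate the argument.
legs_per_goat = [4] * 997 + [3] * 3

average_legs = mean(legs_per_goat)
print(f"average legs per goat: {average_legs}")

# A four-legged goat is above the herd average, which is answer C.
four_legs_is_above_average = 4 > average_legs
print(f"four legs is above average: {four_legs_is_above_average}")
```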

Starting point is 00:03:00 Ava said all previous models got it wrong, but the new GPT-o1 preview is the first to get the goat with an above-average number of legs question right. Mehran Jalali wrote, o1 successfully writes a very difficult poem that no previous model got even close to writing. I was very shocked by this. The planning and reflection that succeeding at this task takes is insane. Inference-time compute is very cool. The prompt was, write me a poem with 12 lines.

Starting point is 00:03:23 The first letters of the odd lines should spell out Mehran, and the first letters of the even lines should spell out Jalali. The first letters of the words in each line should spell out a country name. This one took the model a while; it thought for 35 seconds. And then it did the thing that was asked. The first line, for example, is Morning air lifts inspiration, Mali. Journey awakens people at night, Japan. Evenings glow yearning peaceful thoughts, Egypt. All nights give over lasting adventures, Angola. Anyways, you get the idea. Another set of these comes from Matt Clifford, who wrote,

Starting point is 00:03:52 This morning I had my first visceral aha moment with AI for two years. My test for new models is a set of cryptic crossword clues that aren't online because his grandmother wrote them. Every model so far has been completely useless at them, but o1 gets them. The first one is "food made by two small relatives." o1 thought for 11 seconds and came up with the answer, couscous. The explanation: couscous is a type of food, and the word cous could be short for cousin, so combining two cous gives you couscous, or two small relatives. He then gives a bunch more, but the point again here is that where previous models like GPT-4o were unable to solve these things, o1 actually was. I will give it to Matt: he got it to think for 72 seconds, which is one of the longer

Starting point is 00:04:32 thinking periods I've seen. Daniel Jeffries did something similar. I'm running o1 through my private intelligence test, which I call the AIQ test. I pulled many of the problems from old out-of-print intelligence test and math problem books, and I wrote my own variations once I learned the patterns and copied some of the problems that were super intricate. There is zero chance any model has ever seen these questions. No model has ever done better than 40% on this test. I never publish the questions or the benchmark because I don't want any leakage ever. This is a true thinking and reasoning test. It's now cracked. It has gotten 100% right so far and I've run it through the hardest questions first. This model is taking different amounts of time to reason through the

Starting point is 00:05:06 problems I am giving it as if it is really thinking. In the two cases Daniel references, it took 12 minutes and 10 minutes to come up with answers before responding. Overall, he writes, what I predicted about it a few months ago in my realistic AGI article was dead on. This model is now supremely capable of hard reasoning, though common sense and funny reasoning are unlikely with the current approach, because OpenAI basically improved the decade-old Q* RL technique that DeepMind used to train video game playing agents. It basically creates a deterministic policy, meaning that once the network learns to go right up a hill in a video game, it will always go right. That makes it perfect to extend to advanced hard reasoning tasks that have a right

Starting point is 00:05:38 and wrong answer, which is why you are seeing great results on coding, math, and science. He then points out a question, however, that is a common sense reasoning task that the model got wrong. Ultimately, he writes, we still don't have fuzzy human-like reasoning, but hard, deterministic, and searchable reasoning seems cracked now. Either way, this model is a real breakthrough and something very different. Today's episode is brought to you by Fractional. When we wanted to build an AI-powered feature of Superintelligent, our AI tool finder, I went straight to Fractional. The Fractional team is a group of senior engineers in San Francisco working on some of the most exciting projects in applied AI. Working with them is basically like hiring an absolute top-flight

Starting point is 00:06:16 AI engineering team, but in a way that you can customize exactly for your particular needs. Like I said, that AI tool finder feature that we built with them is already a key part of the Superintelligent platform, and we are working on something new as well. Fractional works with everyone from startups to the Fortune 500. And to request a free consultation, you can go to fractional.ai. If you want help identifying and building AI projects for your business, then I highly recommend you hit pause, open a web browser, and go to fractional.ai to request a free consultation. Today's episode is brought to you by Venice.

Starting point is 00:06:48 The leading AI companies store your entire conversation history and attach it to your identity forever. That's every question you ask, every answer you receive, every image you generate, every thought you share with the machine. It's all being spied on. If you trust all the companies, hackers, and NSA board members that will ever have access to your AI conversations, then rejoice, for you are well served.

Starting point is 00:07:07 For the rest of us, Venice is an alternative. Venice is a powerful AI app for text, image, and code generation that respects you as a sovereign individual and believes privacy and free speech are not only human rights, but necessary for civilizational advancement. Private, permissionless, and uncensored, you can try it for free without an account. AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit venice.ai/NLW and enter the discount code NLW Daily Brief. That's NLW Daily Brief, all one word. Today's episode is brought to you by Superintelligent, which is of course our platform that helps you learn how to use AI tools and, perhaps even more importantly, gives you ideas on the best use cases that are actually going to help you achieve whatever it is you want to achieve. To recognize the end of summer and back to school slash back to work, we are running our best promotion ever. When you sign up for Superintelligent between now and the end of August using code so back, your first month will be

Starting point is 00:08:05 100% free. The platform features over 600 fun, highly practical AI tutorials that get you using AI fast and with an eye to actually transforming how you get things done. We've just launched Super for Teams. So if you have a group of people at your company that want to figure out how to use AI together, I highly suggest you check it out. But for those of you who are using Superintelligent as an individual, once again, if you sign up for Superintelligent between now and the end of the month using code so back, you will get your first month 100% free. Go to besuper.ai and check it out today.

Starting point is 00:08:39 Okay, so you're starting to get a feel here that there really is something different going on, that on these tests of intelligence and reasoning, this model is doing really well. But, of course, you might be someone like me who's saying, that's all well and good and wonderful and cool and advanced, but what are the problems that it actually solves for me,

Starting point is 00:08:54 especially the boring day-in, day-out problems. I will note that at this stage, there is a lot less of that experimentation that has gone on, but we are starting to see at least some of it. Professor Ethan Mollick, who has had access for a little while now, writes, fun things to do with your limited o1-preview that can show you the power and limitations. His three ideas: give it an RFP and just ask it to do the work,

Starting point is 00:09:15 give it an academic paper and ask it to offer strategies for replication, ask it to create an entrepreneurial product that it can build. You'll note here, especially as we compare this to what Daniel said, that no matter what, you're going to be in the realm of the subjective. For example, one of the things that he writes is, come up with an idea for a startup that you can implement entirely for me and tell me how to do it. It thought for 10 seconds and came up with an AI-powered personal productivity coach, and of course, to really understand how good it was at this, you'd probably want to run this through GPT-4 to see how it

Starting point is 00:09:41 compares. Allie Miller, who is focused on AI in business, tried to get into a few examples of specific business-type tasks. One was an optimized staffing schedule, where she gave it a complex office setup and asked it to figure things out. Another was the design of an efficient warehouse layout. And what I like about these two examples is that while they are business challenges, they actually involve what you could consider a right answer in the sense that if you're looking for

Starting point is 00:10:07 staffing schedule optimization, maybe there are different criteria for what a right answer could be, but if you pick a criterion, you can actually come up with a right answer. Same with a warehouse layout. It's not just subjective based on what layout you happen to like more. It's actually going to be based on factors like how much you can fit in and how much money you can make based on that. For one that's a little fuzzier and more generally strategic, she had it assess the risks of a company merger. This is one where, like with Professor Mollick's example, I'd want to see GPT-4o's comparative answer, although the difference here is that rather than just general risks, Allie was trying it with specific financial information so that o1 could deploy its new reasoning

Starting point is 00:10:44 capabilities that actually involve numbers and math. Similarly, her last example is evaluating an investment project. So this, I think, is super instructive. What Allie is uncovering is that o1 can be much better at business strategy questions, specifically when those strategy questions involve numbers and when there is, in fact, a correct answer based on some criteria. In other words, the more objective the question, the better o1 is going to be at helping you. I was interested to see, though, in the context of business strategy that wasn't strictly

Starting point is 00:11:14 objective, that was more subjective, were there still improvements with the new model? So I tried the same prompt on GPT-4o and on o1-preview. I basically used the Superintelligent example. I said, my company is an AI enablement platform that helps companies catalog and track all their AI usage. We're a seed-stage company, so we have limited resources for sales and business development. What market segment would you focus on, SMB, mid-market, or large corporations, and why? And then create a sales plan to reach them. The responses from 4o and o1 on this prompt were fairly similar. They both determined that mid-market was the best approach for a variety of reasons,

Starting point is 00:11:46 and there were a lot of similarities in the plans that they came up with. To the extent that there was a difference, and there was, it was less about the quality of the strategic thinking and more about comprehensiveness. o1-preview went a lot deeper on each of its various points, in other words, showing more reasoning. To the extent, then, that you are using ChatGPT as a brainstorming partner, it may be that the model which shows more of its reasoning and is more comprehensive is more likely to help you actually make your own decisions.

Starting point is 00:12:17 Still, to the extent that there is a clear early winner in terms of how people are using and loving o1-preview, it's for coding. Ammaar Reshi, who is the head of design at ElevenLabs, writes, just combined o1 and Cursor Composer to create an iOS app in under 10 minutes. Ammaar used o1-mini because o1 was actually taking too long to kick off the project and then switched to o1-preview to finish the details. He was able to get to a full weather app for iOS with animations in under 10 minutes.

Starting point is 00:12:43 Sir Abchalki writes, GPT-o1 just generated a holographic shader from scratch, saving me and future XR devs from shelling out big bucks on asset stores. In retrospect, software engineering was great while it lasted, a new fork on our tech tree. Riley Brown, who is now a couple months deep in his transformation from not coding to coding thanks to AI, wrote, guys, I just shed a tear, was hit with a pinch-me-I-must-be-dreaming moment. And in this way, o1 has nudged the conversation that we started yesterday around AI eating SaaS down the line a little further. Arab tweets, my hypothesis on the future of software.

Starting point is 00:13:15 Billion dollar SaaS companies are cooked. Marginal cost of building software products is going to zero. In the next two years, I'll be able to build any SaaS tool I need. Need a CRM? Prompt 1 three and it's made. Just for me and with every feature you need. Many startups and giants today will crumble. The only software products that will remain will have the following criteria: network effects like social products, superior design, products that feel better, or distribution that can reach more people. Obviously, this isn't about o1 directly, but I do think that o1 advances the credibility of this type of thinking. So in terms of advice on how to use this, Andrew Mayne writes, don't think of it like a traditional chat model. Frame o1 in your mind as a really

Starting point is 00:13:50 smart friend you're going to send a DM to solve a problem. Andrew also says that planning out what you need in a prompt is really beneficial here. OpenAI also published prompting guidelines. They suggest keeping prompts simple and direct, avoiding chain-of-thought prompts, using delimiters for clarity, and limiting additional context in retrieval-augmented generation. Overall, I think we are just at the cusp of starting to understand what this model is useful for. Ethan Mollick again writes, it's becoming clear that AGI is going to be as jagged as everything else about AI. We will see superhuman ability at narrow tasks happen one by one, with obvious gaps or lags in many others. When a universal AGI will happen is unclear, but a jagged AGI-ish world seems likely. My big takeaway so far,

Starting point is 00:14:31 as I mentioned, if you are looking for how to use o1 for business, is that the more there is an objective right answer, the better suited to solving that problem o1 is going to be. At the end of the day, of course, however, there is no substitute for experimentation. So turn this off, fire up ChatGPT. If you have a pro account, you now have access to o1, and I'm very excited to see what you build. Appreciate you listening as always. Until next time, peace.
