The AI Daily Brief: Artificial Intelligence News and Analysis - From Cointelligence to Agent
Episode Date: September 16, 2024
A reading and discussion inspired by https://www.oneusefulthing.org/p/something-new-on-openais-strawberry
Concerned about being spied on? Tired of censored responses? AI Daily Brief listeners receive a 20% discount on Venice Pro. Visit https://venice.ai/nlw and enter the discount code NLWDAILYBRIEF.
Learn how to use AI with the world's biggest library of fun and useful tutorials: https://besuper.ai/ Use code 'podcast' for 50% off your first month.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown
Transcript
From co-intelligence to agent, today we are reading an essay on OpenAI's just-released
o1 model.
The AI Daily Brief is a daily podcast and video about the most important news and discussions
in AI.
To join the conversation, follow the Discord link in our show notes.
Hello, friends.
Happy weekend.
It, of course, being the weekend means it is time for a long reads episode.
And today we are doing sort of the third in our troika of OpenAI o1-related shows.
This essay comes once again from Professor Ethan Mollick's blog, One Useful Thing,
which is at oneusefulthing.org,
and which you should absolutely subscribe to,
and it's called Something New:
On OpenAI's Strawberry and Reasoning.
Ethan writes,
I've had access to the much-rumored OpenAI Strawberry
enhanced reasoning system for a while,
and now that it is public, I can finally share some thoughts.
It is amazing, still limited,
and perhaps most importantly, a signal of where things are headed.
The new AI model, called o1-preview,
as an aside, why are the AI companies so bad at names,
lets the AI, quote-unquote,
think through a problem before solving it.
This lets it address very hard problems that require planning and iteration, like novel math or science
questions. In fact, it can now beat human PhD experts in solving extremely hard physics problems.
To be clear, o1-preview doesn't do everything better. It is not a better writer than GPT-4o, for example,
but for tasks that require planning, the changes are quite large. For example, I gave o1-preview the
instruction: figure out how to build a teaching simulator using multiple agents in generative AI,
inspired by the paper attached and considering the views of teachers and students. Write the code and be
detailed in your approach. I then pasted in the full text of a paper. The only other prompt I gave
it was: build the full code. Ethan then shares a video of what the system produced. Section: Strawberry
in action. But it is hard to evaluate all of this complex output, so perhaps the easiest way to show
the gains of Strawberry and some limitations is with a game: a crossword puzzle. I took the eight clues
from the upper left-hand corner of a very hard crossword puzzle and translated that into text because
o1-preview can't see images yet. The clues include galaxy cluster,
part of a midway skill crane, beat it, caught, bits of dishonesty, fall report, much of Carmen's
premiere audience, and preferred place of athletic contact. Ethan continues, crossword puzzles are especially
hard for LLMs because they require iterative solving, trying and rejecting many answers that all
affect each other. This is something LLMs can't do, since they can only add a token or word at a time
to their answer. When I gave the prompt to Claude, for example, it first comes up with an answer
for one down, it guesses star, which is wrong, and then is stuck trying to figure out
the rest of the puzzle with that answer, ultimately failing to come even close. Without a planning
process, it just has to charge ahead. But what happens when I give this to Strawberry? The AI, quote-unquote,
thinks about the problem first for a full 108 seconds. Most problems are solved in much shorter times.
You can see its thoughts, a sample of which are below, and which are super illuminating.
You can see its thoughts include tackling the clues, noticing patterns, investigating word patterns,
piecing together clues, assessing clues together, breaking down clues, examining letter alignment,
breaking down the puzzle, charting the course, weighing options, evaluating word choices, evaluating
potential fits, assessing choices, weighing options, assessing options, etc, etc. Ethan continues,
the LLM iterates repeatedly, creating and rejecting ideas. The results are pretty impressive and it does
well, but o1-preview is still seemingly based on GPT-4o, and it is a little too literal to solve this
rather unfair puzzle. The answer to one down, Galaxy cluster, is not a reference to real galaxies,
but rather a reference to the Samsung Galaxy phone, and so the answer is apps.
Stuck on real galaxies, the AI instead kept trying out the names of actual galactic clusters
before deciding one down is coma, which is a real galactic cluster. I had no idea.
Thus, the rest of the results are not correct and do not fit the rules exactly, but are pretty creative.
To see if we could get further, I decided to give it a clue. One down is apps.
The AI takes another minute. Again, in a sample of its thinking, you can see how it
iterates on ideas. For example, for the clue bits of dishonesty, four letters, which it now knows
starts with A: possible answers starting with an A. Fibs, but starts with an F.
A bit doesn't fit. Cons doesn't start with an A. Aces doesn't fit the clue. Alternative answer,
arts, doesn't fit the clue. Wait, perhaps it's aces, as bits in card games, but doesn't fit
dishonesty. Another possibility is acts. Acts can mean deeds, but bits of dishonesty could be
acts, as in deceptive acts. Tentatively, acts. The final answer is completely correct, solving all the hard
references, though it does hallucinate a new clue, 23 across, which is not in the puzzle I gave it.
So, o1-preview does things that would have been impossible without Strawberry, but it still isn't
flawless. Errors and hallucinations still happen, and it is still limited by the intelligence of GPT-4o
as the underlying model. Since getting the new model, I haven't stopped using Claude to critique my posts.
Claude is still better at style, but I did stop using it for anything involving complex planning or
problem solving. o1-preview represents a huge leap in those areas. Closing: from co-intelligence to dot-dot-dot.
Using o1-preview means confronting a paradigm change in AI. Planning is a form of agency, where the AI arrives
at conclusions about how to solve a problem on its own without our help. You can see from the examples
above that the AI does so much thinking and heavy lifting, churning out complete results,
that my role as a human partner feels diminished. It just does its thing and hands me an answer.
Sure, I can sift through its pages of reasoning to spot mistakes, but I no longer feel as connected to the
AI output, or that I am playing as large a role in shaping where the solution is going.
This isn't necessarily bad, but it is different. As these systems level up and inch towards
true autonomous agents, we're going to need to figure out how to stay in the loop, both to catch
errors and to keep our fingers on the pulse of the problems we're trying to crack.
o1-preview is pulling back the curtain on AI capabilities we might not have seen coming,
even with its current limitations. This leaves us with a crucial question. How do we evolve
our collaboration with AI as it evolves? That is a problem that o1-preview cannot yet solve.
Today's episode is brought to you by Venice. Venice is a private, uncensored generative AI app.
It accesses open source models to enable text, image, and code generation without the fear of being spied on or having your data exploited.
Discuss anything with Venice without concern about it being monitored, sold, or given to advertisers and governments.
Venice is different because your conversations and creations are kept securely within the browser, never stored or accessible by Venice.
Unlike other AI apps, Venice won't tell you what's okay to say or not. Venice won't patronize you.
It simply provides direct access to machine intelligence. No topics are off limits, no ideas
are taboo.
With Venice, you're in control of the AI as you should be.
Pro subscriptions are available for $49 a year or $8 per month.
AI Daily Brief listeners receive a 20% discount on Venice Pro.
Visit venice.ai/nlw and enter the discount code NLWDAILYBRIEF.
That's NLW Daily Brief, all one word.
Today's episode is brought to you by Superintelligent, which is of course our
platform that helps you learn how to use AI tools and perhaps even more importantly, gives you
ideas on the best use cases that are actually going to help you achieve whatever it is you want
to achieve. To recognize the end of summer and back to school slash back to work, we are running
our best promotion ever. When you sign up for Superintelligent between now and the end of August
using code so back, your first month will be 100% free. The platform features over 600 fun, highly
practical AI tutorials that get you using AI fast and with an eye to actually transforming how you
get things done. We've just launched Super for Teams. So if you have a group of people at your company
that want to figure out how to use AI together, I highly suggest you check it out. But for those
of you who are using Superintelligent as an individual, once again, if you sign up for
Superintelligent between now and the end of the month using code so back, you will get your first
month 100% free. Go to besuper.ai and check it out today.
Another thought-provoking piece from Ethan here. A couple things that I think are worth honing in on.
First, this idea that although still nascent, and although this is not agentic, these are the first
glimpses of how we move from the assistant paradigm to the agent paradigm. Again, Ethan writes,
planning is a form of agency, where the AI arrives at conclusions about how to solve a problem
on its own without our help. The key to agents is exactly that process. And also, I believe that
the feeling that Ethan has of feeling more out of the loop is real. And there are potentially real
consequences. The more we learn to trust AI, the less able to catch errors and hallucinations
we're going to be. At the same time, at least from the limited sort of business context that I tend
to think about AI from initially, it's hard not to view this as pretty much just exclusively an
upgrade and quite exciting. One of the things I discussed in a recent video about how to get the most
out of o1 is that when it comes to business problems, business strategy in particular, where o1 is
good is when there is an objective right answer. You might remember we had Allie Miller
experimenting with some optimization problems. Staffing scheduling optimization is one example,
office warehouse planning and arrangement is another. These are areas where, yes, there is strategy,
but depending on the variable that you pick, whether you're trying to maximize, for example,
revenue or revenue opportunity in the form of warehouse space, there could actually be an
objective right answer. Where things get blurrier is when there's not an objective answer,
when it's still subjective. I noted when I was exploring how o1 did versus GPT-4o on creating a sales
strategy for Superintelligent, that they had pretty similar outputs, with o1 just being slightly
more comprehensive. So when I think then about the way that this reasoning paradigm improves
ability to make business decisions, it's two different types of benefit. For things that have a
genuine right answer, the AI can get there faster, which is useful because who wants to spend a
bunch of time thinking about staffing optimization, for example. And when there's not an exact right answer,
AI has just become an even better brainstorming partner. I think ultimately that a huge amount of
business, perhaps the vast majority of it, does not have an easy or obvious right answer. Some discrete
problems and tasks within it do. But in terms of what to do next, it's usually all about weighing
different options, which have different benefits and risks, comparing them to the capabilities
that you have or that your team has, and making decisions from there. The number of variables
makes it basically impossible for there to ever be a strictly right answer, at least without the
benefit of hindsight. And so even with more advanced reasoning AI, there's still a need at the
end of the day to make a call. Now, of course, it's not inconceivable that there's a point in the
future where people decide that AI is better on average at weighing those different possibilities,
but I remain somewhat skeptical of this.
I think that humans as orchestrators of vast armies of AI
that radically increase our productivity as a whole
and just our ability to do things
is a more likely scenario than one where humans are cut out entirely.
But of course we can't know the future until we get there.
And so for now, I appreciate you hanging out
and trying to get this little preview of it with me.
Until next time, peace.
