The AI Daily Brief: Artificial Intelligence News and Analysis - The Most Important AI News From A Week Where We Glimpsed The Future
Episode Date: May 13, 2023
On this week's Weekly Recap: OpenAI Shap-E text-to-3D; Anthropic 100k context window; Meta ImageBind multimodal; HuggingFace Transformers Agents; Google I/O announcements; Influencer chatbot mak...es $70k selling conversation at $1/minute
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Transcript
Today on the AI Breakdown, we cover the week's most important AI stories, from Google I/O to 100K
context limits to the future of LLMs and multimodal models. The AI Breakdown is a daily video and
audio show covering the most important news in AI. If you're enjoying it, please like, subscribe,
and share. Welcome back to the AI Breakdown. Today, we are doing the weekly recap. And man,
this week was another wild week. It was so jam-packed with news. It kind of felt like a preview.
We started off with a lot of interesting announcements in the 3D generative space.
So OpenAI released some new research for Shap-E, which is their latest work on 3D generative
modeling.
You can see here a chair that looks like an avocado, an airplane that looks like a banana.
And while these things might seem like little toys or novelty when you look at them here,
this idea of text to 3D has obviously huge implications for gaming, for metaverse 3D world-type
things and of course for 3D printing. Now speaking of 3D worlds, Lovelace Studios also announced
Nyric, which is their AI World Generation platform, which is effectively text to 3D world.
It's very, very cool. From there we move into a major update around LLMs. The context window for
ChatGPT right now is 8,000 tokens. That means that if you have a document that you want to
analyze that's more than 8,000 tokens, you have to segment it up and that can lose valuable context.
It can make the results poorer than they would be if GPT could ingest the entire information all at
once. Now, GPT-4 32K is coming and people have been absolutely hyped about that. In fact, I did a video
about all the different use cases that that might beget. Well, Anthropic went and one-upped them with their
100K context window for Claude, which was just announced earlier this week. So this new
context window is 100,000 tokens of text, which corresponds to around 75,000 words. That's a little more
than one Great Gatsby, but it's also more than the average, for example, SEC 10-K. And that type of
corporate filing is the example that they used. A lot of the initial response was really excited.
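
The segmenting workaround described above can be sketched roughly as follows. This is a minimal illustration, not how any particular product does it: the function name split_into_chunks is hypothetical, and token counts are approximated from word counts using the 100,000 tokens to 75,000 words ratio mentioned above, where a real tokenizer would give different numbers.

```python
def split_into_chunks(text: str, max_tokens: int, words_per_token: float = 0.75) -> list[str]:
    """Split text into consecutive chunks that each fit a token budget.

    Assumption: tokens are estimated from whitespace-separated words
    (about 0.75 words per token), not counted with a real tokenizer.
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    words = text.split()
    # Largest whole number of words that fits the estimated budget.
    words_per_chunk = max(1, int(max_tokens * words_per_token))
    return [
        " ".join(words[i:i + words_per_chunk])
        for i in range(0, len(words), words_per_chunk)
    ]


# Stand-in for a filing of roughly 75,000 words.
document = "word " * 75_000

print(len(split_into_chunks(document, max_tokens=8_000)))    # pieces at an 8K limit
print(len(split_into_chunks(document, max_tokens=100_000)))  # pieces at a 100K limit
```

Under those assumptions, a 75,000-word document has to be cut into 13 separate pieces at an 8,000-token limit, each analyzed without sight of the others, but fits in a single pass at 100,000 tokens.
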
Matt Shumer here had a very interesting set of tweets. The first one, he said,
Claude 100K is amazing, but it's not perfect. I gave it the entire 100-page GPT-4 technical report
and asked for a summary.
It hallucinated multiple details, including mentioning that GPT-4 has 15 billion parameters,
which was never mentioned in the report.
But then Matt responded to himself, saying, I messed up.
The Anthropic Playground doesn't use the updated 100K token model.
So I ran the same test via API, and holy shit, the results are amazing.
Now, for a slightly more systematic test, let's turn over to Jerry Liu.
He writes,
How well does Claude v1 with the 100K context token limit do on Uber SEC 10-K filings?
Well, they said their high-level findings were that where Anthropic's 100K model does well is
holistic understanding of the data.
They write, Anthropic's model does demonstrate an impressive capability to synthesize insights
across the entire context window to answer the question at hand.
It can miss details, though.
What it also did well was latency, and they said this one was surprising to us.
Anthropic's model is able to crunch an entire Uber 10-K filing in 60
to 90 seconds, which seems long, but is much faster than repeated API calls to GPT-3, which when
added up can take minutes.
Now, they say where Anthropic's 100K model doesn't do well: cost. This one is obvious.
Every query we ran processed hundreds of thousands of tokens at $11 per million tokens for
Claude v1.
This equates to $1 per query, which can quickly add up.
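
To put rough numbers on that, here is a minimal sketch of the per-query arithmetic. It assumes the $11 per million tokens price quoted above applies as a flat rate to every token processed; the function name query_cost_usd and the sample token counts are hypothetical and chosen only for illustration.

```python
def query_cost_usd(tokens: int, usd_per_million_tokens: float = 11.0) -> float:
    """Estimate the cost of one query from the number of tokens it processes.

    Assumption: a single flat price per million tokens, as quoted in the episode.
    """
    return tokens / 1_000_000 * usd_per_million_tokens


# Illustrative token counts: one full 100K context, then two and three times that.
for tokens in (100_000, 200_000, 300_000):
    print(f"{tokens:,} tokens -> ${query_cost_usd(tokens):.2f}")
```

At that rate, filling a single 100,000-token context comes to about $1.10, which lines up with the roughly $1 per query figure, and the cost scales linearly with every additional query.
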
Second, they said, reasoning over more complicated prompts.
I'm sure that over the next few days, we'll get even more people testing this out and we'll
have to see just how it does in practice. Next, we turn to an announcement that I think is
reflective of a huge trend, and that is Facebook's ImageBind, their newly open-sourced multimodal
model. ImageBind works across six different modalities, including images, text, audio, depth, thermal,
and inertial movement data. So what does this look like in practice? Well, let's actually listen to
this clip of Mark Zuckerberg explaining it. All right, check this out. Most AI models only work across
one or two modes. But our new ImageBind model works across six, text, audio, images and video,
3D, thermal, and motion data. You give it input in one form and it can relate it to any others.
It works more like our own imagination. If you give it a picture of a beach, it can find the sound of
waves. If you give it a photo of a tiger and the sound of a waterfall, it can give you a video
that combines both. This is a step towards AIs that understand the world around them,
more like we do, which will make them a lot more useful and will open up totally new ways to create
things. We're open sourcing ImageBind so everyone in the world can access and build on top
of these state-of-the-art models. I'm excited to see what you build. Pretty cool stuff, right? Dr. Jim
Fan from NVIDIA says, wow, Meta is on open-source steroids since LLaMA. ImageBind: Meta's latest
multimodal embedding, covering not only the usual suspects, text, image, audio, but also depth, thermal, and IMU
signals. Jim should know about this, given that he announced NVIDIA's research in this area
called Prismer a couple months ago. He wrote then, after ChatGPT, the future belongs to multimodal
LLMs. What's even better? Open sourcing. Announcing Prismer, my team's latest vision
language AI, empowered by domain expert models in depth, surface normal, segmentation, etc.
Now, bringing it back to this week, I will note that that was not the only multimodal project
to make headlines. Hugging Face tweeted on Wednesday: We
just released Transformers' boldest feature: Transformers Agents.
This removes the barrier of entry to machine learning.
Control 100,000-plus Hugging Face models by talking to transformers and diffusers.
Fully multimodal agent, text, images, video, audio, docs.
So just to give an example, they say, create an agent using LLMs, like OpenAssistant,
StarCoder, or OpenAI, and start talking to Transformers and diffusers.
It responds to complex queries and offers a chat mode.
Create images using your words, have the agent read the summary of
websites out loud, read through a PDF, etc. So the example that they show includes a text prompt,
draw me a picture of rivers, lakes, and trees, which it does. Then it moves to another image,
saying, transform the image, a frozen lake and snowy forest, which it does. Then
they move to another image that says, read out loud the content of
the image. And it has a sound icon saying a river flowing through a frozen forest. Now,
Hugging Face gets a lot more into what's actually going on underneath the surface and what people can
build, but I think that the point here, the takeaway, is that even though so
many people are just discovering LLMs and ChatGPT and things like it, we're already looking
to the next era, in which a multi-sensory experience that cuts across not only text and images,
but also motion, also sound, also temperature data, environmental data, is just on the horizon.
But you might say we don't even really understand how existing LLMs work. Well, this week we
also got some updates on that front. One of the big stories from the beginning of the week was that
OpenAI released research where they had used GPT-4 to label all 307,200 neurons in GPT-2. As Siqi Chen puts it,
they labeled each with a plain English description of the role each neuron plays in the model.
This opens up a new direction in explainability and alignment in AI, helping make models
more explainable and potentially easier to align. Interpretability, or otherwise put, our ability to
understand what's going on inside models is one of the big challenges for AI researchers.
It's also something that's been flagged frequently by AI safety experts as a huge reason why
maybe we should consider pausing, or trying to put this genie back in the bottle before it's
fully out. Their reasoning is that if we don't understand what's going on now, how are we going
to understand what's going on once these models start to become sentient and decide that they want
out of their constraints? Given that, it was notable this week that Eliezer Yudkowsky said that
this research surprised him in that people went out and did it, and that even though he still
had big questions, his p(doom), the percentage chance he ascribes to humans dying because of
AI, had gone down a little. Not bad. Back to Anthropic, their one other big announcement this week
was that they described in more detail their approach to what they call constitutional AI. Effectively,
their goal with this is rather than just reinforcement learning from human feedback, which they cite
a number of problems with, including most notably scalability, constitutional AI trains a model
on a set of underlying principles
and tries to get it to reason for itself
with those principles to conform
to what is effectively a certain set of ethics.
Anthropic had discussed this approach before,
but this week they actually released the principles in full.
They were sourced from areas like the Universal Declaration of Human Rights,
Apple's Terms of Service,
principles encouraging consideration of non-Western perspectives,
principles inspired by DeepMind's Sparrow Rules, and more.
Still, when it came to what a lot of people will remember this week for,
it will be this.
AI, AI, AI, Generative AI, Generative AI,
AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI.
It uses AI to bring AI, AI, AI, AI, AI.
Yes, this week was Google I/O, their developer conference, and it was all about AI.
I actually did a video ranking their top five AI announcements from their new PaLM 2 model, to
Bard upgrades, to the integration of AI into Google Workspace, to generative AI in search,
which I think is the big one, given that it's Google's biggest moat
and potentially the biggest change to the internet that we've seen for some time,
but other people had other things that they were most excited about.
Anyways, moving on through a couple more important stories before we wrap up.
One thing that's happening next week, which should be interesting,
is that OpenAI CEO Sam Altman has been called to testify before Congress.
This is the first time he will do so.
He'll testify alongside Gary Marcus and someone from IBM.
And it should give us a pretty good chance to see where that conversation
is starting right now, what crazy priors people from all sides of the aisle are bringing to the
discourse, and where it seems like there might be challenges or opportunities when it comes to
regulation going forward. In the meanwhile, the pace of AI just will not slow down. Another story
that had people squawking this week was that of Rewind. About a month ago, they tweeted,
AI is so hot right now, we've had 100 plus investors reach out. We don't have time to meet with
everyone, so instead we're sharing our investor presentation with the world. More than anything,
hope this transparency builds customer trust. Well, less than a month later, they had raised a Series
A investment round at a $350 million valuation. They fielded over 1,000 offers, including 22 investors
who wanted to invest at a billion dollar valuation. Holding aside any of the merits of Rewind
specifically, this is just a perfect reflection of how hyped and hot it is from a VC perspective
right now. And finally, we have to close it out with this. The woman who made the most
money from AI last week. I mean, maybe, who knows? Caryn Marjorie trained a chatbot on thousands of
hours of her videos from Snapchat, from Instagram, and beyond, and got the brilliant idea to start
charging a dollar a minute for access. In her beta test, she made just under $72,000 in one week.
Whether this is novelty, insanity, or the next great trend, kudos to Caryn for being first to figure
it out. Anyways, guys, that is it for the AI Breakdown weekly recap. I hope that you learned something. I
hope that you left inspired. I hope that you can't believe it's like this every week.
Hope you're having a great weekend. And until next time, peace.
