The AI Daily Brief: Artificial Intelligence News and Analysis - The Most Important AI News From A Week Where We Glimpsed The Future

Starting point is 00:00:00 Today on the AI Breakdown, we cover the week's most important AI stories, from Google I/O to 100K context limits to the future of LLMs and multimodal models. The AI Breakdown is a daily video and audio show covering the most important news in AI. If you're enjoying it, please like, subscribe, and share. Welcome back to the AI Breakdown. Today, we are doing the weekly recap. And man, this week was another wild week. It was so jam-packed with news that it kind of felt like a preview of the future. We started off with a lot of interesting announcements in the 3D generative space. OpenAI released new research on Shap-E, their latest work on 3D generative modeling.

Starting point is 00:00:44 You can see here a chair that looks like an avocado and an airplane that looks like a banana. While these might look like toys or novelties, text-to-3D obviously has huge implications for gaming, for metaverse-style 3D worlds, and of course for 3D printing. Speaking of 3D worlds, Lovelace Studios also announced Nyric, their AI world generation platform, which is effectively text-to-3D-world. It's very, very cool. From there we move to a major update around LLMs. The context window for GPT-4 in ChatGPT right now is 8,000 tokens. That means that if you want to analyze a document longer than 8,000 tokens, you have to split it into segments, and that can lose valuable context.
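To make the segmenting problem concrete, here is a minimal Python sketch of splitting a long document into windows that each fit a fixed context limit. It is an illustration under stated assumptions, not how any particular product does it: whitespace-separated words stand in for real model tokens (by the episode's own figure, 100,000 tokens is about 75,000 words, so real token counts run higher than word counts), and the `chunk_tokens` helper and its `overlap` parameter are hypothetical. Overlapping windows are one common way to soften the context lost at chunk boundaries.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_tokens: int, overlap: int = 0) -> List[List[str]]:
    """Split `tokens` into windows of at most `max_tokens` items.

    Consecutive windows share `overlap` tokens so that text near a boundary
    appears in both chunks (a common mitigation, not a complete fix).
    """
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap < max_tokens:
        raise ValueError("overlap must satisfy 0 <= overlap < max_tokens")

    step = max_tokens - overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        if start + max_tokens >= len(tokens):
            break  # this window already reaches the end of the document
    return chunks


# Assumption: whitespace "words" stand in for model tokens.
document = "word " * 20_000
chunks = chunk_tokens(document.split(), max_tokens=8_000, overlap=500)
print([len(c) for c in chunks])  # [8000, 8000, 5000]
```

Each chunk would go out as a separate request, so the model never sees the whole document at once; that is the limitation a larger context window removes.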

Starting point is 00:01:34 Segmenting can make the results poorer than they would be if GPT could ingest all of the information at once. Now, GPT-4 32K is coming, and people have been absolutely hyped about that. In fact, I did a video about all the different use cases it might beget. Well, Anthropic went and one-upped them with a 100K context window for Claude, announced earlier this week. This new context window is 100,000 tokens of text, which corresponds to around 75,000 words. That's a little more than all of The Great Gatsby, and it's also more than the average SEC 10-K, the type of corporate filing Anthropic used as its example. A lot of the initial response was really excited. Matt Shumer had a very interesting set of tweets. In the first one, he said,

Starting point is 00:02:21 Claude 100K is amazing, but it's not perfect. I gave it the entire 100-page GPT-4 technical report and asked for a summary. It hallucinated multiple details, including mentioning that GPT-4 has 15 billion parameters, which was never mentioned in the report. But then Matt replied to himself: I messed up. The Anthropic Playground doesn't use the updated 100K token model. So I ran the same test via the API, and holy shit, the results are amazing. Now, for a slightly more systematic test, let's turn to Jerry Liu.

Starting point is 00:02:53 He writes: How well does Claude v1 with the 100K token context limit do on Uber's SEC 10-K filings? Their high-level finding was that Anthropic's 100K model does well on holistic understanding of the data. They write, Anthropic's model does demonstrate an impressive capability to synthesize insights across the entire context window to answer the question at hand. It can miss details, though. It also did well on latency, which they said was surprising to them.

Starting point is 00:03:20 Anthropic's model is able to crunch an entire Uber 10-K filing in 60 to 90 seconds, which seems long, but is much faster than repeated API calls to GPT-3, which when added up can take minutes. As for where Anthropic's 100K model doesn't do well, first is cost, which they call obvious. Every query they ran processed hundreds of thousands of tokens at $11 per million tokens for Claude v1, which works out to roughly $1 per query and can quickly add up. Second, they said, reasoning over more complicated prompts.
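The cost point is simple arithmetic. Here is a minimal sketch, assuming a flat $11 per million tokens as quoted above (real pricing may charge prompt and completion tokens differently; this ignores that), with `query_cost_usd` as a hypothetical helper name:

```python
def query_cost_usd(tokens: int, usd_per_million_tokens: float) -> float:
    """Cost of processing `tokens` tokens at a flat per-million-token price."""
    return tokens * usd_per_million_tokens / 1_000_000


# Assumption: the flat $11 per million tokens figure quoted for Claude v1.
PRICE_PER_MILLION = 11.0

# "Hundreds of thousands of tokens" per query, at a few illustrative sizes.
for tokens in (100_000, 200_000, 300_000):
    cost = query_cost_usd(tokens, PRICE_PER_MILLION)
    print(f"{tokens:,} tokens -> ${cost:.2f}")
# 100,000 tokens -> $1.10
# 200,000 tokens -> $2.20
# 300,000 tokens -> $3.30
```

At 100,000 tokens per query this comes to about $1.10, in line with the roughly $1 per query figure, and the cost scales linearly with every extra token a query processes.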

Starting point is 00:03:49 I'm sure that over the next few days, more people will test this out, and we'll see how it does in practice. Next, we turn to an announcement that I think reflects a huge trend: Meta's ImageBind, their newly open-sourced multimodal model. ImageBind works across six different modalities: images, text, audio, depth, thermal, and inertial movement data. So what does this look like in practice? Let's listen to this clip of Mark Zuckerberg explaining it. All right, check this out. Most AI models only work across one or two modes. But our new ImageBind model works across six: text, audio, images and video, 3D, thermal, and motion data. You give it input in one form and it can relate it to any others.

Starting point is 00:04:35 It works more like our own imagination. If you give it a picture of a beach, it can find the sound of waves. If you give it a photo of a tiger and the sound of a waterfall, it can give you a video that combines both. This is a step towards AIs that understand the world around them more like we do, which will make them a lot more useful and will open up totally new ways to create things. We're open-sourcing ImageBind so everyone in the world can access and build on top of these state-of-the-art models. I'm excited to see what you build. Pretty cool stuff, right? Dr. Jim Fan from NVIDIA says: Wow, Meta is on open-source steroids since LLaMA. ImageBind, Meta's latest multimodal embedding, covers not only the usual suspects (text, image, audio), but also depth, thermal, and IMU

Starting point is 00:05:28 signals. Jim should know about this, given that he announced NVIDIA's research in this area, called Prismer, a couple of months ago. He wrote then: After ChatGPT, the future belongs to multimodal LLMs. What's even better? Open sourcing. Announcing Prismer, my team's latest vision-language AI, empowered by domain expert models in depth, surface normal, segmentation, etc. Now, bringing it back to this week, I will note that ImageBind was not the only multimodal project to make headlines. Hugging Face tweeted on Wednesday: We just released Transformers' boldest feature, Transformers Agents. This removes the barrier of entry to machine learning.

Starting point is 00:06:04 Control 100,000-plus Hugging Face models by talking to Transformers and Diffusers. Fully multimodal agent: text, images, video, audio, docs. To give an example, they say: create an agent using LLMs like OpenAssistant, StarCoder, or OpenAI, and start talking to Transformers and Diffusers. It responds to complex queries and offers a chat mode. Create images using your words, have the agent read the summary of websites out loud, read through a PDF, etc. The example they show starts with a text prompt, draw me a picture of rivers, lakes, and trees, which it does. Then it moves on to a second prompt,

Starting point is 00:06:37 saying, transform the image into a frozen lake and snowy forest, which it does. Then they move to another prompt that says, read out loud the content of the image, and the output is an audio clip saying, a river flowing through a frozen forest. Hugging Face goes much further into what's actually going on under the surface and what people can build, but I think the takeaway is this: even though so many people are only now discovering LLMs, ChatGPT, and tools like it, we're already looking to the next area, a multi-sensory experience that cuts across not only text and images, but also motion, sound, temperature, and environmental data. It's just on the horizon.

Starting point is 00:07:20 But you might say we don't even really understand how existing LLMs work. Well, this week we also got some updates on that front. One of the big stories from the beginning of the week was that OpenAI released research in which they used GPT-4 to label all 307,200 neurons in GPT-2. As Siqi Chen puts it, they labeled each with a plain-English description of the role that neuron plays in the model. This opens up a new direction in AI explainability and alignment, making models more transparent and potentially easier to align. Interpretability, or put otherwise, our ability to understand what's going on inside models, is one of the big challenges for AI researchers. It has also been flagged frequently by AI safety experts as a huge reason why

Starting point is 00:08:07 we should maybe consider pausing, or try to put this genie back in the bottle before it's fully out. Their reasoning is that if we don't understand what's going on now, how are we going to understand what's going on once these models become sentient and decide that they want out of their constraints? Given that, it was notable this week that Eliezer Yudkowsky said this research surprised him, in that people actually went out and did it, and that even though he still had big questions, his p(doom), the probability he assigns to humans dying because of AI, had gone down a little. Not bad. Back to Anthropic: their one other big announcement this week was a more detailed description of their approach to what they call constitutional AI. Effectively,

Starting point is 00:08:45 rather than relying only on reinforcement learning from human feedback, which they say has a number of problems, most notably scalability, constitutional AI trains a model on a set of underlying principles and tries to get it to reason with those principles for itself, so that it conforms to what is effectively a set of ethics. Anthropic had discussed this approach before, but this week they released the principles in full.

Starting point is 00:09:08 They were sourced from places like the Universal Declaration of Human Rights, Apple's Terms of Service, principles encouraging consideration of non-Western perspectives, principles inspired by DeepMind's Sparrow Rules, and more. Still, what a lot of people will remember this week for is this. AI, AI, AI, Generative AI, Generative AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI, AI.

Starting point is 00:09:33 It uses AI to bring AI, AI, AI, AI, AI. Yes, this week was Google I/O, their developer conference, and it was all about AI. I actually did a video ranking their top five AI announcements, from their new PaLM 2 model, to Bard upgrades, to the integration of AI into Google Workspace, to generative AI in search. I think search is the big one, given that it's Google's biggest moat and potentially the biggest change to the internet that we've seen in some time, but other people had other things they were most excited about. Anyway, let's move through a couple more important stories before we wrap up.

Starting point is 00:10:08 One thing happening next week that should be interesting is that OpenAI CEO Sam Altman has been called to testify before Congress. This is the first time he will do so. He'll testify alongside Gary Marcus and someone from IBM. It should give us a pretty good chance to see where that conversation is starting right now, what crazy priors people from both sides of the aisle are bringing to the discourse, and where there might be challenges or opportunities when it comes to regulation going forward. Meanwhile, the pace of AI just will not slow down. Another story

Starting point is 00:10:40 that had people squawking this week was that of Rewind. About a month ago, they tweeted: AI is so hot right now, we've had 100-plus investors reach out. We don't have time to meet with everyone, so instead we're sharing our investor presentation with the world. More than anything, hope this transparency builds customer trust. Well, less than a month later, they had raised a Series A round at a $350 million valuation. They fielded over 1,000 offers, including 22 investors who wanted to invest at a billion-dollar valuation. Setting aside the merits of Rewind specifically, this is a perfect reflection of how hyped and hot AI is from a VC perspective right now. And finally, we have to close it out with this: the woman who made the most

Starting point is 00:11:24 money from AI last week. I mean, maybe, who knows? Caryn Marjorie trained a chatbot on thousands of hours of her videos from Snapchat, Instagram, and beyond, and got the brilliant idea to start charging a dollar a minute for access. In her beta test, she made just under $72,000 in one week. Whether this is novelty, insanity, or the next great trend, kudos to Caryn for being first to figure it out. Anyway, guys, that is it for the AI Breakdown weekly recap. I hope that you learned something. I hope that you left inspired. I hope that you can't believe it's like this every week. Hope you're having a great weekend. And until next time, peace.
