The AI Daily Brief: Artificial Intelligence News and Analysis - AI Gaming Lights Up: The Biggest AI News This Week

Starting point is 00:00:00 Today on the AI Breakdown, the weekly recap looks at everything from advances in AI gaming to sentiment around ChatGPT to the state of the AI alignment conversation. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Like, subscribe and share and go to breakdown.network for more information. What's going on, guys? Welcome back to the AI Breakdown's weekly recap. Today, we're kicking off with a section all about gaming. This is a major theme this week in a couple of different ways. First of all, there was a lot of discussion around Unity's new AI platform. Unity is basically creating a text-to-game environment so you can say, give me a large-scale

Starting point is 00:00:42 terrain with a moody sky, add a dozen NPCs, make them aliens. This is just a little teaser that went live this week, but now developers are able to sign up for Unity's AI beta, which will be rolling out in the next few weeks. In an interview this week, the Unity CEO talked about all the different ways in which generative AI will transform game development from smart NPCs to infinite worlds to non-scripted interactions, infinite levels, and much faster development.

Starting point is 00:01:08 Now the cool thing is we're starting to see some amount of this type of thing actually happening. Shira here writes, not bad for learning Unity in a week. Built an AI non-player character that asks you about yourself and sends you on a quest. Still a long way to go before this is something I'm proud of, but it's a start.

Starting point is 00:01:23 Working on text to speech and speech to text next. Now of course, this sort of new type of interaction was on display earlier this week when NVIDIA's CEO gave a demonstration of their new game engine. Despite a new supercomputer and an advanced AI chip, the thing that people were talking most about was a new system for making non-player characters have real human dialogue by having generative AI on the back end creating that dialogue in real time. Now, funny enough, a lot of people pointed out that the dialogue in the exact demo was kind of wooden, but it still shows the possibilities of a very

Starting point is 00:01:55 different type of gaming experience going forward. We've seen a lot of these new types of generative AI tools for helping create both gaming experiences as well as just metaversal worlds. One that got a ton of attention just a couple weeks ago was the new Skybox tool from Blockade Labs. This basically allows users to use simple text prompts as well as sketches to create entire immersive worlds. Now, speaking of gaming and AI, there was also some really interesting research called Voyager.

Starting point is 00:02:22 Dr. Jim Fan from Nvidia says, What if we set GPT-4 free in Minecraft? I'm excited to announce Voyager, the first lifelong learning agent that plays Minecraft purely in-context. Voyager continuously improves itself by writing, refining, committing, and retrieving code from a skill library. GPT-4 unlocks a new paradigm. Training is code execution rather than gradient descent. Voyager rapidly becomes a seasoned explorer. In Minecraft, it obtains 3.3 times more unique items, travels 2.3 times longer distances,

Starting point is 00:02:50 and unlocks key tech tree milestones up to 15.3 times faster than prior methods. So the key thing here is not just that AI is playing Minecraft. It's about how AI is developing and teaching itself. It is actively rewriting its own codebase as it learns. Fan continues, generally capable autonomous agents are the next frontier of AI. They continuously explore, plan, and develop new skills in open-ended worlds driven by survival and curiosity. Minecraft is by far the best testbed with endless possibilities for agents.
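To make the skill library idea concrete, here is a minimal sketch of writing, refining, committing, and retrieving code as the learning step. It is not Voyager's actual code: the class name SkillLibrary, the skills mine_wood and craft_planks, and the word-overlap retrieval are all assumptions made for illustration, and a real agent would have a language model write the skills and would likely retrieve them with something richer than word matching.

```python
from dataclasses import dataclass, field


@dataclass
class SkillLibrary:
    """Toy store of named skills, each kept as Python source code."""
    skills: dict = field(default_factory=dict)  # name -> (description, source)

    def commit(self, name, description, source):
        # Adding or overwriting a skill is the "learning" step: no weights change.
        self.skills[name] = (description, source)

    def retrieve(self, task, top_k=1):
        # Hypothetical retrieval: rank skills by word overlap with the task text.
        task_words = set(task.lower().split())

        def overlap(item):
            description = item[1][0]
            return len(task_words & set(description.lower().split()))

        ranked = sorted(self.skills.items(), key=overlap, reverse=True)
        return [name for name, _ in ranked[:top_k]]

    def run(self, name, *args):
        # Execute the stored source, then call the function it defines.
        namespace = {}
        exec(self.skills[name][1], namespace)
        return namespace[name](*args)


library = SkillLibrary()
library.commit("mine_wood", "chop a tree to collect wood logs",
               "def mine_wood(n):\n    return ['log'] * n\n")
library.commit("craft_planks", "craft wooden planks from wood logs",
               "def craft_planks(logs):\n    return ['plank'] * (4 * len(logs))\n")

# Refining a skill means committing improved code under the same name.
library.commit("mine_wood", "chop a tree to collect wood logs",
               "def mine_wood(n):\n    return ['log'] * max(n, 0)\n")

best = library.retrieve("collect wood logs from a tree")[0]
logs = library.run(best, 3)
planks = library.run("craft_planks", logs)
```

The design point is that improvement lives in the stored code rather than in model weights, which is the sense in which training becomes code execution rather than gradient descent.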

Starting point is 00:03:19 Now, speaking of AI that independently learns, one really interesting piece of research this week was called GPT4Tools, teaching large language models to use tools via self-instruction. The research summarizes, using the low-rank adaptation optimization, our approach facilitates the open-source LLMs to solve a range of visual problems, including visual comprehension and image generation. Now, this is interesting on a couple levels, first in terms of how AI teaches itself to evolve, but also in the context of LLMs moving to a multimodal future, and in particular, LLMs being able to move to multimodal without requiring huge data sets or incredibly expensive computation. Obviously, the big guys are all working on big multimodal models, but the question is whether open source developers will be able to keep up on that front.

Starting point is 00:04:02 GPT4Tools is a pretty positive development in that light. Speaking of multimodal training, one more from Dr. Jim Fan. This week, he also discussed a new data set for multisensory object-centric learning. He writes, what is a cup? To LLMs, it is a word. But to us, it is a full sensory package: the visual appearance, the 3D topology, the ceramic texture of the handle, the sound of it landing on a table. To gain a far deeper understanding of concepts, the next-gen

Starting point is 00:04:28 AI needs to develop multimodal world models. Enter ObjectFolder, a very unique data set for multi-sensory object-centric learning, geared towards object recognition, reconstruction, and manipulation with sight, sound and touch. Features 100 real-world household objects and 1,000 neural objects. Now, obviously, the building blocks of AI models are the data sets that they get trained on, and so what you have here is effectively a data set that is meant to give LLMs the ability to train on very common but sneakily complex objects. Speaking of 3D objects, there was some really exciting research on 2D video to 3D modeling that came out this week as well. Neuralangelo is a new AI model that reconstructs surfaces in incredible detail from two-dimensional videos. It comes from Nvidia and combines a couple different methodologies to take videos that you might take on your iPhone

Starting point is 00:05:13 and turn them into rich 3D objects that can be used in virtual worlds, as digital twins, as parts of gaming, and much, much more. So you might have heard of photogrammetry, and you might have heard of neural radiance fields or NeRFs. Each of them has some problems. Traditional photogrammetry, for example, has a problem with repetitive structures, textureless surfaces, and strong color variations, while NeRFs can be beautiful but lack surface detail when they're turned back into 3D meshes. Neuralangelo effectively combines these two methodologies into something that ultimately

Starting point is 00:05:42 ends up being very different. Lior of AlphaSignal AI writes, A model uses a 2D video with multiple angles of an object or scene. It selects frames from different viewpoints to understand depth, size, and shape. The AI creates an initial 3D representation similar to a sculptor shaping a subject. The render is optimized to enhance details like a sculptor refining texture. The outcome is a 3D object or scene suitable for virtual reality, digital twins, or robotics. So you can see why they called it Neuralangelo. There is a clear parallel to the process that a sculptor

Starting point is 00:06:12 goes through of first chiseling out the rough shape of an object and then finally refining it to get the exact representation that they are looking for. One other interesting 3D project that got some buzz this week was Google's Project Starline. It is a prototype for face-to-face 3D video conferencing that really feels like people are there, at least that's the promise of it. We don't know much yet, but that hasn't stopped people from getting really excited about the possibilities. Speaking of excited by the possibilities, Robert Scoble shared a preview of something called the Holodeck, which is a virtual cube full of experiences created by artists. He said, I couldn't share much because this is coming later this year, but I want one sitting on my coffee table. Now,

Starting point is 00:06:49 Scoble is, of course, extremely excited about the big event next week, which is Apple's Worldwide Developers Conference or WWDC. That's slated to happen on Monday, and basically all anyone is talking about is the expected Apple headset. Now, the reason that people are so excited is that Apple has been working on AR and VR-type experiences for a very long time, but as we know, it doesn't really do anything unless it's pretty convinced that it can win. When it comes to headsets, it's not just that they haven't been a category leader, it's that there's no category leader, really. Sure, you could say it's Oculus, but I think that most people would assess that when it comes to virtual reality and augmented reality devices, even if there is one that has been used by more people, relatively

Starting point is 00:07:29 speaking, none have crossed over into the mainstream. So could Apple push us into the mainstream when it comes to that? Well, we'll have to see, but in the meantime, people are still exploring what all the applications might be. Professor Ethan Mollick writes, with a new Apple headset announcement, expect lots of talk of AR and VR again. This new paper is a useful introduction to when one or the other of these technologies might be economically useful. The consumer market may still be stalled, but there is real work potential. He shares a graphic of all the potential use cases, including virtual classrooms, virtual meetings, prototyping and testing new equipment, exploration of hazardous, inaccessible, or novel environments, remote robot-aided surgery, virtual conferences, training on new equipment,

Starting point is 00:08:05 building construction and architecture, and more and more and more. The paper he's referring to is called The Economics of Augmented and Virtual Reality, and was written by professors at the Rotman School of Management at the University of Toronto and at Berkeley's Haas School of Business. Back in the realm of things that exist right now, there was a lot of chatter about a Pew Research study about ChatGPT. The headline statistic was that 42% of American adults still haven't heard of ChatGPT at this point. What's more, only 14% of U.S. adults said they've used it for any real purpose, whether it's

Starting point is 00:08:34 entertainment, to learn something new, or for their work. Now, the other interesting thing from this data was who thought it was actually useful. Only a third said that it has been extremely or very useful, while 39% said that it had been somewhat useful. Around a quarter of those who tried it said it was either not very or not at all useful. Now, even if those who think that ChatGPT is extremely useful remain in the minority, that didn't stop a blog post this week about OpenAI's current plans from going extremely viral. The CEO of Humanloop, who had recently sat down as part of a group of about 20 developers with OpenAI CEO Sam Altman, wrote a blog all about his reflections on what they said. There were a number of really interesting details about that. One big one was that the GPU shortage was

Starting point is 00:09:15 really impacting what OpenAI could do, which certainly makes sense in the context of Nvidia's stock price soaring. But among other things, it meant that there would be no GPT-4 multimodality in 2023. Another part that I thought was really interesting, given some of my recent videos, was that Sam Altman seems to agree that ChatGPT plugins currently don't have product-market fit outside of browsing the web. As Sam put it really simply, he thought that a lot of companies had initially believed that they wanted their experience to be in ChatGPT, but actually what they want is ChatGPT in their experience, which is obviously very different. Now, interestingly, it seems that Raza from Humanloop gave away a little too much because

Starting point is 00:09:52 on Friday of this week, I noticed that the content had been removed at the request of OpenAI. For those of you who would like to learn more about what he said, feel free to go check out my video "Can OpenAI's New GPT Training Model Solve Math and AI Alignment at the Same Time?" as I go in depth on that. Now, the model that I was referring to came from this research post, Improving Mathematical Reasoning with Process Supervision. OpenAI sums it up. We've trained a model to achieve a new state of the art in mathematical problem solving by rewarding each correct step of reasoning, i.e. process supervision, instead of simply rewarding the correct final answer, which is outcome supervision. In addition to boosting performance relative to outcome supervision,

Starting point is 00:10:30 process supervision also has an important alignment benefit. It directly trains the model to produce a chain of thought that is endorsed by humans. Now, let me bring you back to grade school or middle school or even high school math. Most of us, if you grew up in the 90s and 2000s, probably had teachers who said something like show your work, partial credit for showing your work. That's effectively what we've got here. The outcome-supervised model only rewards an AI system for getting the correct answer, but a process-supervised model rewards it along the way for understanding how it got to the conclusions that it was getting to. Basically, what this research showed is that it was better in terms of the actual

Starting point is 00:11:03 performance of the model. The process-supervised model got to the correct mathematical answer in 78% of cases versus 71% for the outcome-supervised model, but it also has real benefits for interpretability, in other words, our ability to understand how an AI is reasoning. As they put it, process supervision has several alignment advantages over outcome supervision. It directly rewards the model for following an aligned chain of thought, since each step in the process receives precise supervision. Process supervision is also more likely to produce interpretable reasoning, since it encourages the model to follow a human-approved process. In contrast, outcome supervision may reward an unaligned process, and it is generally harder to scrutinize.
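The difference between the two kinds of supervision can be sketched with a toy example. This is only an illustration and not OpenAI's implementation: the function names outcome_reward and process_reward, the sample problem, and the hand-written step labels are all assumptions. In practice the per-step judgements would come from human labelers or a trained reward model rather than a hard-coded list.

```python
def outcome_reward(final_answer, correct_answer):
    # Outcome supervision: a single signal based only on the final answer.
    return 1.0 if final_answer == correct_answer else 0.0


def process_reward(step_labels):
    # Process supervision: one signal per reasoning step.
    # step_labels holds a judgement for each step (True means the step is valid).
    return [1.0 if ok else 0.0 for ok in step_labels]


# Hypothetical solution to "solve 2x + 6 = 10" that reaches the right answer
# through faulty reasoning.
steps = [
    ("2x = 10 - 6", True),
    ("2x = 5", False),   # arithmetic slip: 10 - 6 is 4, not 5
    ("x = 2", False),    # does not follow from 2x = 5
]

final_answer = steps[-1][0]
outcome = outcome_reward(final_answer, "x = 2")
step_rewards = process_reward([ok for _, ok in steps])

# Index of the first step that earned no reward, or None if every step is valid.
first_bad_step = next((i for i, r in enumerate(step_rewards) if r == 0.0), None)
```

Here the outcome signal gives full credit because the final line happens to be right, while the per-step signal flags the second step. That is the show-your-work idea: the reward points at where the reasoning went wrong, not just at whether the answer matched.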

Starting point is 00:11:41 The point that they're making is that this is not an approach to alignment that produces poorer results, but in fact, one that produces better results, which encourages all people who are building these AIs to use this type of system, which has both performance and alignment benefits. And of course, AI safety and alignment were big topics this week. Time magazine put out a cover issue called The End of Humanity: How Real Is the Risk? And it featured a number of different essays from prominent thinkers in the AI space. Now, of course, this all got a lot less theoretical and a lot more real when news came out that a recent U.S. Air Force simulation saw an AI drone take control

Starting point is 00:12:14 and try to kill its operator because it viewed its operator as getting in the way of its mission. Now, this was almost a textbook example of the type of thing that has AI risk and AI safety people worried. So much so that some of them wondered if it could possibly be true. Now, not waiting for it to be determined whether this actually had happened or not, news outlets all around the world started running with the story. By the next day, the U.S. Air Force had denied that this was a real thing, and the colonel in the Air Force who had given the presentation about it at a recent conference came back and said that, in actual point of fact, it was a theoretical; it wasn't an actual simulation that had been run. Now, this has absolutely

Starting point is 00:12:49 churned up discourse around this issue. The people who think that the AI safety and risk community is out of their minds were using it as a victory lap, while the AI risk community was trying to say that even if this didn't happen in this simulation in this case, it wasn't that far outside of things that we'd already seen. To me, it was an interesting bellwether of where the discourse is around this in the mainstream. It may be that the fact that people were so interested in writing this story is a good thing for the state of the awareness of the potential issues of AI. Now, there is a ton more that happened this week.

Starting point is 00:13:17 Japan has indicated that they won't enforce copyright when it comes to AI model training. The Australian government has initiated an eight-week consultation to see what their citizens think about AI risks and whether they should actually consider not only regulations but bans. Getty again sued Stability AI, this time in the UK. And an AI-generated SpongeBob livestream racked up millions and millions and millions of views on YouTube earlier this week. And yet, in spite of all this, I kind of think it's fair to call this a relatively quiet week in AI. Maybe because it was a day shorter coming off the Memorial Day holiday in the U.S. But in any case, with Apple's WWDC event on Monday, I can't imagine that next week will be similarly quiet. Anyways, guys, that is it for today's AI Breakdown weekly recap.

Starting point is 00:14:02 If you're enjoying the AI Breakdown, if you're finding it useful, please like, subscribe and share. Go check out the podcast and the newsletter version. Click the notification button on YouTube so you don't miss an episode. And until next time, peace.
