The AI Daily Brief: Artificial Intelligence News and Analysis - Translating Brainwaves to Images and Other Cutting Edge AI Research

Episode Date: July 2, 2023

Today on The AI Breakdown, a research recap of the most interesting recent AI research, including:
DreamDiffusion translating EEG to images - https://huggingface.co/papers/2306.16934
Nemo simulating life in games - https://www.ranmo.me/blog/title-digital-companionship
Single Image to 3D Mesh in 45 Seconds - https://huggingface.co/papers/2306.16928
CSM any image to 3D - https://www.csm.ai/any-image-to-3d
CSM Discord - https://discord.com/invite/NhJJwmk8gT
Playground mixed image editing - https://playgroundai.com/

The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on The AI Breakdown, we're looking at research that can translate EEG brainwaves into images without first translating to text. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information. Hello, friends, happy Sunday. A quick note before we dive into today's show. I am officially back from Europe. That means this week you will have normal content. In other words, the brief and then a main show after. But today, as usual with Sunday or Saturday shows, we're just doing a single topic, and today it is a research recap.
Starting point is 00:00:34 Now, my one request for you today, if you are enjoying this, is to go check out the YouTube channel. You can find it at youtube.com/@TheAIBreakdown, and a lot of the content that I do is going to be a little bit better suited to the visual, given that there is text-to-image or 2D-to-3D creation, as is the case in this particular show. Anyway, I think you will like what I have over on the YouTube channel, and I appreciate you checking it out. With that, let's get to today's episode.
Starting point is 00:01:02 Today on The AI Breakdown, the absolute latest in cutting-edge AI research. Welcome back to The AI Breakdown. Today, our research recap starts off with something called DreamDiffusion. Now, there have been a lot of recent research papers about translating thoughts into images, and usually it goes through a process by which thoughts are translated to text, and then that text is run through a text-to-image generator. DreamDiffusion, however, is a process that generates high-quality images from brain EEG signals without actually having to translate them to text in the first place.
Starting point is 00:01:35 DreamDiffusion uses pre-trained text-to-image models and employs a technique called temporal masked signal modeling to pre-train the EEG encoder. Now, masked modeling is a technique that involves masking, or hiding, parts of the EEG signals and training the model to predict the masked parts based on the unmasked parts. This allows the model to learn to represent EEG signals effectively. DreamDiffusion also uses the CLIP model, which was developed by OpenAI and which understands images and text in a unified embedding space. Using the CLIP image encoder, the researchers can provide additional supervision to the model, which helps it align EEG signals, text, and image embeddings more effectively. Now, there are a ton of implications of this type of research.
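As a rough illustration of the masking idea described above, here is a minimal Python sketch. It is not DreamDiffusion's actual code: the function names `mask_time_tokens` and `masked_mse`, the token length of 4 samples, the 75% mask ratio, and the sine wave standing in for one EEG channel are all assumptions made for the example. The signal is cut into fixed-length time tokens, a random subset is hidden, and the reconstruction error is scored only on the hidden samples.

```python
import math
import random


def mask_time_tokens(signal, token_len, mask_ratio, rng):
    """Split a 1-D signal into time tokens and zero out a random subset.

    Returns the masked copy and the sorted indices of the hidden tokens.
    """
    n_tokens = len(signal) // token_len
    n_masked = int(n_tokens * mask_ratio)
    masked_tokens = sorted(rng.sample(range(n_tokens), n_masked))
    masked = list(signal[: n_tokens * token_len])
    for t in masked_tokens:
        for i in range(t * token_len, (t + 1) * token_len):
            masked[i] = 0.0
    return masked, masked_tokens


def masked_mse(prediction, target, masked_tokens, token_len):
    """Mean squared error computed only over the hidden samples."""
    errors = [
        (prediction[i] - target[i]) ** 2
        for t in masked_tokens
        for i in range(t * token_len, (t + 1) * token_len)
    ]
    return sum(errors) / len(errors) if errors else 0.0


rng = random.Random(0)
# Hypothetical stand-in for one EEG channel (128 samples).
eeg = [math.sin(0.1 * i) for i in range(128)]
masked, masked_idx = mask_time_tokens(eeg, token_len=4, mask_ratio=0.75, rng=rng)

# Baseline "model" that just echoes the masked input (zeros where hidden).
# A real encoder would be trained to drive this loss down.
baseline_loss = masked_mse(masked, eeg, masked_idx, token_len=4)
print(f"hidden tokens: {len(masked_idx)}, baseline loss: {baseline_loss:.4f}")
```

Scoring only the hidden samples is what forces a model to infer the missing stretch from its context rather than copy its input.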
Starting point is 00:02:15 There are medical applications such as unlocking communication with people who are paralyzed or otherwise unable to communicate in traditional ways. There are mental health applications with therapists potentially able to visualize patients' thoughts, dreams, and fears. Artists could use this technology to open new avenues for creativity. Entertainment could use it to create immersive experiences. And just more broadly, in the field of neuroscience, this could lead to breakthroughs in our understanding of human thoughts, dreams, and consciousness. Next up today on this research roundup, we have Nemo AI, a super cool new AI that allows for
Starting point is 00:02:48 characters in game to learn in ways that are, frankly, much more human. So on a high level, Nemo AI has a couple of different capacities. First of all, it has long-term memory, so it remembers previous interactions. Second, it can learn by observation. In other words, you can teach it how to do different things. Third, it has 3D spatial awareness, so it's aware of what's going on around it. This research comes from Ran Mo. Ran previously ran teams at Electronic Arts and worked on games like The Sims and other mobile titles, and now he's building an interactive media company called Proxima. The blog post that this demo came from was called Beyond the Virtual Dollhouse: Simulating Life in Games.
Starting point is 00:03:25 And where it starts is with how we used to make life seem lifelike in games, which was scripting. However, as Ran points out, at the heart of it, scripting is less about true intelligence and more about deterministic responses that follow a set of predefined rules, essentially digital versions of choose-your-own-adventure books. This, he points out, has huge scaling difficulties. Now, he talks about how The Sims tried to go beyond this sort of normal modality by using a utility-based AI where autonomous digital companions balance their needs and desires, using a system that could evaluate hundreds of needs and decisions. The problem, however, is that while this was
Starting point is 00:03:59 utility maximizing for those characters, it didn't allow them to forge connections with players beyond satisfying their own needs. Ran then moves through the recent history of trying to get deeper AI into games, including Black & White, which was programmed by Demis Hassabis, who later founded DeepMind, as well as the recent surge in AI in games like Elder Scrolls, which has embedded conversational chatbots directly into the game. Ran talks about perception, saying, we built the system that converted the 3D game world into natural language in real time, so that Nemo can perceive his world around him at any given time.
Starting point is 00:04:30 Memory, personality, and intention stored and interpreted digitally as vector files and continuously evolved through new experiences just like in real life. Finally, user input. We added speech recognition for player voice commands, but these could easily also be control inputs in any other form. Ran concludes, the simulation of life and companionship within games have important implications. Commercially, it has led to some of the most enduring and profitable franchises like The Sims. For players, these companions have the capacity to deepen engagement within games.
Starting point is 00:04:57 Beyond gaming, these pursuits also symbolize deeper approximation of human relationships and experiences. You can definitely tell one of the sub-themes going on with AI right now is this idea of more personal, experiential types of AI that can actually address some of these questions of human relationships and companionship. Next up, we stay in the realm of things with implications for gaming and virtual worlds with the research paper called One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without Per-Shape Optimization. So what the research is trying to do is take a single two-dimensional or flat image from the real world and turn it into a 3D object representation. Now, there are many methodologies to do this already, but the researchers point out that many of these models
Starting point is 00:05:41 suffer from a lengthy optimization time, 3D-inconsistent results, and poor geometry. Now, their approach, which they've tested with both images captured in the real world as well as synthetic images, promises not only to improve the results and have higher-quality 3D geometry, but to do so in a much shorter amount of time. It's why they put that 45 seconds right there in the title. As with all of this research, I'll include links in the show notes so that you can go see the specific methodology that they use, but the implications if this works are obviously very clear. More efficient and accurate 3D representation could be a game changer for game development, virtual reality applications, augmented reality applications, and more. There are implications for computer vision research and solving complicated problems in that field, and a number of other implications for broader 3D reconstruction research.
Starting point is 00:06:27 Now, to the extent that you need evidence that this 2D-to-3D pipeline is something that is, if nothing else, commercially interesting to people, CSM has also released a new any image to 3D model that they say is significantly better than OpenAI's Shap-E. CSM's whole focus is on creating these 3D assets. They work on video to 3D and image to 3D, and are releasing an API soon. In their blog post from earlier this week announcing any image to 3D, they said, turning any flat picture of an object into a 3D model has been unsolved for decades. This is because a single image doesn't give us much information about depth or how things should look from different angles. Turning images into game-engine-ready 3D assets has massive implications for gaming, robotics, mixed reality, VFX, and e-commerce. Moreover, this new technology opens up a world of possibilities for anyone with a spark of imagination.
Starting point is 00:07:15 Now, CSM has put out a public showcase highlighting assets that were generated using their Discord bot. You can also go join their Discord channel to start generating your own assets. And there's also a waitlist available for people to try this out. Last up, something that has jumped from the research to tool stage, but which fits some of the themes from today's show, is Playground AI's new mixed image editing. This is a super powerful software suite that allows for image editing with text-based natural language inputs and that kind of combines a bunch of the features of other types of text-to-image tools, like generative fill, that people have been getting so excited about over the last few weeks. The demo shows everything from a woman having a candle replaced with a
Starting point is 00:07:53 lightsaber to using highlighting tools and natural language to remove parts of an image and change the background. You can check this out at playgroundai.com, and of course there will be a link in the show notes. All right, friends, that is going to do it for today's research recap. Hopefully this was interesting. And if it was, I would love it if you would take the time to subscribe to the podcast version of this show. You can find it at breakdown.network, and you can also find the newsletter version there. Every weekday, I put out something called the first five, which is the five most interesting or important stories in AI.
Starting point is 00:08:24 Thanks again for watching, and until next time, peace.
