The AI Daily Brief: Artificial Intelligence News and Analysis - Translating Brainwaves to Images and Other Cutting Edge AI Research
Episode Date: July 2, 2023
Today on The AI Breakdown, a research recap of the most interesting recent AI research, including:
DreamDiffusion translating EEG to images - https://huggingface.co/papers/2306.16934
Nemo simulating life in games - https://www.ranmo.me/blog/title-digital-companionship
Single Image to 3D Mesh in 45 Seconds - https://huggingface.co/papers/2306.16928
CSM any image to 3D - https://www.csm.ai/any-image-to-3d
CSM Discord - https://discord.com/invite/NhJJwmk8gT
Playground mixed image editing - https://playgroundai.com/
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown, we're looking at research that can translate EEG brainwaves into images without first translating them to text.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information.
Hello, friends, happy Sunday. A quick note before we dive into today's show.
I am officially back from Europe. That means this week you will have normal content.
In other words, the brief and then a main show after.
But today, as usual with Saturday and Sunday shows,
we're doing a single topic, and today it is a research recap.
Now, my one request for you today, if you are enjoying this,
is to go check out the YouTube channel.
You can find it at youtube.com/@TheAIBreakdown,
and a lot of the content I do is better suited to video,
given that it involves text-to-image or 2D-to-3D creation, as is the case in this particular show.
Anyway, I think you will like what I have over on the YouTube channel,
and I appreciate you checking it out.
With that, let's get to today's episode.
Today on The AI Breakdown, the absolute latest in cutting-edge AI research.
Welcome back to The AI Breakdown.
Today, our research recap starts off with something called DreamDiffusion.
Now, there have been a lot of recent research papers about translating thoughts into images,
and usually it goes through a process by which thoughts are translated to text,
and then that text is run through a text-to-image generator.
DreamDiffusion, however, generates high-quality images directly from brain EEG signals
without first having to translate them to text.
DreamDiffusion uses pre-trained text-to-image models and employs a technique called temporal
masked signal modeling to pre-train the EEG encoder.
Now, masked signal modeling involves masking, or hiding, parts of the EEG signal
and training the model to predict the masked parts from the unmasked parts.
This teaches the model to build effective representations of EEG signals.
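To make the masking idea concrete, here is a minimal, hypothetical sketch of masked signal modeling on a single EEG channel. It splits the signal into temporal patches, hides a random fraction of them, fills each hidden patch from the nearest visible ones, and scores the reconstruction only on the masked patches. The function names, patch length, and mask ratio are assumptions for illustration, and the neighbour-averaging "model" is a stand-in for the learned encoder and decoder that DreamDiffusion actually trains.

```python
import math
import random


def make_patches(signal, patch_len):
    """Split a 1-D signal (one EEG channel) into non-overlapping temporal patches."""
    return [signal[i:i + patch_len] for i in range(0, len(signal) - patch_len + 1, patch_len)]


def random_mask(num_patches, mask_ratio, rng):
    """Return one boolean per patch; True means the patch is hidden from the model."""
    num_masked = int(round(num_patches * mask_ratio))
    hidden = set(rng.sample(range(num_patches), num_masked))
    return [i in hidden for i in range(num_patches)]


def predict_masked(patches, mask):
    """Hypothetical stand-in for a learned model: each hidden patch is predicted as the
    element-wise mean of the nearest visible patch on each side."""
    patch_len = len(patches[0])
    preds = []
    for i, hidden in enumerate(mask):
        if not hidden:
            preds.append(list(patches[i]))  # visible patches pass through unchanged
            continue
        left = next((patches[j] for j in range(i - 1, -1, -1) if not mask[j]), None)
        right = next((patches[j] for j in range(i + 1, len(patches)) if not mask[j]), None)
        context = [p for p in (left, right) if p is not None]
        if not context:
            preds.append([0.0] * patch_len)  # nothing visible to condition on
        else:
            preds.append([sum(vals) / len(context) for vals in zip(*context)])
    return preds


def masked_mse(patches, preds, mask):
    """Mean squared reconstruction error computed only over the hidden patches."""
    errors = [
        (x - y) ** 2
        for patch, pred, hidden in zip(patches, preds, mask) if hidden
        for x, y in zip(patch, pred)
    ]
    return sum(errors) / len(errors) if errors else 0.0


# Usage: a synthetic sine wave stands in for a real EEG recording.
rng = random.Random(0)
eeg = [math.sin(t / 5) for t in range(100)]
patches = make_patches(eeg, patch_len=10)
mask = random_mask(len(patches), mask_ratio=0.5, rng=rng)
preds = predict_masked(patches, mask)
print(f"{sum(mask)} of {len(patches)} patches masked, loss = {masked_mse(patches, preds, mask):.4f}")
```

Computing the loss only on hidden patches is the key design choice: the model gets no credit for copying what it can already see, so it has to learn the temporal structure of the signal.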
DreamDiffusion also uses CLIP, a model developed by OpenAI that represents images and text in a shared embedding space.
Using the CLIP image encoder, the researchers provide additional supervision that helps the model align EEG, text, and image embeddings more effectively.
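One simple way to picture this kind of supervision is an alignment loss that pulls a projected EEG embedding toward the CLIP image embedding of the picture the subject was viewing. The sketch below assumes a linear projection and a cosine-distance loss; the function names, the tiny dimensions, and the exact form of the loss are illustrative assumptions, not necessarily the objective used in the paper.

```python
import math


def project(weights, vec):
    """Apply a linear projection (list of rows) to map an EEG embedding into CLIP space."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weights]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


def alignment_loss(eeg_embedding, projection, clip_image_embedding):
    """Cosine distance between the projected EEG embedding and the CLIP image embedding.
    0 means the two point the same way; training would minimize this (assumed loss)."""
    projected = project(projection, eeg_embedding)
    return 1.0 - cosine_similarity(projected, clip_image_embedding)


# Usage with tiny, made-up 3-dimensional embeddings.
identity = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
eeg = [1.0, 0.0, 2.0]
print(alignment_loss(eeg, identity, [2.0, 0.0, 4.0]))  # same direction, loss near 0
print(alignment_loss(eeg, identity, [0.0, 1.0, 0.0]))  # orthogonal, loss of 1
```

In an actual training pipeline, the EEG encoder and projection weights would be updated by gradient descent to reduce this loss; only the loss computation is shown here.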
Now, there are a ton of implications of this type of research.
There are medical applications such as unlocking communication with people who are paralyzed or otherwise unable to communicate in traditional ways.
There are mental health applications, with therapists potentially able to visualize patients' thoughts,
dreams, and fears.
Artists could use this technology to open new avenues for creativity.
The entertainment industry could use it to create immersive experiences.
And just more broadly, in the field of neuroscience, this could lead to breakthroughs in our
understanding of human thoughts, dreams, and consciousness.
Next up today on this research roundup, we have Nemo AI, a super cool new AI that allows
characters in games to learn in ways that are, frankly, much more human.
On a high level, Nemo AI has a few different capabilities. First, it has
long-term memory, so it remembers previous interactions. Second, it can learn by observation; in other
words, you can teach it how to do different things. Third, it has 3D spatial awareness, so it's
aware of what's going on around it. This research comes from Ran Mo. Ran previously ran teams at
Electronic Arts and worked on games like The Sims and other mobile titles, and now he's building
an interactive media company called Proxima. The blog post this demo came from is called
"Beyond the Virtual Dollhouse: Simulating Life in Games."
It starts with how games have traditionally made characters seem lifelike: scripting.
However, as Ran points out, scripting at its heart is less about true intelligence
and more about deterministic responses that follow a set of predefined rules, essentially digital
versions of choose-your-own-adventure books.
This, he points out, has huge scaling difficulties.
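To illustrate what Ran means by scripting, here is a toy dialogue script in which every response is a fixed lookup on the character's current state and the player's input. The states, lines, and function name are hypothetical. The scaling problem shows up immediately: every new state or input needs another hand-written entry.

```python
# Each (state, player_input) pair maps to a fixed (reply, next_state) pair.
SCRIPT = {
    ("idle", "hello"): ("Hello, traveler!", "greeted"),
    ("greeted", "quest"): ("Bring me three apples.", "on_quest"),
    ("on_quest", "apples"): ("Thank you! Here is your reward.", "idle"),
}


def respond(state, player_input):
    """Deterministic lookup: the same state and input always give the same reply.
    Anything the script author did not anticipate falls through to a stock line."""
    return SCRIPT.get((state, player_input), ("I don't understand.", state))


state = "idle"
for line in ["hello", "quest", "dance", "apples"]:
    reply, state = respond(state, line)
    print(f"{line!r} -> {reply}")
```

The unanticipated "dance" input gets the stock reply, which is exactly the choose-your-own-adventure limitation: the character can only follow paths someone wrote in advance.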
Now, he talks about how The Sims tried to go beyond this by using
a utility-based AI, in which autonomous digital companions balance their needs and desires using a system
that can evaluate hundreds of needs and decisions. The problem, however, is that while this
maximized utility for those characters, it didn't allow them to forge connections with players
beyond satisfying their own needs. Ran then walks through the recent history of efforts to bring
deeper AI into games, including Black & White, on which Demis Hassabis, who later co-founded DeepMind,
was lead AI programmer, as well as the recent surge of AI in games like The Elder Scrolls, where conversational
chatbots have been embedded directly into the game.
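The utility-based approach attributed to The Sims can be sketched as scoring each available action by how much it relieves the character's most pressing needs, then picking the best one. The needs, actions, and scoring formula below are simplified assumptions, not The Sims' actual implementation.

```python
from dataclasses import dataclass, field


@dataclass
class Action:
    name: str
    restores: dict = field(default_factory=dict)  # need -> amount restored


def utility(needs, action):
    """Urgency-weighted benefit: a need at level 0 is fully urgent, at 100 not urgent."""
    return sum((100 - needs[need]) / 100 * amount
               for need, amount in action.restores.items() if need in needs)


def choose_action(needs, actions):
    """Pick the action with the highest utility for this character's own needs."""
    return max(actions, key=lambda action: utility(needs, action))


actions = [
    Action("eat", {"hunger": 50}),
    Action("sleep", {"energy": 80}),
    Action("play", {"fun": 40}),
]
print(choose_action({"hunger": 20, "energy": 90, "fun": 60}, actions).name)  # eat
print(choose_action({"hunger": 95, "energy": 90, "fun": 60}, actions).name)  # play
```

Note that the player never appears in the score: the character optimizes only for its own needs, which is the limitation Ran points out.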
Ran talks about perception, saying, "We built a system that converted the 3D game world
into natural language in real time, so that Nemo can perceive the world around him at any given
time."
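Ran doesn't detail how the conversion works, but a minimal sketch of the idea is to turn nearby objects' positions into sentences a language model could read. The coordinate convention (+z is north, +x is east), the distance cutoff, and all names here are assumptions for illustration.

```python
import math
from dataclasses import dataclass


@dataclass
class SceneObject:
    name: str
    x: float  # east-west position (+x is east, by assumption)
    z: float  # north-south position (+z is north, by assumption)


def describe_scene(agent_x, agent_z, objects, max_distance=10.0):
    """Render nearby objects as plain-English sentences, nearest first."""
    sentences = []
    for obj in sorted(objects, key=lambda o: math.hypot(o.x - agent_x, o.z - agent_z)):
        dx, dz = obj.x - agent_x, obj.z - agent_z
        distance = math.hypot(dx, dz)
        if distance > max_distance:
            continue  # out of perception range
        if abs(dz) >= abs(dx):
            direction = "north" if dz > 0 else "south"
        else:
            direction = "east" if dx > 0 else "west"
        sentences.append(f"A {obj.name} is {distance:.1f} meters to the {direction}.")
    return " ".join(sentences) if sentences else "Nothing is nearby."


scene = [SceneObject("chair", 3, 4), SceneObject("tree", -2, 0), SceneObject("car", 20, 0)]
print(describe_scene(0, 0, scene))
# A tree is 2.0 meters to the west. A chair is 5.0 meters to the north.
```

Re-running a function like this every frame, or every few frames, is one way such a description could stay current "in real time."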
Memory, personality, and intention are stored and interpreted digitally as vector files and continuously
evolve through new experiences, just like in real life.
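Storing memories as vectors usually means embedding each memory and retrieving the ones most similar to the current situation. The sketch below assumes a toy bag-of-words embedding and cosine similarity so that it runs without any model; a real system would likely use a learned text-embedding model, and the class and method names are hypothetical, not Nemo's actual design.

```python
import math
from collections import Counter


def embed(text):
    """Toy embedding: word counts. Stand-in for a learned embedding model (assumption)."""
    return Counter(word.strip(".,!?").lower() for word in text.split())


def cosine(a, b):
    dot = sum(count * b[word] for word, count in a.items())  # Counter returns 0 for missing words
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class MemoryStore:
    def __init__(self):
        self.memories = []  # (text, vector) pairs; grows as new experiences arrive

    def add(self, text):
        self.memories.append((text, embed(text)))

    def recall(self, query, k=1):
        """Return the k stored memories most similar to the query."""
        q = embed(query)
        ranked = sorted(self.memories, key=lambda m: cosine(q, m[1]), reverse=True)
        return [text for text, _ in ranked[:k]]


store = MemoryStore()
store.add("The player taught me how to build a campfire.")
store.add("The player likes the color blue.")
store.add("It rained near the lake yesterday.")
print(store.recall("How do I build a campfire?"))
print(store.recall("What color does the player like?"))
```

Because new memories are simply appended, what the character recalls changes as it accumulates experiences, which is the "continuously evolve" behavior described above in its simplest form.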
Finally, user input.
We added speech recognition for player voice commands, but these could easily also be control
inputs in any other form.
Ran concludes, "The simulation of life and companionship within games has important implications.
Commercially, it has led to some of the most enduring and profitable franchises, like The Sims.
For players, these companions have the capacity to deepen engagement within games.
Beyond gaming, these pursuits also symbolize a deeper approximation of human relationships and experiences."
You can definitely tell that one of the subthemes in AI right now is this idea of more personal, experiential types of AI
that can actually address some of these questions of human relationships and companionship.
Next up, we stay in the realm of things with implications for gaming and virtual worlds with
the research paper called One-2-3-45: Any Single Image to 3D Mesh in 45 Seconds without
Per-Shape Optimization. What the research is trying to do is take a single two-dimensional,
or flat, image from the real world and turn it into a 3D object representation. Now, there are many
methods for doing this already, but the researchers point out that many of these models
suffer from lengthy optimization times, 3D-inconsistent results, and poor geometry.
Their approach, which they've tested with both real-world and
synthetic images, promises not only to improve the results with higher-quality 3D geometry,
but to do so much faster. That's why they put the 45 seconds right there in
the title. As with all of this research, I'll include links in the show notes so that you can go
see the specific methodology that they use, but the implications if this works are obviously very clear.
More efficient and accurate 3D representation could be a game changer for game development, virtual reality applications, augmented reality applications, and more.
There are implications for computer vision research and solving complicated problems in that field, and a number of other implications for broader 3D reconstruction research.
Now, to the extent that you need evidence that this 2D-to-3D pipeline is, if nothing else, commercially interesting, CSM has also released a new any-image-to-3D model that they say is significantly better than OpenAI's
Shap-E. CSM's whole focus is on creating these 3D assets. They work on video to 3D and image to 3D,
and are releasing an API soon. In their blog post from earlier this week announcing Any Image to 3D,
they said, "Turning any flat picture of an object into a 3D model has been unsolved for decades.
This is because a single image doesn't give us much information about depth or how things
should look from different angles. Turning images into game-engine-ready 3D assets has massive
implications for gaming, robotics, mixed reality, VFX, and e-commerce. Moreover,
this new technology opens up a world of possibilities for anyone with a spark of imagination."
Now, CSM has put out a public showcase highlighting assets that were generated using their Discord bot.
You can also go join their Discord channel to start generating your own assets.
And there's also a waitlist available for people to try this out.
Last up, something that has jumped from the research stage to the tool stage, but which fits some of the themes from today's show: Playground AI's new mixed image editing.
This is a powerful software suite that allows for image editing with
text-based natural language inputs, combining many of the features of other
text-to-image tools, like the Generative Fill feature that people have been so excited about
over the last few weeks. The demo shows everything from a woman's candle being replaced with a
lightsaber to using highlighting tools and natural language to remove parts of an image and
change the background. You can check this out at playgroundai.com, and of course there will be a link
in the show notes. All right, friends, that is going to do it for today's research recap. Hopefully
this was interesting. And if it was, I would love it
if you would take the time to subscribe to the podcast version of this show. You can find it at
breakdown.network, and you can also find the newsletter version there.
Every weekday, I put out something called the First Five, which covers the five most interesting
or important stories in AI.
Thanks again for watching, and until next time, peace.
