The AI Daily Brief: Artificial Intelligence News and Analysis - How Just Released Sora Stacks Up to Other AI Video Generators
Episode Date: December 11, 2024
Sora, OpenAI's video generator, has finally launched, promising new tools for creative expression. This video examines Sora's features, like storyboard mode and image-to-video capabilities, while comparing it to competitors like Runway, Pika Labs, and Luma. While Sora excels in some areas, challenges with physics and accessibility remain. How does it stack up, and what does it mean for the future of video generation?
Brought to you by: Vanta - Simplify compliance - https://vanta.com/nlw
The AI Daily Brief helps you understand the most important news and discussions in AI.
Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Subscribe to the newsletter: https://aidailybrief.beehiiv.com/
Join our Discord: https://bit.ly/aibreakdown
Transcript
Today on the AI Daily Brief, Sora has finally been released by OpenAI, and Google announces a breakthrough in quantum computing.
The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI.
To join the conversation, follow the Discord link in our show notes.
Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes.
There has been this longstanding pattern of Google trying to command a news cycle and OpenAI swooping in and finding a way to front-run them.
Yesterday, though, we kind of got the inverse, where theoretically Sora was the biggest announcement,
until all of a sudden Google came out with this announcement, which had people's jaws on the floor.
Google has announced a new quantum computing chip called Willow.
They claim that, quote,
Willow performed a standard benchmark computation in under five minutes that would take one of today's fastest supercomputers
10 septillion (that is, 10 to the 25th) years, a number that vastly exceeds the age of the universe.
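For a sense of scale, that comparison can be checked with a quick back-of-the-envelope sketch. The 10 septillion figure comes from Google's quote; the roughly 13.8 billion year age of the universe is the commonly cited estimate and is an assumption added here, as are the names used in the code.

```python
# Figure from the quote: 10 septillion (10^25) years.
CLAIMED_YEARS = 10**25
# Assumption added for scale: commonly cited age of the universe, in years.
AGE_OF_UNIVERSE_YEARS = 13.8e9


def times_age_of_universe(years: float) -> float:
    """Return how many universe lifetimes a span of years represents."""
    return years / AGE_OF_UNIVERSE_YEARS


ratio = times_age_of_universe(CLAIMED_YEARS)
print(f"About {ratio:.1e} times the age of the universe")
```

Under that assumption, the claimed classical runtime works out to hundreds of trillions of universe lifetimes, which is why the benchmark is usually read as a demonstration rather than a practical workload.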
Part of the announcement is a claim that they've solved the scaling
problem with quantum computing. The chip architecture is capable of reducing errors exponentially
as more qubits, the quantum equivalent of bits, are added. Errors due to interaction
with the surrounding environment were the key issue with the technology and stood as an unresolved
problem for almost 30 years. Google Quantum AI founder Hartmut Neven wrote,
this historic accomplishment is known in the field as below threshold, being able to drive errors
down while scaling up the number of qubits. You must demonstrate being below threshold to show
real progress on error correction. And this has been an outstanding challenge since quantum error
correction was introduced by Peter Shor in 1995. If that was all completely Greek to you,
the TLDR implication is that this is the first time it appears that there's been a viable
pathway to quantum computing at scale. Until now, all experiments have been extremely small
proof of concepts, novel and important, but not the first step on the path towards building
a useful quantum computer. The big idea with quantum computing is the ability to process
certain computations at an unfathomable speed. Traditional computing only moves in a straight line,
with the processor testing solutions in sequence before coming up with an answer.
Quantum computing allows all solutions to be tested simultaneously.
In terms of why we're discussing it here, the technology could be a massive unlock for AI
training once it's commercially viable.
Neven continued, my colleagues sometimes ask me why I left the burgeoning field of AI to focus on quantum
computing.
My answer is that both will prove to be the most transformational technologies of our time,
but advanced AI will significantly benefit from access to quantum computing.
Quantum computing will be indispensable for collecting training data that's inaccessible
to classical machines, training and optimizing certain learning architectures, and modeling
systems where quantum effects are important. This includes helping us discover new medicines, designing
more efficient batteries for electric cars, and accelerating progress in fusion and new energy alternatives.
Many of these future game-changing applications won't be feasible on classical computers. They're
waiting to be unlocked with quantum computing. Now, it seems incredible, but what is the hype
mitigation version of this story? Former NVIDIA leader Bojan Tunguz said, I've been holding off saying
more about Google's purported quantum computing breakthrough until I read a bit more about it.
It turns out, as I had suspected, it was way overhyped.
Yes, it's good science, but in terms of any kind of practical applications, we are probably
at least a decade away.
Even then, it will most likely be specialized areas of application like molecular dynamics.
A good rule of thumb is that quantum computers are really good at doing stuff that comes
naturally down to quantum mechanics, which is literally all about randomness.
Deterministic computations that are relevant for conventional computation are on par with what
conventional computers can do.
Still, he kind of just shrugged off the idea that quantum computing is a decade away.
And I think that's where a lot of the excitement is coming from.
It's about what the future timeline looks like.
Educator Paul Couvert writes,
so in less than 24 hours we got
Google unveiling a quantum chip that solves in five minutes
what would take the best supercomputers 10 septillion years, and
OpenAI launching Sora with almost photorealistic AI video quality.
The timeline is unreal.
Venture investor Neil Kostla writes,
stuff is about to get really weird if AGI
and useful quantum computing timelines line up.
Add a little robotics to the mix,
and there's a growing probability the world 25 years from now
looks literally nothing like it does today.
Alex Treese points out, best part of this quantum announcement is Google saying, yeah, I don't know, we probably live in a multiverse then.
The specific line he pulled from the blog post is, it lends credence to the notion that quantum computation occurs in many parallel universes in line with the idea that we live in a multiverse.
Again, a throwaway line.
All in all, while it may not be technology for today, it is still pretty cool.
Speaking of AGI, Microsoft AI lead Mustafa Suleyman has weighed in on the AGI debate, and his view is very different from Sam Altman's.
During a Reddit AMA, Suleyman said that the technology is still a decade away, adding,
I don't think it can be done on NVIDIA Blackwell GB200s.
I do think it is going to be plausible at some point in the next two to five generations.
I don't want to say I think it's high probability that it's two years away,
but I think within the next five to seven years since each generation takes 18 to 24 months now.
So five generations could be up to 10 years away depending on how things go.
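The arithmetic behind that estimate is easy to sketch. The two-to-five generation range and the 18-to-24-month cadence come from Suleyman's remarks; the function name and layout below are purely illustrative.

```python
def years_for_generations(generations: int, months_per_generation: int) -> float:
    """Convert a number of hardware generations into elapsed years."""
    return generations * months_per_generation / 12


# Suleyman's range: two to five generations, 18 to 24 months each.
for gens in (2, 5):
    low = years_for_generations(gens, 18)
    high = years_for_generations(gens, 24)
    print(f"{gens} generations: {low:.1f} to {high:.1f} years")
```

Two generations lands at three to four years and five generations at seven and a half to ten, which is where the "up to 10 years" upper bound comes from.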
This, of course, flies in the face of Sam Altman's recent comments where he said AGI was coming,
quote, sooner than most people in the world think, and it matters much less.
Suleyman also dove into the philosophical question of what AGI is, which of course has a pretty
big impact on when we think it's going to arrive. He said, to me, AGI is a general purpose learning
system that can perform well across all human level training environments. So knowledge work, by the way,
that includes physical labor. A lot of my skepticism has to do with the progress and complexity
of getting things done in robotics. But yes, I can well imagine that we'll have a system that can
learn without a great deal of handcrafted prior prompting to perform well in a very wide range
of environments. I think that that is not necessarily going to be AGI, nor does that lead to the
singularity, but it means that most human knowledge work in the next five to ten years could
likely be performed by one of the AI systems that we develop. And I think the reason why I shy away
from the language around singularity or artificial superintelligence is because I think they're very
different things. Of course, the definition of AGI matters a great deal to Microsoft as it
would trigger the termination of their deal with OpenAI if the lab manages to achieve it,
or at least if OpenAI's board declares that AGI has been achieved. Recent reporting, however,
suggests that this clause in the contract is being reconsidered, with OpenAI potentially
removing it in order to smooth the process of converting to a for-profit company.
Suleyman touched on the reported tensions between the two firms, stating,
every partnership has tension. It's healthy and natural. I mean, they're a completely
different business to us. They operate independently and partnerships evolve and they have to adapt
to what works at the time. So we'll see how that changes over the next few years.
For what it's worth, this is the least denial-heavy answer we've seen to any of the questions about tensions,
which could be a reflection of the fact that things have gotten more sour, or could also
simply reflect the fact that Suleyman was sort of brought in as Microsoft's hedge against
whatever the heck is going to go on at OpenAI. Lastly today, xAI have officially announced
the release of their cutting-edge image model by the end of the week. Late on Friday night,
xAI released their in-house image model named Aurora. The model was only available for a few hours,
but that was enough time for users to be amazed by its capabilities. It produced some of the best
photo-realistic images seen to date from generative AI and seemed to excel at replicating celebrities
in particular. Once it was pulled down, users were left wondering when they would get another
chance to play with the model, or indeed whether there had been a critical safety issue,
perhaps associated with the near complete absence of copyright and deepfake guardrails.
The team has now confirmed Aurora's release in select regions with a full rollout within the week.
In an announcement blog post, they wrote, Grok can now generate high-quality images across several
domains, where other image generation models often struggle. It can render precise visual
details of real-world entities, text, logos, and can create realistic portraits of humans.
xAI developer Ethan Knight wrote, earlier today we released a new model codenamed Aurora
that gives Grok the ability to generate extremely photorealistic images, and in the future, even edit them.
It's free to use for all of X, try it out, and send us what you're creating.
This model was trained entirely in-house with a very small team and we're excited to finally show it off.
Elon Musk confirmed that the model was developed internally in around six months,
which settles a few questions from Friday night's sneak preview,
primarily whether the model was a collaboration with Black Forest Labs,
who provide the Flux model that currently drives xAI's image generation capabilities.
The fact that the model was trained so quickly by a small team suggests either
that, A, we're about to see a massive improvement in image generation across the board,
or, B, that the xAI team is simply as cracked as they seem to think they are.
In either case, another contender in the image generation space.
That, however, is going to do it for today's AI Daily Brief Headlines edition.
Next up, the main episode.
Today's episode is brought to you by Plumb.
Want to use AI to automate your work but don't know where to start?
Plumb lets you create AI workflows by simply describing what you want.
No coding or API keys required.
Imagine typing out, AI, analyze my
Zoom meetings and send me your insights in Notion, and watching it come to life before your eyes.
Whether you're an operations leader, marketer, or even a non-technical founder, Plumb gives you
the power of AI without the technical hassle. Get instant access to top models like GPT-4o,
Claude Sonnet 3.5, AssemblyAI, and many more. Don't let technology hold you back. Check out
Use Plumb, that's Plumb with a B, for early access to the future of workflow automation.
Today's episode is brought to you by Vanta. Whether you're starting or scaling your company's
security program, demonstrating top-notch security practices and establishing trust is more important
than ever. Vanta automates compliance for ISO 27001, SOC 2, GDPR, and leading AI frameworks like ISO 42001
and NIST AI Risk Management Framework, saving you time and money while helping you build customer
trust. Plus, you can streamline security reviews by automating questionnaires and demonstrating
your security posture with a customer-facing trust center, all powered by Vanta AI. Over 8,000 global
companies like LangChain, Leela AI, and Factory AI use Vanta to demonstrate AI trust and prove security
in real time. Learn more at Vanta.com slash NLW. That's Vanta.com slash NLW.
Today's episode is brought to you, as always, by Superintelligent. Have you ever wanted an
AI daily brief but totally focused on how AI relates to your company? Is your company
struggling with AI adoption, either because you're getting stalled figuring out what use
cases will drive value or because the AI transformation that is happening is siloed at individual
teams, departments, and employees and not able to change the company as a whole?
Superintelligent has developed a new custom internal podcast product that inspires your
teams by sharing the best AI use cases from inside and outside your company.
Think of it as an AI daily brief, but just for your company's AI use cases.
If you'd like to learn more, go to besuper.ai slash partner and fill out the information
request form. I am really excited about this product, so I will personally get right back to you.
Again, that's besuper.ai slash partner.
After months of waiting, OpenAI has finally released Sora.
When the company announced last week that they were going to be doing 12 Days of Shipmas,
rumors immediately started swirling that Sora was going to be a part of it, and frankly,
a lot of people felt like if it wasn't, there was something wrong. Well, it turns out that
those rumors were correct, as yesterday OpenAI
tweeted, our holiday gift to you, Sora is here. They continue. Now you can generate entirely new
videos from text, bring images to life, or extend, remix, or blend videos you already have. We've
developed new interfaces to allow easier prompting, creative controls, and community sharing. Since
previewing Sora in February, we've been building Sora Turbo, a significantly faster version of
the model to put in your hands. We're releasing it today as a standalone product to Plus and Pro
users. We hope this early version of Sora will help people explore new forms of creativity. We can't
wait to see what you create.
A couple things that I think are interesting about this.
As you just heard, the model can generate videos based on text or image inputs.
It can create clips up to 20 seconds long and up to 1080p in resolution.
But alongside the model, they also put a lot of thought into the interface.
Users can easily toggle between generated videos that are displayed on a grid or a list.
I think more interesting, though, is the storyboarding mode, which allows users to
arrange multiple clips into a continuous video.
The model will attempt to create seamless transitions between individual clips, but users
have control over the speed of cuts. Storyboard also allows frame-by-frame inputs. This is one of those
interface updates that makes a huge difference in terms of how usable the model actually is from a
real production standpoint. In terms of availability, for Plus-tier subscribers, Sora is now available
with a limit of 50 videos per month at up to 480p resolution or fewer videos at 720p. Pro-tier users
can access higher resolutions, longer durations, and up to 500 videos per month. And if you manage to burn
through all 500 videos, the model can still be accessed at a lower speed.
That is, of course, assuming that you can actually sign up.
As you can see, when you go to Sora.com, account creation is not currently available.
Altman tweeted, we significantly underestimated demand for Sora.
It's going to take a while to get everyone access, trying to figure out how to do it as fast as possible.
My strong guess is that they did not underestimate demand for Sora.
It's just they are constrained, and they decided that it was better to launch and then deal with
this to create a sense of urgency and hype than to slow-trickle people who officially had access to it.
Another side note on this: Sora is not available in the EU and UK, showing once again how these
jurisdictions' regulatory stances are denying their citizens access to the cutting edge.
When Tegakanch tweeted, bring it to Europe, please. Altman responded, we want to offer our products
in Europe and believe a strong Europe is important to the world. We also have to comply with
regulation. I would generally expect us for new products to have delayed launches in Europe,
and that there may be some we just can't offer. Now, when it comes to why it took so long
to get Sora, part of it seemed to be a safety consideration. For now, OpenAI seems to be going with
the plan of releasing a model and fine-tuning the safety parameters as they see how people use it.
They wrote, we're introducing our video generation technology now to give society time to explore
its possibilities and co-develop norms and safeguards that ensure it's used responsibly as the field
advances. The model complies with the C2PA standard, which ensures that videos are identifiable
as AI generated and have watermarks. Uploads of people are limited at launch, they said,
but they also said that they would become available as the company refines their deepfake mitigations.
They said early feedback from artists indicates that this is a powerful creative tool that they
value, but given the potential for abuse, we are not making it initially available to all users.
During the live stream, Sora product lead Rohan Sahai acknowledged that this is a pretty steep tradeoff,
but said, we obviously have a pretty big target on our back as OpenAI.
We want to prevent illegal activity on Sora, but we also want to balance that with creative
expression. We know that it will be an ongoing challenge. We might not get it perfect on day one.
We're starting a little conservative, and so if our moderation doesn't get it quite right, just give us that feedback.
And indeed, some users are running into that, with Nick St. Pierre, for example, failing to generate a bear eating a salmon.
But let's talk about what people are finding with this.
Marques Brownlee, who is a famous product reviewer and has recently been ragging on a lot of products coming out of the AI space, although admittedly it's AI hardware,
tweeted a review yesterday, saying, the rumors are true.
Sora, OpenAI's video generator, is launching for the public today.
I've been using it for about a week now.
I've learned a lot testing this.
The video has a bunch of garbled text,
the telltale sign of AI generated videos.
But the cutaways, the moving text ticker,
the news style shots, those were all things
Sora decided to do on its own,
and those news anchors looked very real.
It's still a product, though, with pros and cons,
and one of the cons is that physics is still hard.
Without a quote-unquote understanding of the objects in the video,
the model is still prone to hallucinations
in the form of movements that don't make sense,
and lack of object permanence.
He then shared a few examples of that,
where the physics of the objects in the videos just don't quite work.
And that's something we'll come back to in a moment.
Brownlee continues,
it can be really good at landscapes.
Almost any drone shot of a significant landmark could pass for stock footage
or is very close to usable for an establishing shot in a documentary or low-budget film.
Turns out it can do a passable job with cartoon style or stop-motion style,
since the irregularities in movement and physics appear more stylistic.
The other features like remixing videos or turning images into videos
can be useful tools if you know what you're doing,
but the most consistent finding for me was that the models don't
know what direction or speed makes sense for objects in that specific picture. So sometimes it gets
it right and sometimes it gets it really wrong. To the extent there was a clear critique, it was
definitely about Sora not having solved physics. Christopher Bryant showed a video of a set of birds
flying in a way that really was unnatural, to which Mr. Bizarro said the bird test, no model has
passed it yet. Anjney Midha from a16z writes, big props to the Sora team for actually shipping.
The feed's full of embarrassing displays on how the model isn't a world simulator. But the torrent of human
preference data OpenAI will unlock via the new interface is gold. If Google Cloud, Azure, AWS, Adobe,
et cetera, aren't updating their urgency priors right now, they've missed the point. That idea of a
world simulator or a world model is something that we explored in last week's Long Reads episode,
and this is a good context to go back and see why some people think that that's going to be
key for helping AI reach the next level. Victor M writes, Sora's videos look impressive, but physics
understanding and consistency is still not there yet. This prompt was quite simple, a humanoid robot
standing near the table with red, green, and blue cubes on it, performs a cube stacking task
with red in the bottom and blue on the top. It then shows the robot, kind of doing it, but not really.
And yet it's not like the physics is off always. Edwin Arbus showed a video of a prompt,
a golden retriever with a shiny wet coat skillfully balances on a surfboard as it rides a gentle wave
at Pacifica Beach. The dog's tongue hangs out in excitement and its eyes are focused on the horizon.
The backdrop includes a wide expanse of the ocean with rolling waves and clear blue sky.
And that one looks really natural. The physics are much, much closer to the real world.
The other thing that you see a lot of is people comparing their results from Sora to other models including Kling and Runway.
PGA, the CEO of Filmport.AI, said, my review of Sora after paying $200, sometimes it produces something great but you can get better results elsewhere.
Kling, Minimax, and Runway have nothing to worry about in the near future.
One specific thing that he pointed out was that Sora seemed to have more trouble with image-to-video, which is something that Marques Brownlee had pointed out as well.
In terms of the status of where everything else is, Pika Labs released their latest version,
1.5, back in October, and it's known for Pikaffects like exploding items, squishing items,
and cake-ify. It seems designed to be useful for social media clips right now, as opposed to a
fully professional tool. Runway released Gen-3 Alpha Turbo in October. The big news at the time was that
they had partnered with Lionsgate, so presumably that studio at least thinks that Runway is studio
quality and useful in a professional environment. Luma Labs updated Dream Machine back in November,
which has been notable for how accessible it is. There have been a lot of examples of it being used
for fashion, marketing, and filmmaking. And finally, there's Kling, a Chinese model whose latest version
was released in September and was for many people mind-blowing. And so that's where we stand. It's been less
than 24 hours with a very small number of people having access, so it's hard to know so far how it
really compares. What's for sure is that video generation in 2025 is going to be radically more accessible
and much more used than it was this year, and there are going to be some pretty big implications of that.
Later in the week, I'll come back and explore some business use cases for Sora.
For now, though, that is going to do it for today's AI Daily Brief.
Appreciate you listening or watching, as always.
Until next time, peace.
