The AI Daily Brief: Artificial Intelligence News and Analysis - Midjourney vs. DALL-E-3: Can Midjourney's New Website Compete with ChatGPT Integration?
Episode Date: October 25, 2023
Midjourney has made a number of moves recently after DALL-E-3 was integrated into ChatGPT. Those include a new upscaler, a website (finally!) and even a first app (sort of!). Plus Perplexity raises fresh capital at a $500m valuation. ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown: what Midjourney is doing to compete with the new DALL-E 3 threat.
Before that, on the Brief, Perplexity, the AI search engine, raises money at a $500 million valuation.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our YouTube, our newsletter, and our Discord.
Welcome back to The AI Breakdown Brief, all the AI headline news you need in around five minutes.
We kick off today with a report from The Information that the AI search and research
company Perplexity is raising a fresh round of capital at a $500 million valuation. Now, this is a
company that raised funding just seven months ago at a $150 million valuation, so this is up
3x in seven months. The company is currently generating around $3 million in annual recurring revenue,
giving this a huge valuation multiple of around 150x ARR. Now, of course, venture capitalists aren't
usually thinking that strictly about things like ARR, but it is a notably high number. As someone
on Twitter said, AI valuations are still in 2021. Now, what's interesting about Perplexity and why
I wanted to focus on it is that a big theme that we're discussing in today's show, as you'll see
from the main episode, is around user interface, user experience, and just usability.
Perplexity is interesting because it has leaned super strong into developing an actual product
around AI-powered search that is different and blissful to use. So, just as an example,
I did a quick search this morning: "Who is Nathaniel Whittemore?"
The results come back with sources, so you know where it's drawing this information from,
and then a summarized answer: "Nathaniel Whittemore is an independent strategy and communications
consultant for leading crypto companies. He's also the creator of the Breakdown Network,
which includes The Breakdown, the world's biggest daily crypto podcast, Bitcoin Builders,
and The AI Breakdown. Nathaniel has been a principal at Learn Capital and was on the founding team
of Change.org. He founded a program design center at his alma mater, Northwestern University, that
helped inspire the largest donation in the school's history." Yada, yada, yada. This is all correct,
by the way. It also shows where the sourcing actually came from. Now, one of the things that I love
is that then it suggests further questions. One of the questions it suggested was "What is the Breakdown
Network?", so I said, sure, let's do that. Once again, it gives sources: Twitter, my podcast hosting
service, the podcast on Apple itself, and a site called Listen Notes. The summarization:
the Breakdown Network is a media network created by Nathaniel Whittemore that includes The Breakdown,
Bitcoin Builders, and The AI Breakdown. Its next recommended question was "What other
podcasts are part of the Breakdown Network?" And from there, "What is the most popular episode?"
Now, it wasn't until the most popular episode that it came up a little short, although, of course,
it doesn't really have access to that information because there's no publicly available
information that it can pull from. Its response when it couldn't figure things out was,
unfortunately, I could not find any specific information on the most popular episode of the
Breakdown podcast, but it did provide the number of episodes that have been published as of October
2nd. Anyways, this is barely scratching the surface of what this app can do. When you upgrade to
Pro, you can use different models, including Claude 2 or GPT-4. You have more access to what they call
their Copilot boosts, which produce even better information. And I know many people who, when it
comes to research, default to Perplexity versus something like ChatGPT, even though the underlying
models are from other companies. In that, this also kind of breaks the trend that we've been
seeing of investors moving away from companies that are wrappers on top of models and shows just how
valuable the right user experience can be. Next up today, a new $10 million AI Safety
Fund and a new executive director for the Frontier Model Forum. Now, first of all, a reminder of what
the Frontier Model Forum is: this was announced a couple of months ago and is an industry body,
a collaboration between OpenAI, Anthropic, Google, and Microsoft, that is designed to promote
safe and responsible development of advanced AI systems and facilitate information sharing between
both these companies as well as with policymakers. The other goal is to identify best practices
for responsible development, and effectively the way that most people interpreted this is that the
competitive pressure that these companies are under to beat one another and have the most
advanced models is so intense that the Frontier Model Forum is some small effort to create the
counterflow and an actual collaboration between the companies when it comes to these very important
issues of safety and responsible development. Well, to help with that, they announced the first
executive director of the forum, Chris Meserole. Meserole was most recently the director of artificial
intelligence and emerging technology at the Brookings Institution. Now, alongside this, they also announced
a $10 million AI Safety Fund, which is a collaboration with a number of philanthropic partners as well
to, quote, advance research into the ongoing development of the tools for society to effectively
test and evaluate the most capable AI models. They say that the fund will support independent
researchers from around the world affiliated with academic institutions, research institutions, and
startups. Further, they write, the primary focus of the fund will be supporting the development
of new model evaluations and techniques for red-teaming AI models to help develop and test
evaluation techniques for potentially dangerous capabilities of frontier systems.
So a couple things to say about this. First of all, I think unmitigatedly, it's a good thing to
have more research dollars on these areas. I think that the disparity between the amount of funding
on how to advance models as compared to the amount of money spent on how to make these models
safe is hugely challenging. But that sort of brings me to my second point, which is that this
$10 million seems so tiny, especially because it's not presented as
$10 million to sort of test and learn with the possibility of more. It's just so, so tiny. I mean,
it's two or three days of what Apple is spending on training their model right now, for example.
It feels like there does just need to be more capital in this area. But again, I think it's good
that there are these efforts starting. I'd just like to see even more of them. Now, speaking of
capital and OpenAI, an interesting story in The Information today. On the one hand, OpenAI has been
the definitive leader when it comes to advanced models. But at the same time, that has
come at a cost, a literal cost. Using OpenAI tools and the GPT-4 API is extremely, extremely expensive.
And according to this article, a lot of the customers who were formerly some of OpenAI's biggest
are now looking for less expensive options. That includes companies like Salesforce and Wix.
Now, interestingly, in addition to just buying AI services from other cheaper competitors,
they're also apparently buying OpenAI software through Microsoft because they get discounts
with bundling their purchase with other products. Apparently, based on the terms
of Microsoft and OpenAI's deal, the lion's share of that revenue goes to Microsoft, so it still is,
once again, a problem for OpenAI. Now, this Microsoft-OpenAI deal, I think, will be studied for a long
time, but The Information gave a few details. Holding aside Microsoft taking a bigger piece of the
revenue when customers are buying through Azure, just in general, Microsoft gets 75% of OpenAI's
theoretical profits until their $10 billion investment is paid back, and 49% of profits after that
until it gets to a predetermined cap. With Microsoft's earnings report yesterday, we learned that
18,000 customers are now buying OpenAI software through Azure, which is up from 11,000 in August.
Good for OpenAI, but also a challenge, given that they see so much less of that revenue
because it's through Microsoft. It's also not just big companies. Smaller developers are also making
the switch. The Information piece uses the story of Pete Hunt, who's the founder of tools like
Dagster and Summarize.tech. Previously, he had been using OpenAI's GPT-3.5 model, but recently
switched over to Mistral 7B, which is an open-source model. His API costs went from more than
$2,000 a month to less than $1,000 a month, and apparently users haven't complained about a change in quality.
Now, all of this gets back to what we might see at OpenAI's Dev Day on November 6th.
One of the hints is that there might be some cost-related announcements, and it certainly seems
like that could be important, given the shifting sands of the AI space.
Now, speaking of Microsoft's earnings report, both Microsoft and Google parent Alphabet reported
earnings yesterday, and as MarketWatch put it, "Microsoft and Alphabet results show Wall Street
only cares about AI." Wall Street seems to have a sense that AI is helping Microsoft's cloud
business more than it is helping Google's. The TLDR on this is that Google's cloud business
grew by the smallest amount last quarter since 2019, whereas Microsoft's cloud business grew
even more than analyst estimates. Azure had 28% growth, which was above both the company's own
forecast and the 25.6% growth that analysts were modeling. On the news, Microsoft rose 4% in after
hours trading, while Alphabet shares were down 6%. Moving over to the policy world for just a
minute. We talked a lot recently about the increasing U.S. prohibitions on exporting chips to China and
China-aligned places, and Nvidia is now saying that the U.S. is speeding up those export curbs
in ways that are potentially challenging. Whereas the restrictions had been supposed to come into play
30 days from October 17th when the measures were initially unveiled, instead they went into
effect on Monday. In other words, six days after they were unveiled instead of 30 days. The question now,
of course, and the one that Wall Street will be watching is whether this actually impacts Nvidia and
AMD's bottom lines. There is a lot more to talk about in the policy world this week.
Chuck Schumer held his second closed-door summit to help the Senate learn about AI, featuring some
very different opinions from people like techno-optimist Marc Andreessen on the one hand, and Future
of Life Institute President Max Tegmark on the other. At some point this week, we will do a full
breakdown of any information we actually got from those sessions and what it suggests about the
state of the U.S. policy conversation. For now, though, that is where we will wrap the brief.
Up next, the main AI Breakdown.
Welcome back to The AI Breakdown.
Today, we are exploring the latest in competition in the image generation space.
This is obviously, after LLMs like ChatGPT,
perhaps the most used aspect of generative AI right now,
and the space has sort of been changed in the last month,
given the launch of OpenAI's DALL-E 3 model and its integration into ChatGPT.
Now, part of the context for talking about this right now
is that as of this week, all U.S. ChatGPT Plus
users have access to DALL-E 3 inside ChatGPT. And so what we're going to do today is look at
the state of the conversation around these tools, and more specifically, a set of announcements
from Midjourney over the last week or so that show how the perhaps unexpected leader in the
space is trying to either retain that lead or catch back up depending on your perspective.
Normal caveats here apply when I'm talking about something that's inherently visual like
an image generator. If you want to see what I'm discussing, please come over and check out the
YouTube channel, although of course I will try to make it so that the podcast version of this
is still just as useful. Part of what prompted me to want to do this show today was a tweet from
Nick St. Pierre, @nickfloats on Twitter, who's of course one of the best people to follow if you
want tutorials and advanced-level looks at how to use these types of tools. And he wrote,
"Two things I don't understand. One, Runway being valued at $1.5 billion after their $141 million
Series C. Two, Andreessen Horowitz discussing putting $100 million into Ideogram at a $500 million
valuation, only a couple of months after a $16.5 million seed round. Meanwhile, Midjourney is out
here with a way better product, millions of users, millions in revenue, and zero investors.
If Midjourney decided to take money from investors, would Ideogram even get money in the first place?
I honestly doubt it. But they're billing themselves as a Midjourney rival. Okay, bro.
The fact that these companies need to go out and raise hundreds of millions of dollars just to have a
fighting chance against Midjourney, who has raised zero dollars from investors, shows you just how good
Midjourney is and how mad the investors are they can't get a piece of it." So first, let's leave aside
the Runway comparison for a moment. Obviously, Runway is playing in the text-to-video space,
which is its whole own beast and something that seems like it's going to be extremely valuable
and unlock mass amounts of creativity once it comes to maturity. Of course, it's not nearly at the
maturity yet that text-to-image generators are. I think the point about putting money into Ideogram
because VCs haven't been able to get access to Midjourney is probably true. The job
of venture capitalists is to make strategic bets on companies that are in or are leading big
technological or societal change, and if one of the leaders in a key space doesn't want to take
investment, it's probably prudent to invest in a competitor. But what's also interesting is this
assertion that Midjourney is so far ahead. And the reality is, as cool as it is that Midjourney has now
made hundreds of millions of dollars and done it with no investment and something like 40
employees, that doesn't change the fact that they are now staring down their biggest challenge ever
in the form of DALL-E 3 and its integration into ChatGPT.
Now, we'll talk a little bit more about DALL-E 3 and its integration in a few minutes,
but let's talk first now about what Midjourney is doing to compete.
One feature update is that Midjourney has introduced a new 2x and 4x upscale feature.
Now, for a lot of folks, they're just using these images for fun, they're just putting them
online, they don't really need that super high resolution.
But others actually do, either because they're going to be printing images, or just because
the context requires it. This is one of those updates that's not nearly as sexy as some advanced
new model, but is actually super, super functional and expands meaningfully the range of ways in
which images created by Midjourney can be used. The feature has had pretty rave reviews
from the people who have used it so far, and so although I think in the long run this is sort of
one of those table-stakes-type features, the fact that it's here is not insignificant. Now, what's
an even more noticeable update is that Midjourney has launched a new version of its website. One of
the things that is absolutely crazy about how successful Midjourney has been is that all of the creation
has to happen inside of Discord. You actually have to create a login for another service,
which is in and of itself an overwhelming and confusing experience, and then either generate
images inside the Midjourney server or set it up so that you have a bot in your own server.
If your eyes and ears glazed over with me just describing that, you can imagine how it sounds
to a mainstream normie crowd. Well, the new Midjourney website is not yet an image creation
tool. To create images, you still have to do that from Discord. But what it does show is how the
company is starting to think about the community and social aspect of image generation. If you go and
check out your profile, it is effectively a portfolio of all the images you've created. So for those of
you who are watching the video, what you're seeing here is my profile, which has all of my generations.
You can toggle between grids, upscales, or all. And you can see that in my case, a lot of the
usage has to do with thumbnails for different videos. If you toggle over an image, you can see the
prompt, AI Safety versus AI Progress, Tim Cook with a robot, Chinese flag on an old computer
screen. These are all images that I've used recently in thumbnails either for YouTube videos or for my
newsletters. Over here is neural network stylized, an image that I used for a background of a YouTube
video. Then there's the image for a party invitation for my five-year-old's Mermaid Halloween
Under the Sea Birthday party, which is coming up this weekend, some custom Magic: The Gathering cards that I
made for a custom design set that I've been working on for the past few years, and so on and so
forth. What's cool about this is that ultimately image generation isn't just functional. It's also
fun. It's a creative expression as much as it is a utility. And even someone like me who uses it
all of the time for actual just hyperfunctional uses also ends up getting lost spending time
creating images around certain themes that I find interesting or exciting. What this new Midjourney
site does is it turns it into a live look at my
personality, my interests, my creations. It's not hard to see how this could be the basis for a new
type of social network. Now, other things that you can do are go check out a community feed,
which is organized by hot, rising, new, or top, or you can also rank pairs. Midjourney seems to
randomly select two images and you select which one you like more. Sometimes, but not always, they're
related to one another. So in many ways, what it feels like to me we have with this new Midjourney
site is really the first indications of what an eventual mobile app will look like. However, this
is actually not the only mobile app experience teaser that we've recently gotten from the company.
As VentureBeat wrote about a week ago, "Midjourney's first mobile app is here, sort of."
So this app is called Niji Journey. According to Midjourney founder David Holz, the app was
announced during Midjourney office hours, which happen regularly within Discord, and is a partnership
between Midjourney and a Japanese game company, Sizigi Studios, and basically what it is is a custom
purpose-built version of Midjourney that is specifically built around creating
anime-style art, also known as Niji.
Now, there are a few interesting things about this.
First, you don't have to use Discord for this.
It comes with a free trial of 20 images,
and you can actually subscribe in the normal ways that you would
via any other iOS or Google Play Store app.
The creative experience has all the same features as Midjourney,
but organized with a very different interface,
given that, because they know the type of outputs
that most people are looking for in this anime style,
it's easier for them to custom design filters and toggles within the prompt that get users closer
to the images that they actually want. They've also really optimized the community feed to help
people get inspiration for their own images, giving people the ability to drag and add tokens from
those feed prompts. Now, I think the way that people are thinking about this is again as a preview
of what an ultimate Midjourney app might look like. One of the things that I think this shows off
is how powerful it can be if you know roughly what type of image a user is trying to go for. Now,
whether they'll be able to bring that sort of user interface to a more generalist Midjourney app remains to be
seen, but it's clear that they're testing some things out. So where does this leave Midjourney
vis-à-vis DALL-E 3? There have been approximately 10,000 articles and tweet threads comparing images
from these different tools, but I sort of think that the image outputs aren't really the vector of
competition. Instead, what it's all about, in my estimation, is the difference between prompting,
which is what you do in Midjourney, and a natural language conversation, which is what happens inside
ChatGPT when you're using DALL-E 3. Let's walk through a quick example again from that birthday party that I was
talking about. If we go back and look at my profile on Mid Journey, I spent a long time, as you can see,
from these endless grids trying to get something that actually worked as an illustration. I wanted it to
look vaguely like my daughter, so big blue eyes and curly hair. I wanted it to not be an over-sexualized
mermaid, which was harder than it seems. I wanted the jack-o'-lanterns to actually look like
jack-o'-lanterns, even though they were under the sea, and I wanted it to be clear that you were
under the sea. What this required of me with Midjourney was basically endless prompting.
Little tweaks and experiments. "Halloween underwater mermaid pumpkins, jack-o'-lantern kids
illustration Disney Disney Disney cartoon." "Halloween Halloween Halloween Halloween, happy Disney animation."
"Halloween underwater mermaid pumpkins, jack-o'-lantern kids illustration display," etc., etc., etc.
Just endless combinations and permutations,
rolling it until I got something that worked.
Now, I did get something that worked.
I got an amazing image that I ended up using for the card
that I think has probably inspired at least a few of the adults in her class
to go out and maybe try these tools themselves.
So to be clear, it's not that I had any problem with the Midjourney results.
In fact, they came back great.
But just to demonstrate how different the experience is with DALL-E 3,
I did a version of the same experience.
Instead of starting with this vague prompt,
"mermaid, underwater, Halloween, jack-o'-lantern,"
I wrote,
"cartoon image of a mermaid celebrating Halloween under the sea surrounded by jack-o'-lanterns."
Now, I don't like the quality of any of these quite as much as I did the Midjourney
style, but they are, especially this one, closer to the actual prompt of Disney.
Now, what's interesting is you can actually see the prompt that DALL-E 3 translates your
natural language into. For example, here we have "Disney-inspired cartoon of a mermaid in a
Halloween-themed costume, perhaps as a witch or a vampire, swimming gracefully under the sea,
surrounding her are jack-o'-lanterns carved from," blah, blah, blah, blah.
I didn't write that, right? You heard what I wrote before. It was much simpler.
DALL-E 3 slash ChatGPT did the work of figuring out how to actually translate what I was looking for into a prompt.
Now, from there, I was able to refine again with natural language. My next request was, "Can we make the mermaid be a five-year-old girl with curly brown hair and blue eyes?"
This is something that you just can't do with Midjourney. You can't ask it to change slightly. You have to come up with a new prompt.
You can see how it translated that again into the prompt that it had written, "Disney-inspired cartoon of a five-year-old mermaid girl with
curly brown hair and blue eyes in a Halloween-themed costume, perhaps as a witch or a vampire, swimming gracefully."
Now, this image I loved, and if I had been actually doing this for the card, I probably would have
used it. But I wanted to keep working to see if we could get a little bit more out of it.
I said, "Great, let's work with number two. Can you change the aspect ratio to 9:16 for a card,
and can you write 'Alden is 5' in a fun font at the bottom?" This, it had some trouble with.
It did change the aspect ratio, but it also changed the orientation so that the image was totally
sideways. I tried to fix it and it did not get it and I didn't want to spend the time figuring out how
that would work. So we went back to a square version and decided to focus again on just getting the right
text in there. The first one, instead of saying "Alden is five," said "five is." The next one said
"Alden is five" just like I wanted, but the mermaid in question didn't have the cool witch hat that I
had liked from previous images. So I said, "Great, but can we please put the witch hat back on the mermaid?",
which led finally to the ultimate image, or at least the image where I stopped. Given that she's five,
I probably would have changed the Disney-style conch-shell bikini, but you get the idea.
The point here is that there is something so fundamentally different about being able to use
natural language to refine what you were trying to get and help the system understand,
that it really is a tremendously different experience.
Now, I still use Midjourney a ton, but this capacity to get more precisely to what I'm looking
for has pushed DALL-E 3 significantly into my workflows in ways I hadn't imagined it would.
Now, there is something else to note here, which is the fact that
I could ask it to say "Alden is 5" and it was able to do so.
Midjourney, of course, does not have any text capacity and it's one of its biggest weaknesses.
Many people anticipate that when it comes to a Midjourney version 6, that is going to have
to be one of the key features to keep parity with these other tools.
Now, of course, this race is not just Midjourney and DALL-E 3, even though those might be the
ones that get the lion's share of the attention.
Adobe released its new Firefly 2 model a couple weeks ago, and it's extremely good at certain
types of things, including especially photorealism, and has the benefit for enterprises of being
trained on Adobe's library of images, i.e., it doesn't come with the same rights concerns that some of these
other tools might. I think we'll wrap there, but if you take away anything from this episode,
it's how fast these image generators are advancing, and how much competition is pushing them to get
better in ways that outdo one another. Now, of course, there is a whole additional conversation
about at what point we will truly not be able to tell the difference between AI generated and non-AI
generated, but that is a subject for a different show. For now, a big thanks once again to all of you
listeners or watchers out there. And until next time, peace.
