The AI Daily Brief: Artificial Intelligence News and Analysis - Midjourney vs. DALL-E-3: Can Midjourney's New Website Compete with ChatGPT Integration?

Starting point is 00:00:00 Today on the AI Breakdown, what Midjourney is doing to compete with the new DALL-E 3 threat. Before that, on the Brief, Perplexity, the AI search engine, raises money at a $500 million valuation. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube, our newsletter, and our Discord. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. We kick off today with a report from The Information that the AI search and research company Perplexity is raising a fresh round of capital at a $500 million valuation. Now, this is a company that raised funding just seven months ago at a $150 million valuation, so this is up

Starting point is 00:00:45 3x in seven months. The company is currently generating around $3 million in annual recurring revenue, giving this a huge valuation multiple of around 150x ARR. Now, of course, venture capitalists aren't usually thinking that strictly about things like ARR, but it is a notably high number. As someone on Twitter said, AI valuations are still in 2021. Now, what's interesting about Perplexity, and why I wanted to focus on it, is that a big theme we're discussing in today's show, as you'll see from the main episode, is user interface, user experience, and just usability. Perplexity is interesting because it has leaned super strongly into developing an actual product around AI-powered search that is different and blissful to use. So, just as an example,

Starting point is 00:01:28 I did a quick search this morning: who is Nathaniel Whittemore? The results come back with sources, so you know where it's drawing this information from, and then a summarized answer. Nathaniel Whittemore is an independent strategy and communications consultant for leading crypto companies. He's also the creator of the Breakdown Network, which includes The Breakdown, the world's biggest daily crypto podcast, Bitcoin Builders, and the AI Breakdown. Nathaniel has been a principal at Learn Capital and was on the founding team of Change.org. He founded a program design center at his alma mater, Northwestern University, that helped inspire the largest donation in the school's history. Yada, yada, yada. This is all correct,

Starting point is 00:01:58 by the way. It also shows where the sourcing actually came from. Now, one of the things that I love is that it then suggests further questions. One of the questions it suggested was, what is the Breakdown Network? So I said, sure, let's do that. Once again, it gives sources: Twitter, my podcast hosting service, the podcast on Apple itself, and a site called Listen Notes. The summarization: the Breakdown Network is a media network created by Nathaniel Whittemore that includes The Breakdown, Bitcoin Builders, and the AI Breakdown. Its next recommended question was, what other podcasts are part of the Breakdown Network? And from there, what is the most popular episode? Now, it wasn't until the most popular episode that it came up a little short, although, of course,

Starting point is 00:02:34 it doesn't really have access to that information because there's no publicly available information that it can pull from. Its response when it couldn't figure things out was, unfortunately, I could not find any specific information on the most popular episode of The Breakdown podcast, but it did provide the number of episodes that have been published as of October 2nd. Anyways, this is barely scratching the surface of what this app can do. When you upgrade to Pro, you can use different models, including Claude 2 or GPT-4. You have more access to what they call their Copilot boosts, which produce even better information. And I know many people who, when it comes to research, default to Perplexity versus something like ChatGPT, even though the underlying

Starting point is 00:03:10 models are from other companies. In that, this also kind of breaks the trend that we've been seeing of investors moving away from companies that are wrappers on top of models, and shows just how valuable the right user experience can be. Next up today, a new $10 million AI Safety Fund and a new executive director for the Frontier Model Forum. Now, first of all, a reminder of what the Frontier Model Forum is. This was announced a couple of months ago and is an industry body, a collaboration between OpenAI, Anthropic, Google, and Microsoft, that is designed to promote safe and responsible development of advanced AI systems and facilitate information sharing both between these companies and with policymakers. The other goal is to identify best practices

Starting point is 00:03:50 for responsible development, and effectively the way that most people interpreted this is that the competitive pressure that these companies are under to beat one another and have the most advanced models is so intense that the Frontier Model Forum is some small effort to create a counterflow and an actual collaboration between the companies when it comes to these very important issues of safety and responsible development. Well, to help with that, they announced the first executive director of the forum, Chris Meserole. Meserole was most recently the director of artificial intelligence and emerging technology at the Brookings Institution. Now, alongside this, they also announced a $10 million AI Safety Fund, which is a collaboration with a number of philanthropic partners as well,

Starting point is 00:04:28 to, quote, advance research into the ongoing development of the tools for society to effectively test and evaluate the most capable AI models. They say that the fund will support independent researchers from around the world affiliated with academic institutions, research institutions, and startups. Further, they write, the primary focus of the fund will be supporting the development of new model evaluations and techniques for red-teaming AI models to help develop and test evaluation techniques for potentially dangerous capabilities of frontier systems. So, a couple of things to say about this. First of all, I think it's unambiguously a good thing to have more research dollars in these areas. I think that the disparity between the amount of funding

Starting point is 00:05:05 on how to advance models as compared to the amount of money spent on how to make these models safe is hugely challenging. But that sort of brings me to my second point, which is that this $10 million seems so tiny, especially because it's not presented as $10 million to sort of test and learn with the possibility of more. It's just so, so tiny. I mean, it's two or three days of what Apple is spending on training their model right now, for example. It feels like there does just need to be more capital in this area. But again, I think it's good that there are these efforts starting. I'd just like to see even more of them. Now, speaking of capital and OpenAI, an interesting story in The Information today. On the one hand, OpenAI has been

Starting point is 00:05:44 the definitive leader when it comes to advanced models. But at the same time, that has come at a cost, a literal cost. Using OpenAI tools and the GPT-4 API is extremely, extremely expensive. And according to this article, a lot of the customers who were formerly some of OpenAI's biggest are now looking for less expensive options. That includes companies like Salesforce and Wix. Now, interestingly, in addition to just buying AI services from other, cheaper competitors, they're also apparently buying OpenAI software through Microsoft because they get discounts by bundling their purchase with other products. Apparently, based on the terms of Microsoft and OpenAI's deal, the lion's share of that revenue goes to Microsoft, so it still is,

Starting point is 00:06:24 once again, a problem for OpenAI. Now, this Microsoft-OpenAI deal, I think, will be studied for a long time, but The Information gave a few details. Holding aside Microsoft taking a bigger piece of the revenue when customers are buying through Azure, just in general, Microsoft gets 75% of OpenAI's theoretical profits until their $10 billion investment is paid back, and 49% of profits after that until it gets to a predetermined cap. With Microsoft's earnings report yesterday, we learned that 18,000 customers are now buying OpenAI software through Azure, which is up from 11,000 in August. Good for OpenAI, but also a challenge, given that they see so much less of that revenue because it's through Microsoft. It's also not just big companies. Smaller developers are also making

Starting point is 00:07:04 the switch. The Information piece uses the story of Pete Hunt, who's the founder of tools like Dagster and Summarize.tech. Previously, he had been using OpenAI's GPT-3.5 model, but recently switched over to Mistral 7B, which is an open source model. His API costs went from more than $2,000 a month to less than $1,000 a month, and apparently users haven't complained about a change in quality. Now, all of this gets back to what we might see at OpenAI's Dev Day on November 6th. One of the hints is that there might be some cost-related announcements, and it certainly seems like that could be important, given the shifting sands of the AI space. Now, speaking of Microsoft's earnings report, both Microsoft and Google parent Alphabet reported

Starting point is 00:07:41 earnings yesterday, and as MarketWatch put it, Microsoft and Alphabet results show Wall Street only cares about AI. Wall Street seems to have a sense that AI is helping Microsoft's cloud business more than it is helping Google's. The TL;DR on this is that Google's cloud business grew by the smallest amount last quarter since 2019, whereas Microsoft's cloud business grew even more than analysts estimated. Azure had 28% growth, which was above both the company's own forecast and the 25.6% growth that analysts were modeling. On the news, Microsoft rose 4% in after-hours trading, while Alphabet shares were down 6%. Moving over to the policy world for just a minute. We've talked a lot recently about the increasing U.S. prohibitions on exporting chips to China and

Starting point is 00:08:21 China-aligned places, and Nvidia is now saying that the U.S. is speeding up those export curbs in ways that are potentially challenging. Whereas the restrictions had been supposed to come into play 30 days from October 17th, when the measures were initially unveiled, instead they went into effect on Monday. In other words, six days after they were unveiled instead of 30 days. The question now, of course, and the one that Wall Street will be watching, is whether this actually impacts Nvidia's and AMD's bottom lines. There is a lot more to talk about in the policy world this week. Chuck Schumer held his second closed-door summit to help the Senate learn about AI, featuring some very different opinions from people like techno-optimist Marc Andreessen on the one hand, and Future

Starting point is 00:08:59 of Life Institute President Max Tegmark on the other. At some point this week, we will do a full breakdown of any information we actually got from those sessions and what it suggests about the state of the U.S. policy conversation. For now, though, that is where we will wrap the Brief. Up next, the main AI Breakdown. Welcome back to the AI Breakdown. Today, we are exploring the latest in competition in the image generation space. This is obviously, after LLMs like ChatGPT, perhaps the most used aspect of generative AI right now,

Starting point is 00:09:30 and the space has sort of been changed in the last month, given the launch of OpenAI's DALL-E 3 model and its integration into ChatGPT. Now, part of the context for talking about this right now is that as of this week, all ChatGPT Plus U.S. users have access to DALL-E 3 inside ChatGPT. And so what we're going to do today is look at the state of the conversation around these tools, and more specifically, a set of announcements from Midjourney over the last week or so that show how the perhaps unexpected leader in the space is trying to either retain that lead or catch back up, depending on your perspective.

Starting point is 00:10:06 Normal caveats here apply when I'm talking about something that's inherently visual like an image generator. If you want to see what I'm discussing, please come over and check out the YouTube channel, although of course I will try to make it so that the podcast version of this is still just as useful. Part of what prompted me to want to do this show today was a tweet from Nick St. Pierre, @nickfloats on Twitter, who's of course one of the best people to follow if you want tutorials and advanced-level looks at how to use these types of tools. And he wrote, two things I don't understand. One, Runway being valued at $1.5 billion after their $141 million Series C. Two, Andreessen Horowitz discussing putting $100 million into Ideogram at a $500 million

Starting point is 00:10:44 valuation, only a couple of months after a $16.5 million seed round. Meanwhile, Midjourney is out here with a way better product, millions of users, millions in revenue, and zero investors. If Midjourney decided to take money from investors, would Ideogram even get money in the first place? I honestly doubt it. But they're billing themselves as a Midjourney rival. Okay, bro. The fact that these companies need to go out and raise hundreds of millions of dollars just to have a fighting chance against Midjourney, who has raised zero dollars from investors, shows you just how good Midjourney is and how mad the investors are they can't get a piece of it. So first, let's leave aside the Runway comparison for a moment. Obviously, Runway is playing in the text-to-video space,

Starting point is 00:11:22 which is its whole own beast and something that seems like it's going to be extremely valuable and unlock massive amounts of creativity once it comes to maturity. Of course, it's not nearly at the maturity yet that text-to-image generators are. I think the point about putting money into Ideogram because VCs haven't been able to get access to Midjourney is probably true. The job of venture capitalists is to make strategic bets on companies that are in, or are leading, big technological or societal change, and if one of the leaders in a key space doesn't want to take investment, it's probably prudent to invest in a competitor. But what's also interesting is this assertion that Midjourney is so far ahead. And the reality is, as cool as it is that Midjourney has now

Starting point is 00:12:01 made hundreds of millions of dollars and done it with no investment and something like 40 employees, that doesn't change the fact that they are now staring down their biggest challenge ever in the form of DALL-E 3 and its integration into ChatGPT. Now, we'll talk a little bit more about DALL-E 3 and its integration in a few minutes, but let's talk first about what Midjourney is doing to compete. One feature update is that Midjourney has introduced a new 2x and 4x upscale feature. Now, a lot of folks are just using these images for fun, they're just putting them online, and they don't really need that super high resolution.

Starting point is 00:12:34 But others actually do, either because they're going to be printing images, or just because the context requires it. This is one of those updates that's not nearly as sexy as some advanced new model, but is actually super, super functional and meaningfully expands the range of ways in which images created by Midjourney can be used. The feature has had pretty rave reviews from the people who have used it so far, and so although I think in the long run this is sort of one of those table-stakes type features, the fact that it's here is not insignificant. Now, what's an even more noticeable update is that Midjourney has launched a new version of its website. One thing that is absolutely crazy about how successful Midjourney has been is that all of the creation

Starting point is 00:13:13 has to happen inside of Discord. You actually have to create a login for another service, which is in and of itself an overwhelming and confusing experience, and then either generate images inside the Midjourney server or set it up so that you have a bot in your own server. If your eyes and ears glazed over with me just describing that, you can imagine how it sounds to a mainstream, normie crowd. Well, the new Midjourney website is not yet an image creation tool. To create images, you still have to do that from Discord. But what it does show is how the company is starting to think about the community and social aspect of image generation. If you go and check out your profile, it is effectively a portfolio of all the images you've created. So for those of

Starting point is 00:13:55 you who are watching the video, what you're seeing here is my profile, which has all of my generations. You can toggle between grids, upscales, or all. And you can see that in my case, a lot of the usage has to do with thumbnails for different videos. If you toggle over an image, you can see the prompt: AI safety versus AI progress, Tim Cook with a robot, Chinese flag on an old computer screen. These are all images that I've used recently in thumbnails, either for YouTube videos or for my newsletters. Over here is neural network stylized, an image that I used for a background of a YouTube video. Then there's the image for a party invitation for my five-year-old's mermaid Halloween under-the-sea birthday party, which is coming up this weekend, some custom Magic: The Gathering cards that I

Starting point is 00:14:35 made for a custom design set that I've been working on for the past few years, and so on and so forth. What's cool about this is that ultimately image generation isn't just functional. It's also fun. It's a creative expression as much as it is a utility. And even someone like me, who uses it all of the time for actual, just hyperfunctional uses, also ends up getting lost spending time creating images around certain themes that I find interesting or exciting. What this new Midjourney site does is turn it into a live look at my personality, my interests, my creations. It's not hard to see how this could be the basis for a new type of social network. Now, other things that you can do are go check out a community feed,

Starting point is 00:15:15 which is organized by hot, rising, new, or top, or you can also rank pairs. Midjourney seems to randomly select two images and you select which one you like more. Sometimes, but not always, they're related to one another. So in many ways, what it feels like to me we have with this new Midjourney site is really the first indications of what an eventual mobile app will look like. However, this is actually not the only mobile app experience teaser that we've recently gotten from the company. As VentureBeat wrote about a week ago, Midjourney's first mobile app is here, sort of. So this app is called Niji Journey. According to Midjourney founder David Holz, the app was announced during Midjourney office hours, which happen regularly within Discord, and is a partnership

Starting point is 00:15:56 between Midjourney and a Japanese game company, Sizigi Studios, and basically what it is is a custom, purpose-built version of Midjourney that is specifically built around creating anime-style art, also known as Niji. Now, there are a few interesting things about this. First, you don't have to use Discord for this. It comes with a free trial of 20 images, and you can actually subscribe in the normal ways that you would via any other iOS or Google Play Store app.

Starting point is 00:16:21 The creative experience has all the same features as Midjourney, but organized with a very different interface. Because they know the type of outputs that most people are looking for in this anime style, it's easier for them to custom-design filters and toggles within the prompt that get users closer to the images that they actually want. They've also really optimized the community feed to help people get inspiration for their own images, giving people the ability to drag and add tokens from those feed prompts. Now, I think the way that people are thinking about this is again as a preview

Starting point is 00:16:52 of what an ultimate Midjourney app might look like. One of the things that I think this shows off is how powerful it can be if you know roughly what type of image a user is trying to go for. Now, whether they'll be able to bring that sort of user interface to a more generalist Midjourney app remains to be seen, but it's clear that they're testing some things out. So where does this leave Midjourney vis-à-vis DALL-E 3? There have been approximately 10,000 articles and tweet threads comparing images from these different tools, but I sort of think that the image outputs aren't really the vector of competition. Instead, what it's all about, in my estimation, is the difference between prompting, which is what you do in Midjourney, and a natural language conversation, which is what happens inside ChatGPT

Starting point is 00:17:34 when you're using DALL-E 3. Let's walk through a quick example, again from that birthday party that I was talking about. If we go back and look at my profile on Midjourney, I spent a long time, as you can see from these endless grids, trying to get something that actually worked as an illustration. I wanted it to look vaguely like my daughter, so big blue eyes and curly hair. I wanted it to not be an over-sexualized mermaid, which was harder than it seems. I wanted the jack-o'-lanterns to actually look like jack-o'-lanterns, even though they were under the sea, and I wanted it to be clear that you were under the sea. What this required of me with Midjourney was basically endless prompting. Little tweaks and experiments. Halloween underwater mermaid pumpkins, jack-o'-lantern kids

Starting point is 00:18:15 illustration, Disney, Disney, Disney cartoon. Halloween, Halloween, Halloween, happy Disney animation. Halloween underwater mermaid pumpkins, jack-o'-lantern kids illustration, Disney, etc., etc., etc. Just endless combinations and permutations, re-rolling it until I got something that worked. Now, I did get something that worked. I got an amazing image that I ended up using for the card, which I think has probably inspired at least a few of the adults in her class to go out and maybe try these tools themselves.

Starting point is 00:18:41 So to be clear, it's not that I had any problem with the Midjourney results. In fact, they came back great. But just to demonstrate how different the experience is with DALL-E 3, I did a version of the same experiment. Instead of starting with this vague prompt, mermaid, underwater, Halloween, jack-o'-lantern, I wrote, cartoon image of a mermaid celebrating Halloween under the sea surrounded by jack-o'-lanterns.

Starting point is 00:19:03 Now, I don't like the quality of any of these quite as much as I did the Midjourney style, but they are, especially this one, closer to the actual prompt of Disney. Now, what's interesting is you can actually see the prompt that DALL-E 3 translates your natural language into. For example, here we have, Disney-inspired cartoon of a mermaid in a Halloween-themed costume, perhaps as a witch or a vampire, swimming gracefully under the sea; surrounding her are jack-o'-lanterns carved from, blah, blah, blah, blah. I didn't write that, right? You heard what I wrote before. It was much simpler. DALL-E 3 slash ChatGPT did the work of figuring out how to actually translate what I was looking for into a prompt.

Starting point is 00:19:36 Now, from there, I was able to refine again with natural language. My next request was, can we make the mermaid be a five-year-old girl with curly brown hair and blue eyes? This is something that you just can't do with Midjourney. You can't ask it to change slightly. You have to come up with a new prompt. You can see how it translated that again into the prompt that it had written: Disney-inspired cartoon of a five-year-old mermaid girl with curly brown hair and blue eyes in a Halloween-themed costume, perhaps as a witch or a vampire, swimming gracefully. Now, this image I loved, and if I had been actually doing this for the card, I probably would have used it. But I wanted to keep working to see if we could get a little bit more out of it. I said, great, let's work with number two. Can you change the aspect ratio to 9:16 for a card, and can you write Alden is 5 in a fun font at the bottom? This, it had some trouble with.

Starting point is 00:20:18 It did change the aspect ratio, but it also changed the orientation so that the image was totally sideways. I tried to fix it and it did not get it, and I didn't want to spend the time figuring out how that would work. So we went back to a square version and decided to focus again on just getting the right text in there. The first one, instead of saying Alden is five, said five is. The next one said Alden is five, just like I wanted, but the mermaid in question didn't have the cool witch hat that I had liked from previous images. So I said, great, but can we please put the witch hat back on the mermaid, which led finally to the ultimate image, or at least the image where I stopped. Given that she's five, I probably would have changed the Disney-style conch-shell bikini, but you get the idea.
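[Editor's sketch] The contrast in the walkthrough above, between writing a whole new prompt each time and refining through conversation, can be pictured with a toy model. This is only an illustration under stated assumptions: `stateless_prompt` and `ConversationalSession` are hypothetical names invented here, nothing below calls Midjourney, DALL-E 3, or ChatGPT, and it is not a description of how either product is actually implemented.

```python
from dataclasses import dataclass, field
from typing import List


def stateless_prompt(prompt: str) -> str:
    """Hypothetical prompt-only tool: nothing carries over between calls,
    so every attempt has to restate the whole idea."""
    return prompt


@dataclass
class ConversationalSession:
    """Hypothetical chat-style tool: it keeps every earlier request,
    so a follow-up only needs to describe the change."""
    requests: List[str] = field(default_factory=list)

    def ask(self, request: str) -> str:
        # Add the new request to the history instead of replacing it.
        self.requests.append(request)
        return self.effective_prompt()

    def effective_prompt(self) -> str:
        # The working description is the whole conversation so far.
        return " | ".join(self.requests)


if __name__ == "__main__":
    session = ConversationalSession()
    session.ask("cartoon image of a mermaid celebrating Halloween under the sea")
    session.ask("make the mermaid a five-year-old girl with curly brown hair")
    print(session.ask("put the witch hat back on the mermaid"))

    # The prompt-only path sees just the latest text and nothing before it.
    print(stateless_prompt("put the witch hat back on the mermaid"))
```

In the conversational version, the third request still carries the first two with it; in the prompt-only version, a short follow-up loses everything that came before, which is why each attempt there has to be a complete new prompt.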

Starting point is 00:20:57 The point here is that there is something so fundamentally different about being able to use natural language to refine what you were trying to get and help the system understand, that it really is a tremendously different experience. Now, I still use Midjourney a ton, but this capacity to get more precisely to what I'm looking for has pushed DALL-E 3 significantly into my workflows in ways I hadn't imagined it would. Now, there is something else to note here, which is the fact that I could ask it to say Alden is 5 and it was able to do so. Midjourney, of course, does not have any text capacity, and it's one of its biggest weaknesses.

Starting point is 00:21:31 Many people anticipate that when it comes to a Midjourney version 6, that is going to have to be one of the key features to keep parity with these other tools. Now, of course, this race is not just Midjourney and DALL-E 3, even though those might be the ones that get the lion's share of the attention. Adobe released its new Firefly 2 model a couple of weeks ago, and it's extremely good at certain types of things, including especially photorealism, and has the benefit for enterprises of being trained on Adobe's library of images, i.e., it doesn't come with the same rights concerns that some of these other tools might. I think we'll wrap there, but if you take away anything from this episode,

Starting point is 00:22:04 it's how fast these image generators are advancing, and how much competition is pushing them to get better in ways that outdo one another. Now, of course, there is a whole additional conversation about at what point we will truly not be able to tell the difference between AI-generated and non-AI-generated images, but that is a subject for a different show. For now, a big thanks once again to all of you listeners or watchers out there. And until next time, peace.
