The AI Daily Brief: Artificial Intelligence News and Analysis - 2024 Generative AI Trend: Moving Beyond Chat Interfaces

Starting point is 00:00:00 Today on the AI Breakdown, we're looking at one of the key interface trends that I think is going to shape AI next year. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our newsletter, and our Discord. Welcome back to the AI Breakdown. Now, we are coming up on the end of the year, which is an inherently reflective time. That's even more the case because we're just past the one-year anniversary of generative AI, if you mark the launch of ChatGPT as the beginning of the phase we've been in, which I do. So in the context of that reflection, and also in one other context, which is the beta

Starting point is 00:00:48 AI learning community experiment I've been running, through which I've been sharing tutorials and case studies with a community of AI learners, I've been thinking a lot about the ways different tools are organizing themselves and how they're thinking about the people using them. There's a trend that really stands out to me, and it's manifesting itself in lots of different ways. But to understand it, we first have to look at something that happened when ChatGPT launched. What's on your screen right now is, of course, the main ChatGPT interface. It just says "How can I help you today?" and has a message field. There are some starters, but more or less what you do when you come to ChatGPT is send it a

Starting point is 00:01:23 message: "Can you help me write a recruitment email for my new AI education project? I'm trying to find content producers." ChatGPT then spits back results, and from there you talk to it as you would a friend, a colleague, an employee, or a contractor. In other words, in natural language. You say, here's what I did or didn't like about that. Maybe you use some tricks.

Starting point is 00:01:52 And that gets us to the idea of prompting and prompt engineering. Basically, this is a way of saying specific things in specific ways to get better results from these tools, but always in the context of natural language inputs. A slight variation on this is seen in something like Midjourney, where to get what we wanted, we needed a combination of natural language and some specific toggles or triggers native to the platform. In Midjourney's case, that's things like specifying the aspect ratio or setting how much chaos you want, which is their measure of variability. Still, the central feature of these new user

Starting point is 00:02:30 interfaces is the chatbot, the conversation, the natural language input field. And this is very different from the way we've interacted with software in the past. In the past, we haven't told computers what to do using natural language. We've told them what to do using a combination of clicks, pointers, and specific controls in dedicated programs, each of which had its own learning curve that we had to overcome to get the most out of it. You don't use natural language in Photoshop to explain what you're trying to do to an image, and you don't use natural language in Ableton to explain how you want to modify a sound. For a little while, it's felt like generative AI is turning all of that on its head and making natural language the default interface

Starting point is 00:03:10 for how we will interact with computers going forward. However, the trend I'm observing and talking to you about today is a move to recombine natural language input with user interfaces that are designed for particular use cases and that aren't exclusively natural language. The reason, by and large, that I'm seeing companies and projects do this is that as we move past the foundational programs of generative AI and into the realm of applications dedicated to specific use cases or specific types of users, it turns out that combining natural language input with purpose-built user interface experiences can unlock different opportunities and provide different types of

Starting point is 00:03:54 value. One of the tools I did a demo/tutorial of in this beta earlier this week was Krea. Krea's tagline is "designed at the speed of thought," and it's a real-time image generation platform. There is a lot that's very cool about Krea. First, the speed with which things render: literally as you're typing, the image being generated changes. Right now we have "pink frog on top of a blue mushroom, clouds, background, blue, white." I'm typing nonsense, but you get the point; that's how fast it's changing. Now, interestingly, beyond just the text input, which again is still at the center of this

Starting point is 00:04:30 experience, Krea is using a bunch of really novel ways to give you more precise control over what you actually design. You can use shapes to indicate the core elements of your image and actually move them around. For example, now I'm dragging this pink ball to the side, and it's become a moon hovering in the background. What if I expand the size of this blue square? It changes the image in real time. You can also use a paintbrush tool. Let's see if we can get some ocean or waves

Starting point is 00:04:59 at the bottom of this image. No, but we did create a big blue mushroom. Let's see if we can have some plants growing out of it. Vaguely. I bet if we get rid of this, it will work. Anyway, you get the idea. I'm not going to spend too much time on this, because some people are listening to this as a podcast and this probably isn't making any sense. What matters here is that Krea has designed an experience with totally different ways, on top of the natural language prompt, to modify how the image looks and get more control over what you're creating, and that creates a ton of dynamism in the image generation experience. Quickly, a brief word from today's sponsor.

Starting point is 00:05:35 As a listener of this show, I suspect you like to stay up to date on all things AI and tech, which is why you have to check out the chart-topping podcast Web3 with a16z crypto. Produced by venture firm Andreessen Horowitz, Web3 with a16z is the perfect companion podcast to the AI Breakdown. Web3 with a16z crypto is your definitive resource for the future of the internet, whether you're interested in the convergence of AI and crypto or simply curious about what's next. If you need a place to start, they recently released an excellent episode with Stanford cryptography professor Dan Boneh and former Google X engineer Ali Yahya in conversation with host

Starting point is 00:06:10 Sonal Chokshi about the intersection of AI and crypto. From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all. I highly recommend checking it out, especially if you'd like to learn more about how AI and crypto will impact our everyday lives. Beyond crypto and AI, the show is for creators seeking more ways to truly own their work, for business leaders trying to prepare for the future today, and for innovators exploring trending tech topics. Don't miss out. Follow Web3 with a16z crypto on Apple Podcasts, Spotify, or your favorite listening app. Another tool that is explicitly rejecting the tyranny of the exclusively text-based prompt is a company called Visual Electric.

Starting point is 00:06:53 Let's start with this seed image of a silhouette of a man in a hat standing on top of a hill. From here, we're going to remix it and change the prompt to "wide view, extensive plains, 9 by 16, ultra wide." And then go. Now, if we want, we can pick one we like and make variations. Let's put the creativity at 100 and go. Again, as this loads, you can do all sorts of interesting and different things from here. But the point is that while we are still starting with that

Starting point is 00:07:23 natural language prompt, we are now moving into a totally different type of toolset, user interface, and experience, one that's much more designed for the sort of creative process where one idea begets another, and eventually this could turn into a massive Figma-style canvas of hundreds or even thousands of interlocking images that all build off one another. Visual Electric is very intentionally framing itself as separate and distinct from the chat interface. Their announcement post reads "A breakthrough interface for generative AI" and says: "Technological breakthroughs have a history of enabling new forms of creative expression. Today we find ourselves at a similar moment at the dawn of generative AI, an innovation

Starting point is 00:08:02 with unprecedented creative opportunity. We just need the right tools to unlock its potential. Visual Electric is the first image generator designed for creatives, a canvas that facilitates the flow of ideas so you can truly spread out and see where the tool takes you. It's designed to help give form to the vision in your mind's eye or lead you to something better. We believe that in order to be truly useful, AI needs to augment our existing creative process with all its winding paths, switchbacks, and U-turns. This requires tools that embrace ambiguity over certainty.

Starting point is 00:08:29 Many existing AI tools present results as answers to a user's question, leaving little room to build or tweak or meander." So once again, we have a new interface designed with a different type of user in mind, in this case creatives, that breaks away from strict reliance on the chat-based interface. Another tool we've done a tutorial of during this AI education beta is Framer. People have been very excited about Framer as an AI-based website builder, but in a lot of ways, that's only one little piece of it. If you go to a new project on Framer,

Starting point is 00:08:59 which you're seeing on the screen now, it's actually just a different organization of a lot of templates. This is not dissimilar from Wix or Squarespace or anything else. You can go through and choose pages like a landing page, a portfolio page, a teaser page, or a blog. You can go to sections, which again give you the chance to copy, paste, and drag things in. Indeed, it's not until you go up to the little button at the top with a lightning symbol that you can click through to AI and generate a page. From there, you have that classic text-based input. Let's do "a landing page for an annual party, called the Fetching, at which a group of friends cuts down their Christmas tree. This year is the seventh annual." Start. And then, yes,

Starting point is 00:09:49 here it is, actively generating the page with AI, and this is incredibly valuable. It's a really cool approach, especially when it comes to getting the juices flowing. But Framer is clearly making a bet that this is just one tiny piece of a full, sophisticated, modern, generative-AI-era website builder suite, and that ultimately they have to be thinking about the totality of the website-building use case. They can't just be handing people cool AI pages that they may or may not use. Here's another example of this trend. We've sometimes pejoratively categorized companies as either building their own models or just being wrappers on top of them. But in some cases, a different interface that can pull from multiple models and that

Starting point is 00:10:29 is custom-designed for a specific purpose just beats the base models on their own for that purpose. An example that I think a lot of people would agree with is Perplexity. One way of looking at Perplexity is as a research wrapper on top of LLMs like GPT-4 and Claude 2. But Perplexity has a bunch of different features. One, it gives sources, so when you're seeing the answers to your questions, you can go to the original sources. Two, it suggests follow-up questions that lead you down interesting research pathways. Three, you can not only search everything but also focus on specific sources of information, such as Reddit, YouTube, or academic papers.

Starting point is 00:11:05 All of this leads to a tool that is in many ways much more useful for that custom research purpose than ChatGPT or Claude on their own. And so again, this isn't some crazy, sophisticated, unique point of view. It's simply to say that as we move into the integration phase of AI, the actual use case phase of AI, the workflow phase of AI, where people are not just dabbling and experimenting but actively figuring out how to use these things day in and day out to improve outcomes, whether in their personal or professional lives, application interfaces are adapting to match those specific use cases, integrations, and workflows.

Starting point is 00:11:40 That doesn't mean the shift to natural language interfaces with computers, or to the chatbot interface, isn't significant. Not at all. It's just that rather than overwhelming absolutely every other way of interacting with computers, it's going to be an essential part of user interface design, an essential tool and option. I think the combination of natural language interfaces with interfaces designed for specific purposes is going to be where the rubber really hits the road. Anyway, hopefully that's an interesting discussion. If you want to try any of these tools, I will of course link to them. And if these tools have you perked up about the beta, go to bit.ly slash AI beta. I am opening up registration for January

Starting point is 00:12:17 next week. For now, I appreciate you listening as always. And until next time, peace.
