The AI Daily Brief: Artificial Intelligence News and Analysis - 2024 Generative AI Trend: Moving Beyond Chat Interfaces
Episode Date: December 17, 2023
In today's episode, NLW explores how companies are increasingly combining natural language chat interfaces with other types of interfaces that are custom designed for specific use cases.
Today's Sponsors: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on the AI Breakdown, we're looking at one of the key interface trends that I think is going to shape AI next year.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to breakdown.network for more information about our YouTube, our newsletter, and our Discord.
Welcome back to the AI Breakdown.
Now, we are coming up on the end of the year, and that is an inherently reflective time.
This is made even more the case by the fact that we're just over the one-year anniversary of generative AI
if you cite the beginning of ChatGPT as the beginning of this phase that we've been in, which
I do. And so in the context of that exploration, and also with one other context, which is the beta
AI learning community experiment that I've been running, through which I've been sharing tutorials
and case studies with the community of AI learners, I've been thinking a lot about the ways that
different tools are organizing themselves, how they're thinking about people using them. And there's a
trend that really stands out to me that's manifesting itself in lots of different ways.
But to understand it, first we have to look at something that happened when ChatGPT launched.
What's on your screen right now is, of course, the main ChatGPT interface.
It just says, "How can I help you today?" and has a message field.
There are some starters, but more or less what you do when you come to ChatGPT is you send it a
message.
Can you help me write a recruitment email for my new AI education project?
I'm trying to find content producers.
ChatGPT then spits back results, and from there you talk to it as you would a friend or a colleague
or an employee or a contractor.
In other words, in natural language.
You say, here's what I did or didn't like about that.
Maybe you use some tricks.
And that gets us to the idea of prompting and prompt engineering.
Basically, this is a way to say specific things in specific ways that get better results
from these tools, but still always in the context of
using natural language inputs. A slight variation on this is seen in something like Midjourney,
where to get what we wanted, we needed to use a combination of natural language and also
some specific toggles or triggers that were native to the platform. So in Midjourney's case,
that's things like tagging aspect ratio or talking about how much chaos you wanted to have,
which is their measure of variability. Still, the idea, the central feature of these new user
interfaces is the chatbot, is the conversation, is the natural language input field. And this is
very different than the way that we've interacted with software in the past. In the past, we haven't
told computers what to do using natural language. We've told computers what to do using a combination
of clicks and pointers and specific controls from dedicated programs, each of which had its own learning
curve that we had to overcome to get the most out of those tools. You don't use natural language
in Photoshop to explain what you're trying to do to an image, and you don't use natural
language in Ableton to explain how you want to modify a sound. For a little while, it's felt like
generative AI is turning that all on its head, and making natural language the default interface
for how we will interact with computers going forward. However, the trend that I'm observing and talking
to you about today is a trend to recombine natural language input with user interfaces
that are designed specifically for particular use cases and which aren't
exclusively natural language. The reason, by and large, that I'm seeing companies and projects do this
is that as we get more granular, not just about the very foundational programs of generative AI,
but into the realm of applications that are dedicated to specific use cases or specific types of
users, it turns out that the combination of natural language input, plus some purpose-built
user interface experiences, can really unlock different opportunities and provide different types of
value. So one of the tools that I did a demo slash tutorial of in this beta earlier this week is
called Krea. Krea's tagline is "designed at the speed of thought," and it's a real-time image
generation platform. Now, there is a lot that's very cool about Krea. One, when they talk about the
speed with which things render, literally as you're typing, the image that is generating is changing.
So right now we have "pink frog on top of a blue mushroom, clouds, background, blue, white."
Whatever, I'm typing nonsense, but you get the point.
That's how fast it's changing.
Now, interestingly, beyond just the text input, which again is still at the center of this
experience, what Krea is doing is using a bunch of different really novel ways for you to get
more precision control over what you actually design.
You can use shapes to indicate the core elements of your image and actually move them around.
So for example, now I'm dragging this pink ball to the side and it's become a moon hovering
in the background.
What if I expand the size of this blue square?
It changes the image in real time.
You can also use a paintbrush tool. Let's see if we can't get there to be some ocean or waves
at the bottom of this image. No, but we did create a big blue mushroom. Let's see if we can have
some plants growing out of it. Vaguely. I bet if we get rid of this, it will work. Anyways,
you get the idea. I'm not going to spend too much time on this because obviously some people
are listening as a podcast, and this is probably not making any sense. What matters here is that
Krea has designed an experience where there are totally different ways on top of the natural language
prompt to modify how the image looks to have more control around what you're creating, and this just
creates a ton of dynamism in the image generation experience.
Quickly, a brief word from today's sponsor.
As a listener of this show, I suspect you like to stay up to date on all things AI and tech,
which is why you have to check out the chart-topping podcast web3 with a16z crypto.
Produced by venture firm Andreessen Horowitz, web3 with a16z crypto is the perfect companion podcast to the
AI Breakdown.
web3 with a16z crypto is your definitive resource for the future of the internet,
whether you're interested in the convergence of AI and crypto or simply curious about what's next.
If you need a place to start, they recently released an excellent episode with Stanford
cryptography professor Dan Boneh and former Google X engineer Ali Yahya in conversation with host
Sonal Chokshi about the intersection of AI and crypto.
From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all.
I highly recommend checking it out, especially if you'd like to learn
more about how AI and crypto will impact our everyday lives. Beyond crypto and AI, the show is for
creators seeking more ways to truly own their work, for business leaders trying to prepare for the
future today, and for innovators exploring trending tech topics. Don't miss out. Follow web3 with
a16z crypto on Apple Podcasts, Spotify, or your favorite listening app. Another tool that is also
explicitly rejecting the tyranny of the exclusive text-based prompt is a company called Visual Electric.
Let's start with this seed image of a silhouette of a man in a hat standing on top of a hill.
From here, we're going to remix it, and we're going to change it to
"wide view, extensive plains, 9 by 16, ultra wide."
And then go.
Now if we want, we can pick one we like and make variations.
Let's put the creativity at 100 and let's go.
Now, again, as this loads, you can do all sorts of interesting and different things from here.
But the point is that while we are still starting with that
natural language prompt, we are now moving into a totally different type of toolset and user interface
and experience that's much more designed for the sort of creative process where one idea begets
another idea, and eventually this could turn into a massive Figma-style canvas of hundreds or even
thousands of interlocking images that all build off one another. Now for Visual Electric,
they are very intentionally framing themselves as separate and distinct from the chat interface.
Their announcement post reads "A breakthrough interface for generative AI" and says:
Technological breakthroughs have a history of enabling new forms of creative expression.
Today we find ourselves at a similar moment at the dawn of generative AI, an innovation
with unprecedented creative opportunity.
We just need the right tools to unlock its potential.
Visual Electric is the first image generator designed for creatives, a canvas that facilitates
the flow of ideas so you can truly spread out and see where the tool takes you.
It's designed to help give form to the vision in your mind's eye or lead you to something better.
We believe that in order to be truly useful, AI needs to augment our existing creative process
with all its winding paths, switchbacks, and U-turns.
This requires tools that embrace ambiguity over certainty.
Many existing AI tools present results as answers to a user's question, leaving little room
to build or tweak or meander.
So once again, we have here a new interface designed with a different type of user in mind, in this
case creatives, that is breaking away from the strict use of the chat-based
interface.
Another type of tool that we've done a tutorial of during this AI education beta
is Framer. People have been very excited about Framer as an AI-based website builder,
but in a lot of ways, that's only one little piece of it. If you go to a new project on Framer,
which you're seeing on the screen now, it's actually just a different organization of a lot of
templates. This is not dissimilar from Wix or Squarespace or anything else. You can go through
and choose pages like the landing page, a portfolio page, a teaser page, a blog. You can go to
sections, which again give you the chance to copy, paste, and drag things in. Indeed, it's not until
you go up to this little button at the top with a lightning symbol, where you click and go to
AI and generate page, that you get that classic text-based input. Let's do a landing page for an
annual party at which a group of friends cuts down their Christmas tree, called the Fetching.
This year is the seventh annual. Start. And then, yes,
here it is actively generating the page with AI, and this is incredibly valuable. It's a really
cool approach, especially when it comes to getting the juices flowing. But Framer is clearly
making a bet that this is just one tiny piece of a full, sophisticated, modern, generative
AI era website builder suite, and that ultimately they have to be thinking about the totality
of the use case for website building. They can't just be handing people cool AI pages that they
may or may not use. Another example of this: we've sometimes pejoratively
categorized companies into either building their own models or just being wrappers on top of them.
But in some cases, a different interface that can pull from multiple different models and that
is custom designed for a specific purpose, just kicks the slats for that purpose out of the base
models on their own. An example that I think a lot of people would agree with is Perplexity.
One way of looking at Perplexity is as a research wrapper on top of LLMs like GPT-4 and Claude 2.
But Perplexity has a bunch of different features. One, it gives
sources so that when you're seeing the answers to questions, you can go to the original sources.
Two, it suggests follow-up questions that lead you down interesting research pathways.
Three, you can not only search everything, but can focus on specific sources of information,
such as Reddit, YouTube, or academic papers.
All of this leads to a tool that is in many ways much more useful for that custom research
purpose than ChatGPT or Claude are on their own.
And so again, this isn't some crazy, sophisticated, unique point of view.
It's just simply to say that as we move into the
integration phase of AI, to the actual use case phase of AI, to the workflow phase of AI, where people
are not just dabbling and experimenting, but actively figuring out how to use these things
day in and day out to improve outcomes, whether it's in their personal life or in their professional life,
the application interfaces are adapting to match those specific use cases, integrations, and workflows.
That doesn't mean that the shift to a natural language interface with the computer or the chatbot
interface isn't significant, not at all. It's just that rather than overwhelming absolutely every
other way of interacting with computers, it's now going to be an essential part and an essential
tool and option in user interface design. I think the combination of natural language interfaces
plus interfaces that are designed with custom purposes, with specific purposes in mind, is going to be
where the rubber really hits the road. But anyways, hopefully that's a little interesting discussion.
If you want to try any of these tools, I will of course link to them. And if these tools have you
perked up for that beta, go to bit.ly slash AI beta. I am opening up registration for January
next week. For now, I appreciate you listening as always. And until next time, peace.
