The AI Daily Brief: Artificial Intelligence News and Analysis - Is Google's New Bard Really the ChatGPT Killer?
Episode Date: May 15, 2023
NLW tests ChatGPT and the newly updated and opened Bard across 5 categories including creative, travel planning, research, business and more. On the Brief he covers new forthcoming regulations in the ...EU, Amazon robot leaks, and why you shouldn't become a prompt engineer. Mentioned: https://tech.co/news/google-bard-vs-chatgpt The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Transcript
On today's AI Breakdown, we test Google Bard versus ChatGPT.
Before that, on the brief, we cover Microsoft prompting news, Unreal Engine 5.2, and new AI regulation coming out of the EU.
This is the AI Breakdown, your daily AI news analysis show.
Please like, subscribe, rate, review, and share with your friends.
Welcome back to the AI Breakdown brief, all the AI headline news you need in five minutes or less.
We start today with something that's become a bit of a debate over the last few months,
which is how much people should invest in this new skill set of prompt engineering.
Here you have Logan, a developer relations person from OpenAI, saying,
hot take, you should not become a prompt engineer even if someone paid you to be one.
So by Logan's reasoning, he thinks that the genesis of people being interested is correct.
He writes,
many people are looking at AI, thinking about how it will disrupt the jobs market,
and trying to position themselves well for that future.
This is 100% the right approach.
However, he says just because there has been a lot of media narrative around prompt
engineering, that doesn't mean that will actually happen in practice. He says, the problem is that more and more,
prompt engineering will be done by AI systems themselves. I've already seen a bunch of great examples of
this in production today, and it's only going to get better. I imagine that this will be a skill that is
learned as part of people's standard education path, not some special talent that only a few people
have like it is today. Dr. Ethan Mollick says, when the folks at OpenAI are telling you that prompt
engineering is not going to be the job of the future, because AI will be able to figure out what you need,
believe it. Every trend is pointing that way as well. AI systems today are 20x easier to use than even
four months ago. Now, for any of you who have watched any of my videos around AutoGPT, you'll know that part of
what makes those systems so interesting is that they are doing self-prompting, and while they aren't
always perfect, they certainly show that future in which AI is doing a lot of the prompting itself.
Well, today we got new research from Microsoft that's all about APO or automatic prompt optimization.
They call it a simple and general purpose framework for the automatic optimization of LLM prompts.
As part of their abstract, they write,
According to Google Trends data,
prompt engineering has seen a steep rise in popularity
over the past six months.
Several guides and templates are available on social media networks
for creating persuasive prompts.
However, developing prompts entirely by trial and error methods
might not be the most effective strategy.
To solve this problem, Microsoft researchers
have developed a new prompt optimization method
called Automatic Prompt Optimization (APO).
And so if you were someone who was wandering down the path of prompt engineering,
what should you do instead?
Well, Jason Gou here has an interesting answer.
He says, unpopular opinion, but I think it's way more difficult to use generative AI in an area where you are a rank beginner than in a field where you have some subject matter expertise in.
For example, you can ask GPT-4 to build you a front end, but if you don't know anything about Tailwind CSS or React, you're going to get something very generic.
Conversely, if you've never performed a SWOT or sensitivity analysis, you wouldn't be able to tell if the output you got has any real insight.
On the image generation side, sure, you can get some neat pictures out of the box, but if you understand principles of lighting,
different artist styles, exposure, aperture, depth of field, camera variants, you can get some truly
stunning results. Generative AI is not a tool for the ignorant, but it truly rewards those who
are patient and curious. So the logic here, I believe, is that instead of viewing generative
AI as a way to shortcut your way into a field, it's a way to go deeper in a field and do more things
than you would be otherwise able to while still having a grounding in that field. Next up,
everyone who had a Jetsons fantasy is getting a little bit excited today because leaked documents from
Amazon show that they're working on a new robot that has a generative AI technology underneath
it called Burnham that could make its Astro home robot a whole lot smarter. So one part of that
means that they're adding a conversational spoken interface to the home robot. You'd probably expect that,
given what else they're doing in the AI space. But it sounds like they're trying to use Burnham to
give Astro a better contextual understanding of what's around. For example, in these leaked documents,
they describe this Astro product as being able to use Burnham to find a stove that's left on or a faucet
that's left running or identify an owner that has fallen down and needs 911 to be called.
And in addition to that very basic sort of help, they're also seeing if it can initiate more
complex tasks as well. An example given was a robot that sees broken glass on the floor,
knows that it presents a hazard, and prioritizes sweeping it up before someone steps on it.
Essentially, spot problems and solve them. Now, still, it sounds from these documents that this is
not a product that will be available immediately. It is still something for the future.
Next on the brief, a lot of people are getting excited about Unreal Engine 5.2.
We've had a ton of games that are starting to leak using the new Unreal Engine.
And over the weekend, we got this demo of its deformer tools, which allow for simulation of muscle, flesh, and cloth.
Finally, in the brief today, we told you last week that the European Parliament had agreed to increase the severity of regulation proposed in its AI Act,
and we're getting some more details of what that meant.
Marco Mascoro tweets, a new European AI regulation proposal would make any American open source developer that hosts an unlicensed
LLM on GitHub and available in Europe liable for 20 million euro or 4% of worldwide revenue.
Basically, if an American open source developer places a model or code using an API on GitHub, and that code becomes available in the EU,
the developer is then liable for releasing an unlicensed model, and GitHub would be liable for hosting an unlicensed model.
Now, that is not the only red flag in this AI Act. It's 144 pages long.
People are saying that, one, it has very broad jurisdiction. Two, it requires registration of
quote high risk projects or foundational models with the government. It requires expensive risk
testing. It defines the risks very vaguely. It says open source LLMs are not exempt and APIs
are essentially banned. There's much more in here as well. Those are just some of the highlights.
Xander Steenbrugge writes, the EU is about to pass legislation that will make it impossible for
generative AI startups to compete with large tech. Knowing how transformative this technology will be,
I'm furious these decisions are made by legislators who simply don't fully understand what they are doing.
Y Combinator's Paul Graham wrote, I knew EU regulators would be freaking out about AI.
I didn't anticipate that this freaking out would take the form of unbelievably stupid draft regulations,
though in retrospect it's obvious. Regulators going to regulate.
At this point, if I were a European founder planning to do an AI startup,
I might just preemptively move elsewhere.
The chance that the EU will botch regulation is just too high.
Even if they noticed and corrected the error, it would take years.
Now I think about it, this could be a huge opportunity for the UK.
If the UK avoided making the same mistake, they could be a haven from EU AI regulations that
was just a short flight away.
It would be fascinating if the most important thing about Brexit, historically, turned out
to be its interaction with the AI revolution.
But history often surprises you like that.
Now, the problem, of course, for folks even who are sympathetic to the idea of wanting
better regulatory guardrails, is that compliance usually rewards incumbents.
The more burdensome the compliance regime, the more it costs to actually comply, which
means that it favors companies that have a lot of money already.
That tends to be contrary to the spirit of innovation and contrary to the spirit of
decentralization that regulations sometimes are going for, which leaves them in something of an ironic
place. Anyway, I think many of us watching in the U.S. aren't necessarily that much more optimistic,
particularly seeing regulators dealing with other areas like crypto, so we'll have to just wait and see
what happens. That's it for today's AI Breakdown brief. If you're enjoying this, please like this,
subscribe to the channel, and share it with your friends. And I'll see you back here soon for the main
AI Breakdown. Ever since Google's I/O conference last week, there have been so many
threads claiming that Bard is now better than ChatGPT. Is that really true? And if it is,
in what? Today we're testing Bard versus ChatGPT across a number of different areas from research
to travel planning to see which actually outperforms. Welcome back to the AI Breakdown. Well, as I said in
the intro, last week, Google had its I/O conference with some major upgrades for Bard and the LLM that
underlies it. And since then, we've had a ton of tweets and threads, all claiming that Bard is somehow now better
than ChatGPT, or at least on that trajectory. Now, part of what's making this interesting to do right now
is that obviously Bard being connected to the internet has some serious advantages over ChatGPT,
which is only trained on data up until 2021. However, OpenAI has said that they're now rolling out
web browsing across all ChatGPT Plus users. We're all supposed to be able to have access to it this
week. And so that is going to get one-to-one parity for that access to the internet, which could make a
huge difference. So what are we going to test? Well, obviously, this
is totally subjective, but I decided to test five categories. The first is creative or storytelling.
The second is coding for non-coders, basically asking these services to help me build a website when I
don't know how to build it. Number three is business strategy or entrepreneurial design. Number four is
travel planning and number five is research. And for that research, I'm going to choose it in an area
where I have some existing knowledge so I can actually grade it a little bit more confidently.
Let's start with creative or storytelling because I think that part of the magic for a lot of folks around
ChatGPT is that it doesn't just seem like an encyclopedia talking to you. It actually feels like
there's interest and verve and personality and joie de vivre. So part of that comes out in its creativity.
So I asked both Bard and ChatGPT, can you help me write a story? The story is about a nine-year-old
boy and a dragon who happens to be his best friend. I'd like a story about them in the style of
classic children's books by authors like Roald Dahl. Now, the simplest way to put what came out of
Bard was that instead of writing a story about a nine-year-old boy, it gave me a
story that sounds like it was written by a nine-year-old boy. Of course, I don't want to be
overly critical. These are technologies that didn't exist just a couple years ago, but it's still
not particularly good. It does, in fact, follow the prompt roughly. It has a character
named Billy, who's the nine-year-old boy, and it has a dragon that he meets. It doesn't
really have much of a narrative throughline, and it certainly doesn't have anything in the way of poetry,
but it exists. ChatGPT, on the other hand, actually put something together that's pretty
quality. The title of this one is The Adventures of Leo and Ember, the unlikely duo.
Once upon a time in the small and quaint town of Dunhaven, there lived a nine-year-old boy named Leo.
Leo wasn't like other children.
He didn't have time for schoolyard games or bicycle races, for he had a most unusual best friend, a dragon named Ember.
Now, one thing to note here is that technically my prompt did say that the dragon was already his best friend, and in Google Bard, it actually was an origin story.
It's neither here nor there, and maybe quibbling, but ChatGPT seems to have gotten that little detail better.
The story that it puts forward actually has some interesting details, like Ember was a miniature dragon the size of a
plump Labrador. The other thing that ChatGPT does much better than Bard, which is something that
I've seen before, is that it actually has a narrative arc. In this case, the pivotal moment is that
Leo and Ember discover a luminescent flower blooming by the water's edge known in Dragon lore as the
moon petal. Adventures transpire and then, quote, in the end, Leo and Ember discover the greatest
treasure of all, the value of true friendship. Comfortable points on this one to ChatGPT.
Next up, we have coding for a non-coder. This is one of those use cases that people are most excited
about. Not just that they can code things that they didn't have any business coding before,
but that they can actually learn how to code alongside these tools. So I gave it a pretty simple
request. Can you help me create a website that is like news.ycombinator.com, but for AI stories.
Now, for those of you who don't know Hacker News, it's an extremely simple website that has been
very popular for going on a decade and a half at this point. All you can do is upload an article,
vote it up or down, and comment on it. And this creates a really interesting list of what
stories people are most interested in, particularly from the developer and entrepreneurial community.
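The mechanics described here (submit a story, vote it up or down, comment, and get a ranked list) can be sketched in a few lines of Python. This is only an illustration of the idea, not how Hacker News is actually built: the `Story` class, its fields, and the plain upvotes-minus-downvotes score are all assumptions made for the example, and the real site's ranking is known to weigh story age as well.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Story:
    """A submitted link plus the votes and comments it has collected (hypothetical model)."""
    title: str
    url: str
    upvotes: int = 0
    downvotes: int = 0
    comments: list[str] = field(default_factory=list)

    @property
    def score(self) -> int:
        # Assumed scoring rule: net votes only, with no time decay.
        return self.upvotes - self.downvotes


def vote(story: Story, up: bool = True) -> None:
    """Record a single up or down vote on a story."""
    if up:
        story.upvotes += 1
    else:
        story.downvotes += 1


def front_page(stories: list[Story], limit: int = 30) -> list[Story]:
    """Return the highest-scoring stories first, capped at `limit`."""
    return sorted(stories, key=lambda s: s.score, reverse=True)[:limit]
```

A niche version "for AI stories" would keep the same shape and simply restrict what gets submitted; user accounts and persistence are left out of the sketch entirely.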
So you can imagine why it would be valuable to have this type of thing, but customized for
whatever particular niche you happen to be interested in. So here's what Bard came back with.
Yes, I can help you create a website that is like Hacker News, but for AI stories. The steps involved
are one, choose a domain name. Two, install a content management system such as WordPress or Drupal.
Three, choose a theme or template. Four, create pages and posts. Five, add content to your pages and posts.
Six, customize the look and feel for your website. Seven, promote your website. This I have to say
was pretty disappointing. And of course, I think I could push Bard to get a lot more out of this.
I'm not trying to hone it in and narrow it and focus it. But what I will note is that part of
what is supposed to be better about Bard is that it is natively connected to the internet. So it should
be able to go check out news.ycombinator.com. I gave it the URL after all and see what the website
actually is. And if it did that, it would know that it's not an interface that you can just replicate
with something like WordPress unless you found a theme that was exactly that theme. Compare this to
ChatGPT, which says, let's assume you want a simple version, keeping the key features such as user
registration, posting stories, voting and commenting. Those are, in fact, as I just said, the key
features. ChatGPT says here's a basic plan of actions. One, define your requirements. Two, choose your
tech stack. Three, design the database. Four, develop your back end. Five, develop your front end. Six,
testing, seven, deploy, eight, iterate. And for those of you who are listening, not watching,
in each case, there is a set of steps underneath this that it's suggesting. For example, under
choose your tech stack, it has recommendations for front end, back end, and database. Under develop your
back end, it gives the basic endpoints we'll need. So for user, register, log in, log out,
view, profile, edit profile as just an example. And so again, just as a learner who's trying to
figure out what to do next, this is a much more comprehensive, clearer picture of what I actually
have to do that would give me a better jumping off point to go on from there. So again, point in
ChatGPT's favor. Next up, business strategy or entrepreneurial design. This is the type of thing that's been
getting a lot more headlines, I think, and video views than it is actually useful in the real
world. But because of its prevalence, I decided it was a good fit as well. So I said, I want to
create an e-commerce business using generative AI. Imagine you're a great entrepreneur and have $1,000
to start the business. What business would you start and how would you do it? Bard comes back and says,
I would start a business that creates custom-designed clothing and accessories. I would use generative
AI to create a variety of designs that are unique and stylish, and I would sell them online through my own
website. I would also use generative AI to help me with marketing and advertising so that I could reach
a wider audience and generate more sales. It says here are the steps I would take to start my business.
Research and identify a target audience, develop a business plan, choose a platform for my e-commerce
store, design and develop my website, market and advertise my business, fulfill orders and
provide customer service. ChatGPT's response on this one looks quite similar. They say one possible
idea could be a personalized product design business where the AI generates unique designs based on customer
preferences. Let's break this idea down into actionable steps. Business concept, the business
could focus on creating personalized digital assets like posters, logos, apparel designs, and more.
The generative AI would use customers' inputs, e.g., color preferences, style themes, to create unique designs.
Then it says steps to start, and this is very similar to what we saw from Bard.
One area where it was perhaps slightly ahead of Bard in terms of the specificity was its budget allocation.
With that $1,000, it suggests $300 to $500 for AI model development, which is a little bit more complex than is probably necessary.
$29 a month for website development.
It's suggesting print-on-demand services, so there shouldn't be upfront costs
for that, and it says $300 for marketing for running a few targeted ads.
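That budget split is easy to sanity-check with a little arithmetic. The sketch below just adds up the figures quoted from ChatGPT's answer for the first month; the names are made up for the example, and it assumes a single $29 website payment in month one and no upfront print-on-demand cost.

```python
START_BUDGET = 1_000        # dollars available, from the prompt
AI_MODEL_DEV = (300, 500)   # suggested low and high estimates
WEBSITE_PER_MONTH = 29      # assumed to be paid once in the first month
MARKETING = 300             # a few targeted ads


def first_month_spend(ai_dev_cost: int) -> int:
    """Total outlay in month one for a given AI model development cost."""
    return ai_dev_cost + WEBSITE_PER_MONTH + MARKETING


low = first_month_spend(AI_MODEL_DEV[0])    # 300 + 29 + 300 = 629
high = first_month_spend(AI_MODEL_DEV[1])   # 500 + 29 + 300 = 829

print(f"First-month spend: ${low} to ${high}")
print(f"Left over from ${START_BUDGET}: ${START_BUDGET - high} to ${START_BUDGET - low}")
```

So even at the high end the plan fits inside the $1,000, with somewhere between $171 and $371 left for later months of the website fee or more ads.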
Now, in this one, I was pretty inclined to give Bard and ChatGPT a tie, and I saw a few other
people who had done something similar. Florian Camiade, for example, did a Twitter thread about
how he worked with Bard to figure out how to start a business that could bring up to $15,000
a month in revenue. And one of the things that he does that I think is very important if you
were actually trying to do this is work with the AI to hone it in and focus it. He says
Bard is very powerful in generating ideas, but this time we'll challenge him on something more
specific. When the results are too vague, he challenges it to go a little bit more niche. He asks for an
assessment of the competition, and even does a little bit of strategy design, asking for a buyer
awareness matrix. I think this is a good example of how you would actually advance this ball down the
field if you wanted to use Bard to build a business. And the key here is continued specialization
and refinement. Min Choi also wrote that one of the things that makes Bard better potentially than
ChatGPT around this sort of business ideation is that because it has access to the internet,
it can leverage the latest information. He says, while ChatGPT can generate good ideas, it tends to be
verbose and occasionally out of touch. Now again, ChatGPT with browsing potentially changes this dynamic,
but for now, maybe there's a slight edge for Bard. Next we go to the travel or personal assistant
use case. The prompt is, I have three nights and two days in Paris with my wife at the end of June.
We don't like lines. We have a medium to high budget. We like Le Marais and
Montmartre, sorry about the pronunciation, best, but like all of the city. Love classic cuisine,
and are fans of the city's literary and artistic past. Can you create a recommended itinerary?
Bard's answer is, in a word, generic. Day one, it says start your day with a leisurely breakfast
at a charming cafe in Le Marais. Yeah, but which one? Then it does give a specific museum suggestion
and a specific afternoon attraction, but in the evening again, it says enjoy a romantic dinner
at a Michelin-starred restaurant in Montmartre. But which one? Meanwhile, ChatGPT just knocked this one out of the
park. Day one, morning, begin your day with breakfast at one of the charming cafes in the Le Marais
neighborhood. I recommend Coret, known for its excellent pastries and coffee. Basically, it goes
through and gives all that sort of detail over and over. For lunch, try Chez Janou. It's a
Provençal bistro that serves classic French dishes and has a charming terrace. ChatGPT also really
nailed the literary history part of the prompt. Day three, it suggests start your day with
breakfast at Le Procope in the Latin Quarter. It's the oldest cafe in Paris and was frequented
by Voltaire, Rousseau, and other literary figures. Visit the Shakespeare and Company bookstore,
a historic gathering place for writers like Hemingway and Fitzgerald. Now, neither of these are super
unknown or anything like that, but they were both features of the last trip that we took to Paris,
which was specifically designed to have these literary touchpoints. Anyways, it is entirely possible
to me that Bard could end up producing similar results if I used more specific prompts, but
ChatGPT got a lot closer right out of the gate. The final area I wanted to test was research,
and as I said, I wanted to do something where I had good information, so I actually had the basis
to know which one was performing better. I asked both Bard and ChatGPT
about the Nigeria-Biafra Civil War in the late 60s and early 1970s, which is what I happened to
write my undergraduate history thesis on. I asked what were the main causes of the conflict,
how did it come to an end, and how was the international community involved? What is the
conflict remembered for? What resources would you point to to learn more? Now, the biggest
thing for you guys to note for the sake of this video is that this is basically the conflict
where we got that famous idea of having to finish your food because there are kids starving in
Africa. The Biafran War was really the first place that international humanitarian aid was
born as we know it. It was the first time that there were televised images of starvation,
and there was mass humanitarian involvement in a way that simply hadn't been the case for previous
conflicts. Google Bard missed a lot of that. It did recognize that it was remembered for the use of
starvation as a weapon, which drew international outcry and condemnation,
but it didn't really pick up on how integral it was to the development of the humanitarian industry.
It was also a little reductive in terms of causes, but hey, this is a summary; I wasn't expecting
much more than that. ChatGPT just did a much better job. It pointed to a number of
specific incidents rather than just sort of vaguely articulated grievances, did a better job explaining
how the war came to its end through a blockade that led to starvation, and it nailed that those
images of starvation were one of the most important legacies of the conflict. ChatGPT
writes, this helps spur the development of international humanitarian law and the principle of responsibility
to protect. So summing up on four out of my five categories, creative and storytelling, coding for
non-coders, travel planning and research, ChatGPT was just very clearly ahead of Bard. On business
strategy, they were roughly equivalent, although I'm willing to give that one to Bard based on what
other people had said. Now, I looked for a couple of other comparisons as well to see, one, if people
were having similar results to me, and two, whether there were categories that I had missed.
Brian Kent on the Apricot blog did a similar comparison and found notably that ChatGPT did a better
job of summarizing long-form content, and ChatGPT did a better job of writing a Python function.
And then Tech.co did a long comparison, which I'll link to, that covered self-awareness, ethical reasoning,
small talk and conversation skills, retrieving facts, generating formulas, creative flair,
idea generation, linear planning, ability to summarize small extracts, ability to summarize broad topics,
ability to simplify text, and ability to paraphrase text. I'm not going to go through all of them,
but there were six that they declared Bard the winner, five that they declared ChatGPT the winner,
and two that they declared were a tie, so really close overall, but the edge goes to Bard in their
accounting if you weighted these all equally. Now, there are other feature reasons why you might
want to use something like Bard over ChatGPT. For example, some have pointed out
that Bard can process images as prompts, which could be a useful tool. And I think more than anything
else, the thing that Bard has going for it is that it's integrated with the Google suite of tools.
You can export Bard answers directly into Gmail or directly into Google Docs, and you can extend
prompts with search by pressing Google it. So where I want to close it is back with Nate Chan,
whose tweet kicked off this whole conversation. And I think he makes a really salient point in his
second tweet. He writes, the arguments that GPT-4 is better than Bard miss the point that these
LLMs will eventually converge to, quote, unquote, good enough at everything LLMs can do.
If Bard came first and devs were building on its API first and the prompt community was
exploring Bard first, considering all of Google's inevitable Bard integrations into all of its
ubiquitous services, there would have been little to no chance that the entire community
would have switched to a little-known startup named OpenAI to use a slightly better LLM.
Google missed the first mover boat, but they still have every opportunity to retake the lead.
That might be true, but OpenAI has a ton of energy around it, and for now,
ChatGPT with GPT-4 is, absolutely no doubt about it, better than Bard.
How long that will last is another question.
That's it for the AI breakdown.
Hope this was helpful.
Let me know what you've found.
Are there areas where Bard clearly beats ChatGPT?
I'm super interested to know.
Hit me up in the comments.
And if you're enjoying the AI breakdown, please like, subscribe, and share it.
Until next time, guys.
Peace.
