The AI Daily Brief: Artificial Intelligence News and Analysis - Is Google's New Bard Really the ChatGPT Killer?

Episode Date: May 15, 2023

NLW tests ChatGPT and the newly updated and opened Bard across 5 categories including creative, travel planning, research, business and more. On the Brief he covers forthcoming regulations in the EU, Amazon robot leaks, and why you shouldn't become a prompt engineer. Mentioned: https://tech.co/news/google-bard-vs-chatgpt The AI Breakdown helps you understand the most important news and discussions in AI. Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown

Transcript
Starting point is 00:00:00 On today's AI Breakdown, we test Google Bard versus ChatGPT. Before that, on the Brief, we cover Microsoft prompting news, Unreal Engine 5.2, and new AI regulation coming out of the EU. This is the AI Breakdown, your daily AI news analysis show. Please like, subscribe, rate, review, and share with your friends. Welcome back to the AI Breakdown Brief, all the AI headline news you need in five minutes or less. We start today with something that's become a bit of a debate over the last few months, which is how much people should invest in this new skill set of prompt engineering. Here you have Logan, a developer relations person from OpenAI, saying,
Starting point is 00:00:38 hot take, you should not become a prompt engineer even if someone paid you to be one. So by Logan's reasoning, he thinks that the genesis of people being interested is correct. He writes, many people are looking at AI, thinking about how it will disrupt the jobs market, and trying to position themselves well for that future. This is 100% the right approach. However, he says just because there has been a lot of media narrative around prompt engineering, that doesn't mean that will actually happen in practice. He says, the problem is that more and more,
Starting point is 00:01:05 prompt engineering will be done by AI systems themselves. I've already seen a bunch of great examples of this in production today, and it's only going to get better. I imagine that this will be a skill that is learned as part of people's standard education path, not some special talent that only a few people have like it is today. Dr. Ethan Mollick says, when the folks at OpenAI are telling you that prompt engineering is not going to be the job of the future, because AI will be able to figure out what you need, believe it. Every trend is pointing that way as well. AI systems today are 20x easier to use than even four months ago. Now, for any of you who have watched any of my videos around AutoGPT, you'll know that part of what makes those systems so interesting is that they are doing self-prompting, and while they aren't
Starting point is 00:01:42 always perfect, they certainly show that future in which AI is doing a lot of the prompting itself. Well, today we got new research from Microsoft that's all about APO or automatic prompt optimization. They call it a simple and general purpose framework for the automatic optimization of LLM prompts. As part of their abstract, they write, According to Google Trends data, prompt engineering has seen a steep rise in popularity over the past six months. Several guides and templates are available on social media networks
Starting point is 00:02:07 for creating persuasive prompts. However, developing prompts entirely by trial and error methods might not be the most effective strategy. To solve this problem, Microsoft researchers have developed a new prompt optimization method called automatic prompt optimization, or APO. And so if you were someone who was wandering down the path of prompt engineering,
Starting point is 00:02:26 what should you do instead? Well, Jason Gou here has an interesting answer. He says, unpopular opinion, but I think it's way more difficult to use generative AI in an area where you are a rank beginner than in a field where you have some subject matter expertise. For example, you can ask GPT-4 to build you a front end, but if you don't know anything about Tailwind CSS or React, you're going to get something very generic. Conversely, if you've never performed a SWOT or sensitivity analysis, you wouldn't be able to tell if the output you got has any real insight. On the image generation side, sure, you can get some neat pictures out of the box, but if you understand principles of lighting, different artist styles, exposure, aperture, depth of field, camera variants, you can get some truly stunning results. Generative AI is not a tool for the ignorant, but it truly rewards those who
Starting point is 00:03:06 are patient and curious. So the logic here, I believe, is that instead of viewing generative AI as a way to shortcut your way into a field, it's a way to go deeper in a field and do more things than you would otherwise be able to while still having a grounding in that field. Next up, everyone who had a Jetsons fantasy is getting a little bit excited today because leaked documents from Amazon show that they're working on a new robot that has a generative AI technology underneath it called Burnham that could make its Astro home robot a whole lot smarter. So one part of that means that they're adding a conversational spoken interface to the home robot. You'd probably expect that, given what else they're doing in the AI space. But it sounds like they're trying to use Burnham to
Starting point is 00:03:43 give Astro a better contextual understanding of what's around. For example, in these leaked documents, they describe this Astro product as being able to use Burnham to find a stove that's left on or a faucet that's left running or identify an owner that has fallen down and needs 911 to be called. And in addition to that very basic sort of help, they're also seeing if it can initiate more complex tasks as well. An example given was a robot that sees broken glass on the floor, knows that it presents a hazard, and prioritizes sweeping it up before someone steps on it. Essentially, spot problems and solve them. Now, still, it sounds from these documents that this is not a product that will be available immediately. It is something still for the future.
Starting point is 00:04:18 Next on the Brief, a lot of people are getting excited about Unreal Engine 5.2. We've had a ton of games that are starting to leak using the new Unreal Engine. And over the weekend, we got this demo of its deformer tools, which allow for simulation of muscle, flesh, and cloth. Finally, in the Brief today, we told you last week that the European Parliament had agreed to increase the severity of regulation proposed in its AI Act, and we're getting some more details of what that meant. Marco Mascoro tweets, a new European AI regulation proposal would make any American open source developer that hosts an unlicensed LLM on GitHub and available in Europe liable for 20 million euros or 4% of worldwide revenue. Basically, if an American open source developer places a model or code using an API on GitHub, and that code becomes available in the EU,
Starting point is 00:05:03 the developer is then liable for releasing an unlicensed model, and GitHub would be liable for hosting an unlicensed model. Now, that is not the only red flag in this AI Act. It's 144 pages long. People are saying that, one, it has very broad jurisdiction. Two, it requires registration of, quote, high-risk projects or foundational models with the government. It requires expensive risk testing. It defines the risks very vaguely. It says open source LLMs are not exempt and APIs are essentially banned. There's much more in here as well. Those are just some of the highlights. Xander Steenbrugge writes, the EU is about to pass legislation that will make it impossible for generative AI startups to compete with large tech. Knowing how transformative this technology will be,
Starting point is 00:05:42 I'm furious these decisions are made by legislators who simply don't fully understand what they are doing. Y Combinator's Paul Graham wrote, I knew EU regulators would be freaking out about AI. I didn't anticipate that this freaking out would take the form of unbelievably stupid draft regulations, though in retrospect it's obvious. Regulators going to regulate. At this point, if I were a European founder planning to do an AI startup, I might just preemptively move elsewhere. The chance that the EU will botch regulation is just too high. Even if they noticed and corrected the error, it would take years.
Starting point is 00:06:09 Now I think about it, this could be a huge opportunity for the UK. If the UK avoided making the same mistake, they could be a haven from EU AI regulations that was just a short flight away. It would be fascinating if the most important thing about Brexit, historically, turned out to be its interaction with the AI revolution. But history often surprises you like that. Now, the problem, of course, even for folks who are sympathetic to the idea of wanting better regulatory guardrails, is that compliance usually rewards incumbents.
Starting point is 00:06:32 The more burdensome the compliance regime, the more it costs to actually comply, which means that it favors companies that have a lot of money already. That tends to be contrary to the spirit of innovation and contrary to the spirit of decentralization that regulations sometimes are going for, which leaves them in something of an ironic place. Anyway, I think many of us watching in the U.S. aren't necessarily that much more optimistic, particularly seeing regulators dealing with other areas like crypto, so we'll have to just wait and see what happens. That's it for today's AI Breakdown Brief. If you're enjoying this, please like this, subscribe to the channel, and share it with your friends. And I'll see you back here soon for the main
Starting point is 00:07:02 AI Breakdown. Ever since Google's I/O conference last week, there have been so many threads claiming that Bard is now better than ChatGPT. Is that really true? And if it is, in what? Today we're testing Bard versus ChatGPT across a number of different areas from research to travel planning to see which actually outperforms. Welcome back to the AI Breakdown. Well, as I said in the intro, last week, Google had its I/O conference with some major upgrades for Bard and the LLM that underlies it. And since then, we've had a ton of tweets and threads, all claiming that Bard is somehow now better than ChatGPT, or at least on that trajectory. Now, part of what's making this interesting to do right now is that obviously Bard being connected to the internet has some serious advantages over ChatGPT,
Starting point is 00:07:49 which is only trained on data up until 2021. However, OpenAI has said that they're now rolling out web browsing across all ChatGPT Plus users. We're all supposed to be able to have access to it this week. And so that is going to get one-to-one parity for that access to the internet, which could make a huge difference. So what are we going to test? Well, obviously, this is totally subjective, but I decided to test five categories. The first is creative or storytelling. The second is coding for non-coders, basically asking these services to help me build a website when I don't know how to build it. Number three is business strategy or entrepreneurial design. Number four is travel planning and number five is research. And for that research, I'm going to choose an area
Starting point is 00:08:27 where I have some existing knowledge so I can actually grade it a little bit more confidently. Let's start with creative or storytelling because I think that part of the magic for a lot of folks around ChatGPT is that it doesn't just seem like an encyclopedia talking to you. It actually feels like there's interest and verve and personality and joie de vivre. So part of that comes out in its creativity. So I asked both Bard and ChatGPT, can you help me write a story? The story is about a nine-year-old boy and a dragon who happens to be his best friend. I'd like a story about them in the style of classic children's books by authors like Roald Dahl. Now, the simplest way to put what came out of Bard was that instead of writing a story about a nine-year-old boy, it gave me a
Starting point is 00:09:06 story that sounds like it was written by a nine-year-old boy. Of course, I don't want to be overly critical. These are technologies that didn't exist just a couple years ago, but it's still not particularly good. It does, in fact, follow the prompt roughly. It has a character named Billy, who's the nine-year-old boy, and it has a dragon that he meets. It doesn't really have much of a narrative throughline, and it certainly doesn't have anything in the way of poetry, but it exists. ChatGPT, on the other hand, actually put something together that's pretty high quality. The title of this one is The Adventures of Leo and Ember, the Unlikely Duo. Once upon a time in the small and quaint town of Dunhaven, there lived a nine-year-old boy named Leo.
Starting point is 00:09:38 Leo wasn't like other children. He didn't have time for schoolyard games or bicycle races, for he had a most unusual best friend, a dragon named Ember. Now, one thing to note here is that technically my prompt did say that the dragon was already his best friend, and in Google Bard, it actually was an origin story. It's neither here nor there, maybe quibbling, but ChatGPT seems to have gotten that little detail better. The story that it puts forward actually has some interesting details, like Ember was a miniature dragon the size of a plump Labrador. The other thing that ChatGPT does much better than Bard, which is something that I've seen before, is that it actually has a narrative arc. In this case, the pivotal moment is that Leo and Ember discover a luminescent flower blooming by the water's edge known in dragon lore as the
Starting point is 00:10:16 Moon Petal. Adventures transpire and then, quote, in the end, Leo and Ember discover the greatest treasure of all, the value of true friendship. Comfortable point on this one to ChatGPT. Next up, we have coding for a non-coder. This is one of those use cases that people are most excited about. Not just that they can code things that they didn't have any business coding before, but that they can actually learn how to code alongside these tools. So I gave it a pretty simple request. Can you help me create a website that is like news.ycombinator.com, but for AI stories. Now, for those of you who don't know Hacker News, it's an extremely simple website that has been very popular for going on a decade and a half at this point. All you can do is upload an article,
Starting point is 00:10:54 vote it up or down, and comment on it. And this creates a really interesting list of what stories people are most interested in, particularly from the developer and entrepreneurial community. So you can imagine why it would be valuable to have this type of thing, but customized for whatever particular niche you happen to be interested in. So here's what Bard came back with. Yes, I can help you create a website that is like Hacker News, but for AI stories. The steps involved are: one, choose a domain name. Two, install a content management system such as WordPress or Drupal. Three, choose a theme or template. Four, create pages and posts. Five, add content to your pages and posts. Six, customize the look and feel of your website. Seven, promote your website. This I have to say
Starting point is 00:11:28 was pretty disappointing. And of course, I think I could push Bard to get a lot more out of this. I'm not trying to hone it in and narrow it and focus it. But what I will note is that part of what is supposed to be better about Bard is that it is natively connected to the internet. So it should be able to go check out news.ycombinator.com, I gave it the URL after all, and see what the website actually is. And if it did that, it would know that it's not an interface that you can just replicate with something like WordPress unless you found a theme that was exactly that theme. Compare this to ChatGPT, which says, let's assume you want a simple version, keeping the key features such as user registration, posting stories, voting and commenting. Those are, in fact, as I just said, the key
Starting point is 00:12:06 features. ChatGPT says here's a basic plan of action. One, define your requirements. Two, choose your tech stack. Three, design the database. Four, develop your back end. Five, develop your front end. Six, testing. Seven, deploy. Eight, iterate. And for those of you who are listening, not watching, in each case, there is a set of steps underneath this that it's suggesting. For example, under choose your tech stack, it has recommendations for front end, back end, and database. Under develop your back end, it gives the basic endpoints we'll need. So for user: register, log in, log out, view profile, edit profile, as just an example. And so again, just as a learner who's trying to figure out what to do next, this is a much more comprehensive, clearer picture of what I actually
Starting point is 00:12:44 have to do that would give me a better jumping off point to go on from there. So again, point in ChatGPT's favor. Next up, business strategy or entrepreneurial design. This is the type of thing that's been getting a lot more headlines, I think, and video views than it is actually useful in the real world. But because of its prevalence, I decided it was a good fit as well. So I said, I want to create an e-commerce business using generative AI. Imagine you're a great entrepreneur and have $1,000 to start the business. What business would you start and how would you do it? Bard comes back and says, I would start a business that creates custom-designed clothing and accessories. I would use generative AI to create a variety of designs that are unique and stylish, and I would sell them online through my own
Starting point is 00:13:22 website. I would also use generative AI to help me with marketing and advertising so that I could reach a wider audience and generate more sales. It says here are the steps I would take to start my business. Research and identify a target audience, develop a business plan, choose a platform for my e-commerce store, design and develop my website, market and advertise my business, fulfill orders and provide customer service. ChatGPT's response on this one looks quite similar. They say one possible idea could be a personalized product design business where the AI generates unique designs based on customer preferences. Let's break this idea down into actionable steps. Business concept, the business could focus on creating personalized digital assets like posters, logos, apparel designs, and more.
Starting point is 00:13:56 The generative AI would use customers' inputs, e.g., color preferences, style, themes, to create unique designs. Then it says steps to start, and this is very similar to what we saw from Bard. One area where it was perhaps slightly ahead of Bard in terms of the specificity was its budget allocation. With that $1,000, it suggests $300 to $500 for AI model development, which is a little bit more complex than is probably necessary. $29 a month for website development. It's suggesting print-on-demand services, so there shouldn't be upfront costs for that, and it says $300 for marketing for running a few targeted ads. Now, in this one, I was pretty inclined to give Bard and ChatGPT a tie, and I saw a few other
Starting point is 00:14:31 people who had done something similar. Florian Camiade, for example, did a Twitter thread about how he worked with Bard to figure out how to start a business that could bring in up to $15,000 a month in revenue. And one of the things that he does that I think is very important if you were actually trying to do this is work with the AI to hone it in and focus it. He says Bard is very powerful in generating ideas, but this time we'll challenge him on something more specific. When the results are too vague, he challenges it to go a little bit more niche. He asks for an assessment of the competition, and even does a little bit of strategy design, asking for a buyer awareness matrix. I think this is a good example of how you would actually advance this ball down the
Starting point is 00:15:06 field if you wanted to use Bard to build a business. And the key here is continued specialization and refinement. Min Choi also wrote that one of the things that makes Bard potentially better than ChatGPT around this sort of business ideation is that because it has access to the internet, it can leverage the latest information. He says, while ChatGPT can generate good ideas, it tends to be verbose and occasionally out of touch. Now again, ChatGPT with browsing potentially changes this dynamic, but for now, maybe there's a slight edge for Bard. Next we go to the travel or personal assistant use case. The prompt is, I have three nights and two days in Paris with my wife at the end of June. We don't like lines. We have a medium to high budget. We like Le Marais and
Starting point is 00:15:45 Montmartre, sorry about the pronunciation, best, but like all of the city. Love classic cuisine, and are fans of the city's literary and artistic past. Can you create a recommended itinerary? Bard's answer is in a word generic. Day one, it says start your day with a leisurely breakfast at a charming cafe in Le Marais. Yeah, but which one? Then it does give a specific museum suggestion and a specific afternoon attraction, but in the evening again, it says enjoy a romantic dinner at a Michelin-starred restaurant in Montmartre. But which one? Meanwhile, ChatGPT just knocked this one out of the park. Day one, morning, begin your day with breakfast at one of the charming cafes in the Le Marais neighborhood. I recommend Coret, known for its excellent pastries and coffee. Basically, it goes
Starting point is 00:16:24 through and gives all that sort of detail over and over. For lunch, try Chez Janou. It's a Provençal bistro that serves classic French dishes and has a charming terrace. ChatGPT also really nailed the literary history part of the prompt. Day three, it suggests start your day with breakfast at Le Procope in the Latin Quarter. It's the oldest cafe in Paris and was frequented by Voltaire, Rousseau, and other literary figures. Visit the Shakespeare and Company bookstore, a historic gathering place for writers like Hemingway and Fitzgerald. Now, neither of these are super unknown or anything like that, but they were both features of the last trip that we took to Paris, which was specifically designed to have these literary touchpoints. Anyways, it is entirely possible
Starting point is 00:16:58 to me that Bard could end up producing similar results if I used more specific prompts, but ChatGPT got a lot closer right out of the gate. The final area I wanted to test was research, and as I said, I wanted to do something where I had good information, so I actually had the basis to know which one was performing better. I asked both Bard and ChatGPT about the Nigeria-Biafra Civil War in the late 1960s and early 1970s, which is what I happened to write my undergraduate history thesis on. I asked what were the main causes of the conflict, how did it come to an end, and how was the international community involved? What is the conflict remembered for? What resources would you point to to learn more? Now, the biggest
Starting point is 00:17:30 thing for you guys to note for the sake of this video is that this is basically the conflict where we got that famous idea of having to finish your food because there are kids starving in Africa. The Biafran War was really the first place that international humanitarian aid was born as we know it. It was the first time that there were televised images of starvation, and there was mass humanitarian involvement in a way that simply hadn't been the case for previous conflicts. Google Bard missed a lot of that. It did recognize that it was remembered for the use of starvation as a weapon and for the outcry and condemnation from the international community, but it didn't really pick up on how integral it was to the development of the humanitarian industry.
Starting point is 00:18:04 It was also a little reductive in terms of causes, but hey, this is a summary; I wasn't expecting much more than that. ChatGPT just did a much better job. It pointed to a number of specific incidents rather than just sort of vaguely articulated grievances, did a better job explaining how the war came to its end through a blockade that led to starvation, and it nailed that those images of starvation were one of the most important legacies of the conflict. ChatGPT writes, this helped spur the development of international humanitarian law and the principle of responsibility to protect. So summing up, on four out of my five categories, creative and storytelling, coding for non-coders, travel planning and research, ChatGPT was just very clearly ahead of Bard. On business
Starting point is 00:18:42 strategy, they were roughly equivalent, although I'm willing to give that one to Bard based on what other people had said. Now, I looked for a couple of other comparisons as well to see, one, if people were having similar results to me, and two, whether there were categories that I had missed. Brian Kent on the Apricot blog did a similar comparison and found notably that ChatGPT did a better job of summarizing long-form content, and ChatGPT did a better job of writing a Python function. And then Tech.co did a long comparison, which I'll link to, that had self-awareness, ethical reasoning, small talk and conversation skills, retrieving facts, generating formulas, creative flair, idea generation, linear planning, ability to summarize small extracts, ability to summarize broad topics,
Starting point is 00:19:19 ability to simplify text, and ability to paraphrase text. I'm not going to go through all of them, but there were six where they declared Bard the winner, five where they declared ChatGPT the winner, and two that they declared a tie, so really close overall, but the edge goes to Bard in their accounting if you weighted these all equally. Now, there are other feature reasons why you might want to use something like Bard over ChatGPT. For example, some have pointed out that Bard can process images as prompts, which could be a useful tool. And I think more than anything else, the thing that Bard has going for it is that it's integrated with the Google suite of tools. You can export Bard answers directly into Gmail or directly into Google Docs, and you can extend
Starting point is 00:19:56 prompts with search by pressing Google it. So where I want to close it is back with Nate Chan, whose tweet kicked off this whole conversation. And I think he makes a really salient point in his second tweet. He writes, the arguments that GPT-4 is better than Bard miss the point that these LLMs will eventually converge to, quote, unquote, good enough at everything LLMs can do. If Bard came first and devs were building on its API first and the prompt community was exploring Bard first, considering all of Google's inevitable Bard integrations into all of its ubiquitous services, there would have been little to no chance that the entire community would have switched to a little-known startup named OpenAI to use a slightly better LLM.
Starting point is 00:20:31 Google missed the first mover boat, but they still have every opportunity to retake the lead. That might be true, but OpenAI has a ton of energy around it, and for now, ChatGPT-4 is, absolutely no doubt about it, better than Bard. How long that will last is another question. That's it for the AI Breakdown. Hope this was helpful. Let me know what you've found. Are there areas where Bard clearly beats ChatGPT?
Starting point is 00:20:53 I'm super interested to know. Hit me up in the comments. And if you're enjoying the AI Breakdown, please like, subscribe, and share it. Until next time, guys. Peace.
