The AI Daily Brief: Artificial Intelligence News and Analysis - OpenAI DevDay: Everything You Need To Know

Episode Date: November 7, 2023

Yesterday OpenAI announced 128k GPT-4 Turbo at 1/3rd the price; a new Text-to-Speech model; Whisper 3; and proto-agent features like the Assistants API and Custom GPTs. Today's Sponsors: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown  Interested in the opportunity mentioned in today's show? jobs@breakdown.network ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.  Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:01 Today on the AI Breakdown, we're talking about all of the biggest announcements from OpenAI's Dev Day, which happened yesterday. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our YouTube, our Discord, and our newsletter. Welcome back to the AI Breakdown. We are back in studio, finally, after a few days of travel. But today will still be a slightly different episode than normal. There is so much to cover from yesterday's OpenAI Dev Day, as well as a little follow-up from the weekend,
Starting point is 00:00:39 that instead of having the normal convention of a brief followed by a main episode, we will just be doing the main episode today. Our normal format will be back, I anticipate, tomorrow. Today we are getting into everything you need to know from OpenAI's Dev Day, which was held yesterday in San Francisco. But I would be remiss if we didn't at least mention the announcement from over the weekend, which was, of course, Elon Musk and xAI announcing their new Grok.
Starting point is 00:01:03 Now, you might have caught the episode that I did entirely with AI, including a HeyGen video avatar, that I posted as a bonus episode over the weekend, so I won't get into this much. I just wanted to touch on it briefly as myself. The way that the xAI team described Grok was as an AI modeled after the Hitchhiker's Guide to the Galaxy, so, they say, intended to answer almost anything and, far harder, even suggest what questions to ask. Grok, they say, is designed to answer questions with a bit of wit and has a rebellious streak, so please don't use it if you hate humor. A unique and fundamental advantage of Grok is that it has real-time knowledge of the world via the X platform. It will also answer spicy questions that are rejected by most other AI systems.
Starting point is 00:01:41 Grok is still a very early beta product, the best we could do with two months of training, so expect it to improve rapidly with each passing week with your help. So even just in that announcement, they hit on a number of the main things. The first is the idea of it having humor in its responses. I mentioned before the post that Elon shared where someone asked Grok how to make cocaine step by step. And Grok said, oh sure, just a moment while I pull up the recipe for homemade cocaine, you know, because I'm totally going to help you with that. The announcement also points out its access to real-time data, which Elon showed off with a set of questions about Elon's most recent interview on Joe Rogan, which happened just a couple weeks ago.
Starting point is 00:02:16 Now, of course, much has also been made of Elon's anti-woke positioning of the chatbot. That's something that I noted in that previous piece about just how much that seems to be a clear emphasis in the way that they're trying to differentiate from ChatGPT and all the others. Now, one thing that got a little bit lost in the announcement was the PromptIDE. xAI describes it like this. The xAI PromptIDE is an integrated development environment for prompt engineering and interpretability research. It accelerates prompt engineering through an SDK that allows implementing complex prompting techniques and rich analytics that visualize the network's outputs. We originally created PromptIDE to accelerate our development
Starting point is 00:02:52 of Grok. It has helped us iterate quickly over different prompts and prompting techniques. We're now making the IDE available to members of our Grok Early Access program. Now, I think this is relevant in the context of today's OpenAI Dev Day focus, given that what that event represents and what we're seeing more broadly is competition for developer affiliation as these AI models get more advanced, and especially as closed models face more competition from their open source peers. Now, there is going to be a ton more to talk about when it comes to Grok in the coming weeks, I anticipate, but that is really not the focus of today's show. The focus of today's show is, of course, OpenAI's first developer conference. Dev Day happened in San Francisco yesterday, and there was
Starting point is 00:03:31 so much to cover. In fact, basically the entirety of the AI Breakdown first five this morning was about announcements from that event and there could have been a lot more. So what we're going to actually do today is break the announcement into four categories. And even with this, there are still some things that I'm missing, but this should give you the high level of the most important parts of the announcements and the reactions from the developer and the larger AI community. So first, let's talk about Whisper and text-to-speech. Whisper is OpenAI's speech-to-text model. If you've ever used the ChatGPT mobile app, you'll know that Whisper is frankly what you expect Siri to be. It's an unbelievably accurate model that is unbelievably fast and can even handle environments where there's a lot of background
Starting point is 00:04:12 noise. Whisper is getting a third update which will be released to open source, although it was one of the only things that wasn't actually available upon the announcement of it, and so should be something that we get in the weeks to come. Now, in addition to Whisper's speech-to-text model, we also got a new text-to-speech model that can convert text into spoken audio. Now, out of the gate, text-to-speech comes with six highly realistic voices, but they're also apparently going to allow people to have professional voice cloning of their own voices, as well as creating up to 30 custom voices. At first glance, AI developer Nate Chan wrote that OpenAI's text-to-speech looked to be around 10 to 20x cheaper than ElevenLabs, which, if that's true, would obviously have huge implications
Starting point is 00:04:51 for that company, which, as you guys know, is one that I use very frequently. Now, some people have already begun the comparison. Justin de Guzman writes, just tried OpenAI text-to-speech. Non-HD about the same speed as ElevenLabs, HD is slower. Audio quality, even HD, is lower. ElevenLabs' expressiveness of voice is still a lot better. No voice cloning yet makes prototyping less fun, but 10x cheaper is compelling. It's interesting because this is the first time that we've seen OpenAI really, really compete on price first versus just on quality. Now, we will in just a moment get to another part of the announcement where again OpenAI is being at least conscious of cost, but that's different than just focusing on it as a main competitive advantage.
Starting point is 00:05:32 Just benchmarked the new OpenAI text-to-speech, coming in at around one second for TTS-1. However, they've massively undercut others on price. Daniel Monge summed it up, OpenAI came out of left field and today revealed an ElevenLabs-level TTS that costs one-sixth of the price, or one-twelfth if you use the standard model. That's kind of crazy. Now, this was a big theme of the discourse surrounding the event, summed up quite well by Stability AI's Emad Mostaque, who wrote, OpenAI killing it, where it is all those dead AI startups.
Starting point is 00:06:03 Now, these Whisper and text-to-speech announcements would be huge on any other day, but were totally overshadowed by some of the other announcements at the event. The next one that I'll mention is the new GPT-4 Turbo. Now, this was a huge update for the GPT-4 API. First of all, we are clearly heading towards multimodal, given that the Vision API is integrated, as is DALL-E 3 and this new text-to-speech model, but the biggest news comes in terms of the context window and in terms of the price. GPT-4's API context window used to be 32K, but it is now 128K. That's effectively as long as most any book that you might read, holding aside, I don't know, Brandon Sanderson novels or something like that.
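To get a feel for what the jump from 32K to 128K means in practice, here is a minimal sketch that estimates whether a piece of text would fit in each window. The four-characters-per-token ratio is only a coarse rule of thumb assumed for this example (a real tokenizer will give different counts), and the function names and the 300,000-character sample are made up for illustration.

```python
# Rough check of whether a document fits in a model's context window.
# ASSUMPTION: ~4 characters per token is a coarse heuristic for English text,
# not an exact tokenizer count. All names here are hypothetical.
CHARS_PER_TOKEN = 4

OLD_WINDOW = 32_000    # "32K", as stated in the episode
NEW_WINDOW = 128_000   # "128K", as stated in the episode


def estimate_tokens(text: str) -> int:
    """Estimate the token count from character length, rounding up."""
    return -(-len(text) // CHARS_PER_TOKEN)


def fits_in_context(text: str, context_tokens: int, reserve_for_output: int = 0) -> bool:
    """Return True if the estimated prompt size leaves room for the reply."""
    return estimate_tokens(text) + reserve_for_output <= context_tokens


# A made-up 300,000-character transcript (about 75,000 estimated tokens).
transcript = "x" * 300_000
print(estimate_tokens(transcript))               # 75000
print(fits_in_context(transcript, OLD_WINDOW))   # False
print(fits_in_context(transcript, NEW_WINDOW))   # True
```

Under these assumptions, a long lecture transcript that would have needed chunking at 32K goes through in a single request at 128K.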
Starting point is 00:06:41 The cost for input tokens is down 3x and the cost for output tokens is down 2x. And this is what I was referring to when I said that OpenAI is clearly conscious of cost questions when it comes to developer affiliation. In fact, one of the loudest and most raucous sections of applause at the event was when Sam Altman announced those price cuts. Every co-founder Dan Shipper pointed out another set of benefits from the new GPT-4 Turbo, including, quote, more control, GPT-4 will respond with valid JSON and can call multiple functions. Better knowledge. The API now comes with retrieval built in, and knowledge cutoff is now April 2023. He also points out the multimodal that we just talked about with DALL-E, text-to-speech, and vision all being in the API as well. He also points out that
Starting point is 00:07:21 fine-tuning for GPT-4 is coming out today in experimental access. Now, another announcement that once again went a little bit under the radar is that OpenAI is following the model of a number of companies in the space, including Microsoft and Adobe, in that they're guaranteeing that they will pay legal fees for developers who build on top of their platform and ultimately get sued for copyright infringement. Said Sam Altman, we can defend our customers and pay the costs incurred if you face legal claims around copyright infringement, and this applies both to ChatGPT Enterprise and the API. This Copyright Shield program, which is what they're calling it, applies to, as Sam pointed out, the enterprise users and to developers using the API, but not to free ChatGPT or ChatGPT Plus users.
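For a rough sense of what the GPT-4 Turbo price cut mentioned earlier (input tokens 3x cheaper, output tokens 2x cheaper) does to a bill, here is a small worked example. The baseline per-1,000-token prices and the token counts are placeholders assumed for the illustration and are not quoted from the episode; only the 3x and 2x factors come from it.

```python
# Illustrative cost comparison for the price cut described in the episode.
# ASSUMPTION: the baseline prices below are placeholders for the example,
# and the Pricing/apply_cut names are hypothetical.
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    input_per_1k: float   # dollars per 1,000 input tokens
    output_per_1k: float  # dollars per 1,000 output tokens

    def cost(self, input_tokens: int, output_tokens: int) -> float:
        """Total dollar cost of one request."""
        return (input_tokens / 1000) * self.input_per_1k + (
            output_tokens / 1000
        ) * self.output_per_1k


def apply_cut(old: Pricing, input_factor: float = 3.0, output_factor: float = 2.0) -> Pricing:
    """Divide input and output prices by the stated reduction factors."""
    return Pricing(old.input_per_1k / input_factor, old.output_per_1k / output_factor)


baseline = Pricing(input_per_1k=0.03, output_per_1k=0.06)  # hypothetical baseline
turbo = apply_cut(baseline)

# A long-context request: 100,000 tokens in, 2,000 tokens out.
old_cost = baseline.cost(100_000, 2_000)
new_cost = turbo.cost(100_000, 2_000)
print(f"before: ${old_cost:.2f}, after: ${new_cost:.2f}")  # before: $3.12, after: $1.06
```

Because long-context requests are dominated by input tokens, the overall saving in this example lands close to the 3x input factor rather than the 2x output factor.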
Starting point is 00:08:00 People are, of course, already racing to figure out how they might use the new expanded context window. And to get a sense of one of those use cases, let's listen to this short video from AI content creator and educator, Riley Brown. The new ChatGPT is here. Let's break it down. This video is How to Write. And it has eight million. With one click, I can download the transcript, I can go back to ChatGPT, I can pull up the text file, and the text file is just a giant wall of text. You see this? This is an hour and 20 minutes of a lecture. And we're going to hit control A, control C, copy everything, go back to ChatGPT, and we are going to type summarize every section of this video.
Starting point is 00:08:44 Break down every important point. Paste that entire transcript in here and press enter. Now let's see how this bad boy does. So here is critique of traditional writing, challenges faced by expert writers, interference in reader comprehension, and it's basically going through every single part of the video, creating value in writing, differences in academic writing, and standardized testing. Now every piece of text that you own now has so much more value with this ChatGPT, and this doesn't even include the feature of creating GPTs where you can create a chatbot on hundreds of these documents right here. Then you can ask it to expand on any
Starting point is 00:09:23 part, right? So effective problem construction. We're going to copy this right here. And please expand on this part based on the text I provided earlier. And we're going to hit enter. And this is very, very just great information for writing scripts for any form of social media or for any storytelling in any manner. And now a word from today's sponsor. Are you interested in how two top-of-mind trends, AI and crypto, can work together? If so, I have the perfect podcast recommendation for you. web3 with a16z crypto, the chart-topping show brought to you by venture firm Andreessen Horowitz. web3 with a16z crypto is your definitive resource for the future of the internet.
Starting point is 00:10:04 Whether you're already building in these spaces or simply curious about what's next. If you need a place to start, they recently released an excellent episode with Stanford cryptography professor Dan Boneh and former Google X-er Ali Yahya in conversation with host Sonal Chokshi about the intersection of AI and crypto. From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all. I highly recommend checking it out, especially if you'd like to learn more about how AI and crypto will impact our everyday lives. Beyond crypto and AI, this show is for creators seeking more ways to truly own their work, for business leaders trying to prepare for the future today, and for innovators exploring trending tech topics. So go ahead, listen to web3 with a16z
Starting point is 00:10:43 crypto wherever you get your podcasts. Now, somehow, we still haven't really gotten to the part of the presentation that the most people were excited about, but that's where we're headed next. In his keynote presentation, OpenAI CEO Sam Altman talked a lot about the future of AI agents. This has been one of the hottest areas of AI development throughout the entire year. You might have heard an episode or seen a video about AutoGPT or BabyAGI or any of the numerous other projects that are trying to build autonomous agents that can actually go out and solve problems on their own and complete tasks on behalf of their creators and users.
Starting point is 00:11:23 Altman shared the company's belief that these agents were going to play an increasingly important role in society and the economy, as well as the company's belief that when it comes to these types of disruptive changes, the best way to figure out how to adapt to them is to slowly walk down the path towards them and figure out how to adapt as we get real-life evidence of what happens. With that in mind, two of the biggest parts of the announcements from Dev Day were the Assistants API and custom GPTs, both of which they viewed as very first steps towards that AI agent future. Let's start with the Assistants API, and in terms of a summary, I'll share a tweet from Sully Omar that reads, OpenAI literally flipped the entire AI landscape on its
Starting point is 00:12:02 head. Most AI code is likely tech debt. Companies have spent hundreds of millions of dollars building their own Assistants API, and now it's available to everyone. This is huge for the little guys, brutal for the big guys. So what is the Assistants API? Well, it's an API for building agent interfaces. It has a set of core primitives including threading, retrieval, code interpreter, and function calling. Here's how Conrad Nat sums it up. Threads and messages: for each user, create a stateful thread and add messages. Improved function calling allows AI to respond with actions on your front end. Function calling also includes guaranteed JSON output, and retrieval allows for the upload of related documentation. Finally, code interpreter figures out if there is a need to write custom code, and then can actually go generate those files.
Starting point is 00:12:48 OpenAI writes, the Assistants API allows you to build AI assistants within your own applications. An assistant has instructions and can leverage models, tools, and knowledge to respond to user queries. The Assistants API currently supports three types of tools: code interpreter, retrieval, and function calling. In the future, we plan to release more OpenAI-built tools and allow you to provide your own tools on our platform. At a high level, a typical integration of the Assistants API has the following flow. One, create an assistant in the API by defining its custom instructions and picking a model. If helpful, enable tools like code interpreter, retrieval, and function calling.
Starting point is 00:13:23 Two, create a thread when a user starts a conversation. Three, add messages to the thread as the user asks questions. Four, run the assistant on the thread to trigger responses. This automatically calls the relevant tool. So almost immediately people started hacking. Yohei, who's built things like BabyAGI, writes, The Assistants API is awesome, had to build something, open sourcing GPT versus GPT,
Starting point is 00:13:45 a simple template to have two AI assistants converse. What's exciting is the ability to extend these with retrieval data and custom functions. You set the parameters for two assistants, then feed that in with the topic and number of messages you want. You'll get something like this. Topic, global warming. Pirates speaking.
Starting point is 00:14:01 Arr, matey, the seas be a-rising and the winds be a-blowing, for we be facing the wicked scourge of global warming. Let's set sail on a treacherous journey to explore this dire threat to our world. Mermaid speaking. Like, oh my gosh, global warming is like seriously the worst. Anyway, kind of a cheesy application, but just something that was thrown together to demonstrate what this could do. Another application that looks a little bit more like something that someone might use comes from Brian Sunter, who writes, using the OpenAI Assistants API to make an AI chatbot from my blog posts in less than five minutes.
Starting point is 00:14:31 We see a video in the Assistants builder playground where Brian writes, Instructions, you are the public-facing AI assistant for Brian Sunter, answering questions in the style of the uploaded documents. Answer questions about the uploaded documents. The documents are writings from his personal blog. He then selects the model, gpt-4-1106-preview, and uploads a set of a dozen files. Then in the preview section, when a user asks,
Starting point is 00:14:53 what is Brian Sunter's blog about? The new Assistants API chatbot responds with information that comes from the posts that he uploaded. It doesn't take much imagination to see how this could be applied to any content creator on the web, or frankly, any company with products or services that people might want to know more about. Now, what about limitations? Well, Zhao Aguiam gets into some of those. He writes, a maximum of 20 file uploads per assistant. Each file can be up to 512 megabytes with a 100 gigabyte max at the organization level. Function calling has a maximum wait time of 10 minutes for execution.
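The four-step flow from the episode (create an assistant, create a thread, add messages, run the assistant on the thread) and the two-assistant conversation idea can be sketched with plain in-memory objects. This is only a toy model under stated assumptions: it does not call OpenAI, every class and function name below is hypothetical, and the canned responder stands in for a real model.

```python
# In-memory stand-in for the four-step flow described in the episode.
# ASSUMPTION: nothing here is the real OpenAI SDK; Assistant, Thread, run,
# converse, and canned are hypothetical names made up for this sketch.
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Assistant:
    name: str
    instructions: str
    # Stand-in for the model: maps (instructions, last message) to a reply.
    respond: Callable[[str, str], str]


@dataclass
class Thread:
    # Each entry is (speaker, text); the thread keeps the conversation state.
    messages: List[Tuple[str, str]] = field(default_factory=list)

    def add_message(self, speaker: str, text: str) -> None:
        self.messages.append((speaker, text))


def run(assistant: Assistant, thread: Thread) -> str:
    """Step 4: run an assistant on the thread and append its reply."""
    last_text = thread.messages[-1][1]
    reply = assistant.respond(assistant.instructions, last_text)
    thread.add_message(assistant.name, reply)
    return reply


def converse(a: Assistant, b: Assistant, topic: str, turns: int) -> Thread:
    """Alternate two assistants on one shared thread for a number of turns."""
    thread = Thread()                    # step 2: create a thread
    thread.add_message("user", topic)    # step 3: add a message
    speakers = [a, b]
    for i in range(turns):
        run(speakers[i % 2], thread)     # step 4: run the next assistant
    return thread


def canned(style: str) -> Callable[[str, str], str]:
    """Build a fake responder that tags replies with a speaking style."""
    return lambda instructions, last: f"[{style}] replying to: {last[:30]}"


# Step 1: create the assistants with instructions (and a fake "model").
pirate = Assistant("Pirate", "Talk like a pirate.", canned("pirate"))
mermaid = Assistant("Mermaid", "Talk like a mermaid.", canned("mermaid"))

thread = converse(pirate, mermaid, "global warming", turns=4)
for speaker, text in thread.messages:
    print(f"{speaker}: {text}")
```

The point of the design is that the thread, not the caller, holds the conversation state, so each run only needs the thread and the assistant to act on it.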
Starting point is 00:15:24 There's no support for streaming output. Image generation is not supported. You need to call the DALL-E 3 API separately. Image analysis is not supported. You need to call the vision API separately. Retrieval capabilities do not extend to XLS or CSV files. But still, and this is something that Ben at Stratechery pointed out, as opposed to the way that tech presentations have trended recently, where products are announced weeks before they're actually available, OpenAI was by and large, with the one exception of the Whisper 3 API, actually putting all these tools out into the world as soon as they had announced them. Given that, there's going to be, I imagine, a lot more patience for some of those limitations that we just listed. Still, I think for me and for many people,
Starting point is 00:16:01 the most potentially game-changing aspect of the announcement was the announcement of custom GPTs, which are basically customized specific-purpose versions of ChatGPT that anyone can create with natural language. Here's how Sam Altman described them. GPTs are tailored versions of ChatGPT for a specific purpose. You can build a GPT, a customized version of ChatGPT for almost anything, with instructions, expanded knowledge and actions, and then you could publish it for others to use. And because they combine instructions, expanded knowledge, and actions, they can be more helpful to you. They can work better in many contexts and they can give you better control. They'll make it easier for you to accomplish all sorts of tasks or just have more fun, and you'll be able to use them right
Starting point is 00:16:40 within ChatGPT. You can, in effect, program a ChatGPT with language just by talking to it. OpenAI CTO Mira Murati writes, GPTs are not omniscient. They're custom versions of ChatGPT tuned for specific tasks, smart tools that I'm certain we won't be able to live without. OpenAI describes GPTs as a new way for anyone to create a tailored version of ChatGPT to be more helpful in their daily life, at specific tasks, at work, or at home, and then share that creation with others, no code required. So what are some examples of this? Well, one that Sam Altman himself built live was a startup mentor that used previous speeches of his from his time leading Y Combinator to help give founders of startups advice that he might give them, but automatically through a chatbot.
Starting point is 00:17:23 To do this, Sam goes to the GPT builder, where it asks him to describe what he wants to build. Sam writes that he wants to build a chatbot that gives advice to founders, and in a preview window on the side of the builder, you can actually see how the GPT builder is interpreting his instructions and starting to turn it into what will eventually be the custom GPT. The GPT builder comes up with a name and a suggested icon, both of which Sam accepts, calling it Startup Mentor, and then he goes into the configure menu, where, among other things, he can upload a file, in this case the transcript from a speech he's previously given about startups. From that same configure menu, he can also add or subtract capabilities, including
Starting point is 00:17:59 web browsing, DALL-E image generation, and code interpreter. And within just a few minutes, he's got a working version of one of these custom GPTs. Professor Ethan Mollick gave another example. He writes, here's a little GPT, the name for the new agent-like thing released by OpenAI, that I threw together in less than a minute. It looks up the latest trends for a product category on the web and then creates prototype images for it. It takes less than 90 seconds end to end. So in the demo video Ethan shares, the tool, which is called Trend Analyzer, asks the user what type of product they're interested in.
Starting point is 00:18:30 It says, are you thinking about technology, fashion, home goods, or something else entirely? Once I know the product category, I can look into the latest trends for you. Ethan writes sneakers, and the trend analyzer goes off and sources some trends around 2023 that could be built into the product design. It comes back with six different trends, all featuring sources, and then asks, which, if any, the user wants to proceed with. When Ethan writes, you decide, trend analyzer says, we'll go with a high top silhouette with chunky elements and bold and vibrant colorways.
Starting point is 00:18:57 From there, it says, next I'll create realistic photo shoots of our futuristic sneaker concept incorporating these trends. That, of course, is where DALL-E 3 comes in, and boom, all of a sudden you have this futuristic shoe concept. Nick Dobos had another example. He writes, playing with the new ChatGPT custom GPTs, introducing GIFPT, automatically turn DALL-E images into GIFs. Now, as Nick points out, quote,
Starting point is 00:19:19 this was based on some of my earlier experiments using DALL-E and code interpreter to make GIFs, but now I can package that entire workflow into a GPT that does it all for you. And this is something that I think people are initially perhaps missing a little bit, at least those who are inclined to be contrarian about all of the excitement surrounding this. For example, if you go back to Ethan's post, someone writes, how is this any different than Bing chat with image generation? But as Ethan points out, it uses the same tool, so it just makes it easier to share and to work with large prompts. It also includes a lot of features that Bing hasn't implemented yet.
Starting point is 00:19:51 Connection to outside systems, CI, and working with files. I think it would be incorrect to underestimate how much it matters to simplify workflows in the way that these custom GPTs do. My general belief is that every simplification of a workflow leads to a massive increase in the number and variety of use cases for the thing that's underlying that workflow. In other words, OpenAI making this a lot easier means a lot more people are going to use it, and for a lot of different purposes than they might have before. Now, to the extent that there was any skepticism around custom GPTs, it wasn't around their usefulness, but about whether there will actually be demand to buy these. As part of their announcement, OpenAI said that within a month, we would have a custom GPT store. X user and AI
Starting point is 00:20:31 trendwatcher Boris writes, the idea of building a private library of custom GPTs is a great and very useful one. I'm a little skeptical of the store, but it might turn out to be great as well. AI entrepreneur Bindu Reddy, as part of a much larger critique post, one of frankly the only ones that I saw, was also skeptical of this. She writes, To be honest, you need tens of millions of subscribers before you start worrying about a paid developer ecosystem on top of ChatGPT. Like plugins, it seems premature. I also suspect this is to compete with Bard that promises to unify information from all your Google products.
Starting point is 00:21:03 If I were OpenAI, I wouldn't worry. No one seems to be using Bard at the moment, and unless Gemini is launched and is really good, no one will continue to use it. Robert Scoble, however, disagrees. He writes, The store encourages developers to build for it, which gives OpenAI quite a bit of lock-in even after competitors arrive.
Starting point is 00:21:19 Think about it this way. If you're getting paid $1,000 a month for building a useful GPT, will you leave just because Elon Musk has a better AI? Nope. So the race is on to get developers addicted to your ecosystem today. I could see buying quite a few GPTs to help me run my business in life.
Starting point is 00:21:34 That revenue, even if expected to be small for a while, like Bindu says, will provide lock-in. Now, speaking of Google and Gemini, Nvidia's Dr. Jim Fan writes, expectation for Google Gemini is now ridiculously high. Gemini has to check off at least one of the following. 120% IQ of textual GPT-4, or 100% of GPT-4 but at half the cost or 2x speed of Turbo,
Starting point is 00:21:56 or 100% of visual GPT-4 or natively support long videos, and ship the API in Q1 of 2024. It's about time that DeepMind recovers the glory of AlphaGo in 2016. I'm looking forward to it. However, they have their work cut out for them. Investor Allie Miller writes, I'm at OpenAI Dev Day in San Francisco. I was front and center for the keynote.
Starting point is 00:22:18 I've tweeted so many tweets, but there is one big takeaway that you're not going to see over the live stream, one feeling that you only get in person. And that is, compared to every other big tech event I've been to, OpenAI Dev Day is the highest, okay, I have to go build something with this new release immediately score. I'm talking 11 out of 10 builder activation score. It's incredible.
Starting point is 00:22:39 Putting an exclamation point on that, AI entrepreneur and developer Sam Whitmore writes, when you run away from Dev Day to go integrate everything as soon as humanly possible, dream day for people building AI-powered products. Thank you, OpenAI. This is certainly what I felt when I was watching these announcements, which I was doing in an airport and an airplane on my way back from Mexico. Right now, OpenAI has builder imagination in a huge way.
Starting point is 00:23:04 They are moving quickly. They're introducing new things. They are, to the point of the video that I released over the weekend, walking down a path to a very different type of relationship with computing, and frankly, it's exciting to watch. I'm sure that over the course of this week, we will come back to these topics and see what people are hacking on and what early experiments are showing the possibility of things like the Assistants API and custom GPTs. But for now, after this long episode, we are going to wrap it there. I appreciate you guys listening or
Starting point is 00:23:29 watching as always. I'm very excited to be back with you all. And until next time, peace.
