The AI Daily Brief: Artificial Intelligence News and Analysis - ChatGPT All Tools and What OpenAI Is Really Building

Episode Date: November 5, 2023

In the lead-up to November 6th's OpenAI DevDay, NLW looks at their recent launch of PDF reading and the All Tools model and explores what it suggests about the company's bigger plans. Today's Sponsor: Listen to the chart-topping podcast 'web3 with a16z crypto' wherever you get your podcasts or here: https://link.chtbl.com/xz5kFVEK?sid=AIBreakdown  ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.  Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI Breakdown, we're looking at what OpenAI is really building and why it's less about an application and more about the future of computing in general. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our YouTube channel, our Discord, and our newsletter. Welcome back to the AI Breakdown. Undisputedly, undoubtedly, the biggest event coming up in artificial intelligence in the near future is OpenAI's DevDay, which is happening on November 6th. This event was announced back in September, and it's the first time that OpenAI has held a developer-focused event. Now, speculation about what is going to be announced at this event is running rampant. Altman himself, when he announced
Starting point is 00:00:52 it, made sure to say that there was not going to be any sort of GPT-4.5 or GPT-5, but he thought that people would still be excited anyway. One of the areas of speculation, which we've talked about previously on the show, is around autonomous agents, summed up here in this Peaky Blinders meme, where a furious Cillian Murphy is saying, no effing autonomous agents. Today, we will get to a little bit of speculation at the end about what might get announced at DevDay, but that's actually not so much the point of today's episode. Instead, what we're looking at is a set of recent updates from OpenAI around ChatGPT in advance of the event and what they suggest about how OpenAI sees what it's actually building. So first, let's look at what has changed.
Starting point is 00:01:36 If you are a Plus user of ChatGPT, when you go to GPT-4, you have a set of different models that you can choose between. You can choose Default, which is their model that is not connected to the internet and that has been trained on data up to January 2022, or as some people have reported, perhaps a little bit later. But the point is that that is their default model, and again, it is not connected to the internet. That is in contrast with the next model option that you have, which is Browse with Bing. Now, Browse with Bing was in fact turned off for quite some time after users figured out how to use it to get around paywalls. It's only a few weeks ago that it opened back up, but for many people, if they are doing tasks that involve understanding current information,
Starting point is 00:02:16 Browse with Bing is in some ways the default model that they're likely to choose. Two of the models that made waves earlier in the year are Advanced Data Analysis and Plugins. Now, Advanced Data Analysis used to be called Code Interpreter. Advanced Data Analysis is probably a slightly better name, but neither really fully captures the idea that in this version of GPT-4, the system is able to actually write code to help it figure out how to solve problems given to it by the user. I and others have done innumerable videos about how different people use Code Interpreter or Advanced Data Analysis, with some of the most frequently cited examples being things like segmenting customers: taking a spreadsheet of data and then using it to figure out different clusters of a market.
Starting point is 00:02:54 Another capacity that people have shown is to feed Advanced Data Analysis some set of data and just ask it to figure out what it thinks is interesting, to come up with hypotheses. For example, it can create basic visualizations, graph public data, and visualize data in a ton of different ways. Even though it's slightly old news at this point, it does remain one of the most impressive features of any AI product. Now, of course, plugins add functionality to ChatGPT by creating some access to different datasets or platforms.
Starting point is 00:03:22 The one that I use most often is Xpapers, which plugs into Cornell University's arXiv site, which is a free open-access archive of nearly 2.3 million scholarly articles. Whenever a new AI paper is released, it ends up pretty quickly on arXiv, and the Xpapers plugin allows me to get summaries of those things just by sharing the link to the particular piece I'm interested in. Now, at this stage, there are an insane number of plugins that you can use. I'm scrolling through a list currently across all different categories that someone put together on an open Google document, and we are currently at well over a thousand plugins, ranging from music to travel, to SEO and marketing, to education, to productivity,
Starting point is 00:03:58 to finance, to crypto, to science, to business, and more. The last GPT-4 model is, of course, DALL-E 3, and that's certainly the one that I've been using most frequently. I already had Midjourney deeply in my workflow, particularly when it comes to YouTube thumbnails or images for newsletters, and the ability to interact with DALL-E 3 through ChatGPT, where I can actually use English language rather than just traditional prompting, gives me a much better ability to get my weird requests across, such as these images on the screen of a Thanksgiving table with Bitcoin. I really love this huge Bitcoin pumpkin pie with the little
Starting point is 00:04:29 tiny Bitcoin pumpkin pies all around it. So the new update has two parts. First of all, there is a new ability to upload and analyze files. Specifically, you can now chat with PDFs. Now, this is a big deal because there are numerous plugins that do exactly that, and it is a very, very compelling feature with just a ton of different use cases. However, the second update is what some people are calling All Tools, which basically means that now, instead of manually switching between these different GPT-4 models, ChatGPT itself can actually figure out which are most useful based on the instructions of the user. So functionally, what this allows for is all of these different capacities to come together much more fluidly. Let's look at a couple of examples of how
Starting point is 00:05:10 people have demonstrated this. Daragwalsh writes, Multimodal AI is the future. I uploaded The Great Gatsby PDF to ChatGPT, then asked DALL-E 3 to create images from the key scene. Gatsby's mansion leapt off the page. Having all tools in one chat is a game changer. Indeed, I think this sort of layer-one use case of uploading an image and asking for a modified version of it, or an image that resembles it, is one of the simplest but clearest benefits of this All Tools update. Previously, the DALL-E 3 model didn't give you the ability to upload an image as a reference point,
Starting point is 00:05:41 which was one of its biggest limitations. Now, Anu Akash also demonstrated how the integration of Browse with Bing could be useful, asking GPT-4 All Tools, What is the price of the MacBook Pro M3? Create a visual of the MacBook Pro M3 with that price tag. Now, of course, for those of you who have been following along, you know that the M3 is the new advanced chip that Apple is actually promoting as significant and important to the AI space, in what really represents their first time talking actively about the industry and trying to compete directly for people who are in it. So with this prompt, obviously ChatGPT had to both browse with Bing to figure
Starting point is 00:06:14 out the price, and then it used DALL-E 3 to create the image that had the information it had searched up integrated into the output. Now, these are all very basic demonstration uses. They're literally all, I think, designed to figure out what the capacity of All Tools is, rather than to really stretch and get creative about how having all these tools in one spot changes what we can do. But there is one more thing that some eagle-eyed folks noticed, which is that apparently All Tools has an increased token context window as well. Whereas GPT-4 previously had an 8K token context window, it appears that All Tools has a 32K token window. Now, it makes sense that they would try to roll out a larger context window given that they are now allowing people to upload PDFs. But still, this is a huge
Starting point is 00:06:54 update and one that people have been eagerly awaiting for months and months and months now. But where I really want to get to is the conversation that surrounds this. And now a word from today's sponsor. Are you interested in how two top-of-mind trends, AI and crypto, can work together? If so, I have the perfect podcast recommendation for you. web3 with a16z crypto, the chart-topping show brought to you by venture firm Andreessen Horowitz. web3 with a16z crypto is your definitive resource for the future of the internet, whether you're already building in these spaces or simply curious about what's next. If you need a place to start, they recently released an excellent episode with Stanford Cryptography
Starting point is 00:07:34 Professor Dan Boneh and former Google X-er Ali Yahya in conversation with host Sonal Chokshi about the intersection of AI and crypto. From fighting deepfakes and proving humanity to large language models like ChatGPT, they cover it all. I highly recommend checking it out, especially if you'd like to learn more about how AI and crypto will impact our everyday lives. Beyond crypto and AI, this show is for creators seeking more ways to truly own their work, for business leaders trying to prepare for the future today, and for innovators exploring trending tech topics. So go ahead, listen to web3 with a16z crypto, wherever you get your podcasts.
Starting point is 00:08:08 Alex Kerr summed up a huge number of tweets that I saw when he wrote, Many startups just died today because OpenAI added PDF chat. You can also chat with data files and other document types. We had a wave of products better suited as features rather than standalone companies. Wrappers are being squeezed by OpenAI on one side and incumbents on the other. It's a rough world out there. Now, of course, what Alex is referring to is the way in which companies that filled in the gaps of the ChatGPT product are now being out-competed as ChatGPT and OpenAI add that functionality natively.
Starting point is 00:08:41 This is always the risk of building on someone else's platform. What makes it more brutal, and I think what Alex is recognizing, is that there have been a lot of different pressures on startups that perhaps they didn't anticipate going into the AI space earlier this year. Some of those are the platforms like OpenAI themselves, deciding to compete in a particular area, but some of them are just the speed with which incumbents and big players have adapted and integrated versions of a lot of different services
Starting point is 00:09:04 that previously startups might have been focused on into the tool sets that people are already using. Mike Butcher from TechCrunch wrote, New ChatGPT demonstrates how small tech companies and products that relied on parsing PDFs will eventually be wiped out by AI platforms. AI isn't just coming for blue-collar jobs, it's also coming for engineers. Now, my note is that actually I think it's coming more for engineers than blue-collar jobs, but that's the subject of a different show as well.
Starting point is 00:09:27 Now, there are two analyses that I wanted to share in full because I think they do a really good job of summarizing these moves and putting them into a bigger context. Adipai quote-tweeted Alex Kerr and wrote, I don't know why there is any surprise. Here's OpenAI's product strategy for the next two years. You will be able to upload anything to ChatGPT. You will be able to link to any external service like Gmail and Slack.
Starting point is 00:09:49 ChatGPT will have persistent memory, no more multiple chats unless you want it. ChatGPT will have a consistent user-customizable personality, including political bias. ChatGPT will be able to respond by text, voice, images, diagrams, and video still in this time frame. ChatGPT will become much faster until you feel it's a real person, around a 50-millisecond response time. Hallucinations and non-factual errors will decline rapidly. As self-moderation improves, question rejection will decline. Now, Adipai also added a little nuance to the agent conversation when Browsing Enabled responded to them and said, Little birdie told me that OpenAI is pretty far ahead in developing agents that run continuously and
Starting point is 00:10:26 complete tasks. Adding memory and learning over time, reasoning, different fine-tuning, and determination of output quality is far from real true loops. To that, Adipai responded, AI-initiated actions. I hesitate to say agents because it's a loaded word. Current browsing and data analysis tools are already AI-initiated.
Starting point is 00:10:49 Would expect to see that functionality be extended to more tools. Now, expanding this line of thinking even further was Rob Phillips, at I Was Robbed on Twitter. Rob wrote, As an ex-Viv (with Siri team) engineer, let me help ease everyone's future trauma as well with the fundamentals of assisted intelligence. Make no mistake, OpenAI is building a new kind of computer, beyond just an LLM for a middleware and front end. Key parts they'll need to pull it off: Persistent user preferences. The biggest unlock of assistants has always been to deeply understand what someone wants in the most specific way.
Starting point is 00:11:21 This is the wow moment where computers stop being scary and start being truly helpful. We did this in 2016 at Viv, when our AI knew what you liked for each and every service you used via Viv and mixed that in with context, like what kind of flowers you told us your mom liked. This will need to include access to your personal information to infer preference as well. External real-time data. 50% of the utility of an LLM comes from the base training and RLHF fine-tuning, but much more comes from extending its available data with external sources. Zapier, Airbyte, and others will help, but expect deep integration with third-party apps and data pipelines. Chat with PDF is a tiny, tiny part of this. If you're only building that, think much bigger.
Starting point is 00:11:56 Actual computing on virtual machines. Context windows are limiting, so AI providers will continue benefiting from running tasks directly on a Python or Node/Deno virtual environment so it can consume huge amounts of data just like a computer today can. Today, these are short-lived environments used by Data Analysis and Julius, but over time they'll become a new type of Dropbox where your data is persisted long-term for additional processing or cross-file inference and insights. Agent task and flow planning. Planning can't function without intent.
Starting point is 00:12:22 Understanding intent has always been a holy grail, and LLMs finally helped us unlock what we spent years approximating at Viv with NLP tricks. Once intent is accurate, planning can start. Creating an agent planner is incredibly nuanced and will take significant integration with user preferences, third-party data sets, knowledge of compute capabilities, etc. An app store of experts. Apple initially made the mistake of building a closed app store, then it realized it could monetize a cornucopia of creativity if they opened it up. Regardless of OpenAI saying they're focused on ChatGPT and only ChatGPT, it's inevitable they'll re-scope it and enable a long tail of specialized agents. Builders will be able to compose
Starting point is 00:12:56 multiple tools together into workflows that can specialize, and AIs over time will be able to auto-complete these tools together as well, learning from the builders that came before them. Persistent contextual memory. Embeddings are helpful, but they are missing fundamental parts like context switching, conversational centroid, summarization, enrichment, etc. Most of the cost of LLMs today comes from prompts, but as history and persistence is embedded and the inference cached, this will unlock the ability to have long-term memory with pointers to critical subjects, topics, feelings, and tone. Core memory is just the beginning. We still need all the rich information our minds conjure when we think about a past sunset, a breakup, a scientific understanding,
Starting point is 00:13:30 or sensitive context for people we interact with. Now, guys, I know that this is long, but I want to continue because I think it's hugely, hugely relevant. Rob goes on. Long polling tasks. Agent is a loaded word, but part of the intent is to have tasks that can be scheduled and self-completing regardless of the time horizon required, e.g., let me know when flights from Montreal to Hawaii are less than $500.
Starting point is 00:13:51 This will require coordination of compute across API providers as well as virtual environments in the cloud. Dynamic UI. Chat is not the final end-all interface. There's a reason apps have affordances like buttons, date-pickers, images. It simplifies. AI will be a co-pilot, but to be a co-pilot, it'll need to adjust to what works best for a given user.
Starting point is 00:14:10 The future is personalized as optimizations require it, so UI will be dynamic. API and tool composition. Expect AIs to generate custom, quote-unquote, apps in the future, where we can build our own workflows to compose together APIs without waiting for a big startup to do so. Fewer apps and startups will be needed to generate front-ends, and AI will be better at composing an array of tools and APIs together, coupled with a gas fee or tax. Assistant-to-assistant interactions. There will be countless assistants in the future,
Starting point is 00:14:37 with each assisting humans and other assistants towards some greater intent. Alongside this, assistants will need to learn to interface across text, APIs, file systems, and other modalities used by both agents/startups and humans as integration flows deeper into our world. Plugin/tool stores. Specialized assistants can only be made possible by composing tools, APIs, prompts, data preferences, and much more. The current plugin store is super early days, so expect much more work to come and expect many of those plugins to be rolled in
Starting point is 00:15:04 house as they become more mission-critical. And this is just a 10-minute brain dump. Much, much more is needed behind the scenes including internet search and scraping, community for intent, building, RLHF, etc. Dynamic API generators and connectors, gas fees, tool builders, ingestion via glasses, earbuds, et cetera. If you think it's too late to be in AI, just know the above is about 25% of what it will actually take, with much more to come as we iterate and get even more creative. The point of all of this, this very long and super interesting perspective on where things are going, is that what you see with ChatGPT is the very first iteration, now maybe the second or third iteration, depending on how you look at it, of a clean break with a former
Starting point is 00:15:45 modality of computing. Thinking about it simply as an application that does some set of things completely misses how comprehensively it's challenging the way that we interact with computers, and indeed the way that we interact with the world and get anything done in general. Every time OpenAI or ChatGPT adds another feature, it has this feeling of having always been meant to be there. Of course browsing the internet was supposed to be wired with image input, which was supposed to be wired with image output. How could it ever have not been so? And so to the extent that one is building a startup in this space, the old Wayne Gretzky notion of skating to where the puck is going has never been more significant. Those who design for the static world in which some feature or another
Starting point is 00:16:25 is missing from what we see now are too likely to be steamrolled by the inevitable spur of progress. I, for one, am excited to see what OpenAI announces at their DevDay, and I'm sure we'll be talking about it here. For now, I appreciate you listening or watching as always, and until next time, peace.
