The AI Daily Brief: Artificial Intelligence News and Analysis - An Open GPT-4 Level Model? Meet Falcon 180B

Episode Date: September 6, 2023

Falcon 180B is a new open access foundation model that reportedly performs between GPT-3.5 and GPT-4 level. NLW looks at the release and explores the implications for the broader discussion of whether advanced models should be released open source. Before that on the Brief, new Zoom AI tools plus a Pentagon speech on AI defense systems.
Today's Sponsor: Superintelligent - Advanced 1-on-1 AI mentorship for creators
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI Breakdown, we're talking about Falcon 180B, which appears to be one of the most powerful open foundation models yet released. Before that on the Brief, new AI features from Zoom and Intuit, the Pentagon making big AI plans, and much, much more. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to breakdown.network for more information about our newsletter, our Discord, and our YouTube channel. Welcome back to the AI Breakdown Brief. All the AI headline news you need in around five minutes. Now, you guys will know that I am pretty allergic to hyperbole, but today really is one of those days where there's pretty much more news than we can possibly fit into a single
Starting point is 00:00:46 video. We kick it off with the definition of getting back on the horse. Just weeks, it feels like, after they had a major customer flare-up around their AI training practices, Zoom is back with some new AI features. So first, what this controversy was: people noticed a month or two ago that Zoom had changed its terms of service, and it implied, or at least it seemed to non-legal experts, that Zoom was reserving the right to use its customers' videos to train its AI models. Now, Zoom at first went to pains to say that that was only opt-in, but when that didn't quell the controversy, they basically just deleted that entirely. As TechCrunch writes, Zoom updated its policy to explicitly state that communications like customer data won't be used in training AI
Starting point is 00:01:31 apps and services for Zoom or its outside partners. But clearly, their push to integrate AI features is too strong to back off, because now Zoom has announced that they have rebranded their Zoom IQ to something that they're now calling AI Companion. The AI Companion takes advantage of a number of different technology suites from third parties like Meta, OpenAI, and Anthropic, as well as Zoom's own in-house generative AI, and also includes a number of features. There are AI tools inside the Zoom whiteboard, Zoom chat, and Zoom mail, but there's now also a ChatGPT-like bot that will be a conversational interface sitting inside the Zoom experience. Here's how TechCrunch sums it up. Users will be able to query
Starting point is 00:02:09 the AI Companion for the status of projects, pulling on transcribed meetings, chats, whiteboards, emails, documents, and even third-party apps. They'll be able to ask the AI Companion questions during a meeting to catch up on key points, create and file support tickets, and draft responses to inquiries. And finally, they say AI Companion will help summarize meetings, identify action items, and surface additional next steps. Going even farther, apparently next year, the AI Companion will even give real-time feedback on people's presence in meetings, including coaching on conversational and presentation skills, which basically seems destined to create yet another uproar when that feature is actually enabled, especially by employers who are going to see, I guarantee, employees
Starting point is 00:02:47 grumbling about their robot overlords giving them critiques on how they talked during the meeting. Now, overall, this is just one company launching one set of features, but what I think it reflects is the idea that companies are going all in, that this AI tooling isn't just a hypey thing to have in 2023, but a fundamentally different type of user interface that people are making big bets on will shape how people interact with computers going forward. Reinforcing the idea that right now every big tech company is just going through a process of converting their products for a generative AI world, Intuit has launched a new AI-powered digital assistant for small businesses and consumers. Intuit Assist sits inside TurboTax, Credit Karma, QuickBooks, and MailChimp, and can offer
Starting point is 00:03:26 personalized recommendations, checklists, and basically act like an assistant for financial matters. Speaking of AI products launching, as we've been following along with, in the wake of China beginning to approve tech companies to release AI products last week, a slew of products have come to market. One company that was notably absent from a lot of that news was Tencent. They weren't among the first companies to get a license. Today, however, Reuters reports that Tencent is teasing the launch of an AI chatbot through social media posts, and it appears that it'll happen at a two-day summit that kicks off tomorrow on Thursday. Now, obviously, when it comes to China, one big dimension of that story is the battle around AI chip access.
Starting point is 00:04:05 The AI chip space is dominated by Nvidia, but that hasn't stopped many startups from trying to compete and offer something new in the space. Reuters again reports about another AI chip startup, d-Matrix, that has just raised $110 million from backers that include Microsoft. The value proposition of d-Matrix's chips is that they supposedly use less energy to process more data, and they're specifically trying to aim at the inference portion of AI processing, rather than trying to compete with Nvidia for chips that would be used to train large AI models. As part of the investment, Microsoft has also committed to evaluating the chip for its own use when it launches next year. Now, speaking of the U.S. and China's AI tête-à-tête, the Wall Street
Starting point is 00:04:44 Journal reports that the Pentagon is planning a, quote, vast AI fleet to counter China threat. The Defense Department seeks an array of air, land, and sea-based autonomous systems to keep pace with adversaries. Apparently later today on Wednesday, we are going to get a speech from Deputy Secretary of Defense Kathleen Hicks about the DOD's plans to spend hundreds of millions of dollars to develop a new array of artificial intelligence defense systems. In an interview on Tuesday referring to China, Hicks said, we're not at war, we're not seeking to be at war, but we have to be able to get this department to move with that same kind of urgency because the People's Republic of China isn't waiting. We'll check in tomorrow about whether there is more in that speech that is
Starting point is 00:05:20 worth noting. In the policy sphere, there is a major movement to put pressure on Congress to address issues surrounding AI-generated kiddie porn. In a letter to Congress, the attorneys general from all 50 states are asking for a commission that would investigate the impact of AI on child exploitation. The letter discusses the problems of deepfaked child sexual images, and the attorneys general are hoping to explicitly, quote, expand existing restrictions on CSAM to explicitly cover AI-generated CSAM. The letter says, while we know Congress is aware of concerns surrounding AI and legislation has been recently proposed at both the state and federal level to regulate AI generally, much of the focus has been on national security and education concerns. While those interests are worthy
Starting point is 00:06:00 of consideration, the safety of children should not fall through the cracks when evaluating the risks of AI. This strikes me as one of those areas of AI policy where it would be extremely easy to get quick consensus bipartisan agreement on common sense updates to the rules, even in advance of more comprehensive legislation. Lastly, today, we close on AI in the entertainment space. Earlier this year, anonymous artist Ghostwriter showed up on TikTok with an AI-created song called Heart on My Sleeve. The track, which was a certified banger, had an AI-generated Drake and an AI-generated The Weeknd performing, and really sent the music industry into a tizzy. It was not just some cute TikTok thing. It was an unignorable force. Well, Ghostwriter is now back with a new
Starting point is 00:06:43 song, this one called Whiplash and using AI versions of Travis Scott and 21 Savage. And an accompanying New York Times article says that Ghostwriter is apparently meeting with music industry executives behind the scenes. If I had to predict one industry that will figure out how to economically co-opt AI creation in their space, it is definitely the music industry. But that might not be a bad thing. Creating approved places for people to actually make AI-generated music that is approved, or at least can be a part of an approval process, seems like an upgrade from total bannings and endless whack-a-mole legal procedures. The Ghostwriter team has apparently even submitted Heart on My Sleeve for Grammy Awards for Best Rap Song and Song of the Year.
Starting point is 00:07:23 Anyways, friends, that is going to do it for today's AI Breakdown Brief. I appreciate you listening or watching as always, and I'll be back soon with the main AI Breakdown. Hello, friends, briefly before we get into our main episode, I want to share an opportunity that I have coming up. I'm spending a lot of time right now thinking about how I can not just help you guys understand what's going on in artificial intelligence broadly, but get a little bit more hands-on and applied, particularly when it comes to helping content creators,
Starting point is 00:07:48 that could be independent content creators, other podcasters, other YouTubers, or even just people who work in digital media, social media marketing, et cetera, figure out which AI tools you actually need to use and how to best take advantage of them. I'm going to be offering a small handful of one-on-one sessions focused on exactly that, and these will be paid sessions because I want to add a lot of value as you are figuring out how to transition your life or career or your content creation to the world of AI. If you are interested in being one of the select few to get access to these one-on-ones, shoot me a note at nlw@breakdown.network, and I will share more information. Can't tell you how much I appreciate you guys listening, and I'm really excited to
Starting point is 00:08:28 help you take your own steps towards a more artificially intelligent career and future. With that, back to the show. Welcome back to the AI Breakdown. One of the things that is almost guaranteed to get people in the artificial intelligence space excited is when we get real powerful foundation model developments. GPT-3.5 and then GPT-4 obviously kicked off a lot of what has been this AI boom this year, but then more recently, Llama 2 coming out and being a very powerful open source-ish model has really been an energizer for the space. What's more, as you've heard on recent episodes, there is a ton of rumor and intrigue around Google Gemini, which they are very clearly trying to position, at least behind the scenes,
Starting point is 00:09:11 as a GPT-4 killer, given how much compute they're using to train it. Well, today we got another model. And people's first impressions are, well, impressed. HyperWrite CEO Matt Shumer says, this is shocking. Falcon 180B has been released, trained on 4X the compute of Llama 2 70B. It sits between GPT-3.5 and GPT-4 in terms of capabilities. We're now less than two months away from GPT-4 level open source models. So what we're going to do today is talk a little bit about this new release in Falcon 180B, and then what it means in terms of the larger open source conversation.
Starting point is 00:09:47 And then we'll talk about what it means in the context of the debate around whether these super-powerful advanced models should be open source. First, let's go to the blog post on Hugging Face called Spread Your Wings: Falcon 180B is here. The post kicks off. Today, we're excited to welcome TII's Falcon 180B to Hugging Face. Falcon 180B sets a new state-of-the-art for open models. It's the largest openly available language model with 180 billion parameters and was trained on a massive 3.5 trillion tokens. This represents the longest single-epoch pre-training for an open model. In terms of capabilities,
Starting point is 00:10:23 Falcon 180B achieves state-of-the-art results across natural language tasks. It tops the leaderboard for pre-trained open-access models, and rivals proprietary models like PaLM 2. While difficult to rank definitively yet, it is considered on par with PaLM 2 Large, making Falcon 180B one of the most capable LLMs publicly known. Now, Falcon is the latest in a series of open models that have come from TII. Hugging Face writes that architecture-wise, Falcon 180B is a scaled-up version of Falcon 40B, and builds on its innovations such as multi-query attention for improved scalability. The training data set, they say, consists predominantly of web data from RefinedWeb, around 85%. In addition, it has been trained on a mix of curated data such as conversations,
Starting point is 00:11:05 technical papers, and a small fraction of code, around 3%. Now, when it comes to how good Falcon 180B is, they say, Falcon 180B is the best openly released LLM today, outperforming Llama 2 70B and OpenAI's GPT-3.5 on MMLU. Now, in more practical terms for the average listener, quote, Falcon 180B typically sits somewhere between GPT-3.5 and GPT-4, depending on the evaluation benchmark, and further fine-tuning from the community will be very interesting to follow now that it's openly released. Another measure which Falcon 180B tops is the Hugging Face leaderboard. Its leaderboard score is 68.74, which comes ahead of Llama 2's
Starting point is 00:11:41 67.35. Now, the rest of the blog post has lots of information about how to actually use it, where to test it, where to demo it. But the really interesting piece that I want to return to is again from Matt Shumer's tweet, where he says we're now less than two months away from GPT-4 level open source models. Now, admittedly, when someone asked him where he pulled the two months from, Matt said, just a guess based on watching the space progress over the last few years. But for the sake of our conversation, let's not get caught up in the specifics and focus more on the point that he's trying to make, which is that we are very, very close to GPT-4 level open-source models, which seems at this point, with the release of Falcon 180B, pretty certifiably true.
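As a rough sanity check on the "4X the compute" claim quoted earlier, here is a minimal Python sketch. It assumes the common rule of thumb that training compute is roughly 6 × parameters × training tokens, and it assumes Llama 2 70B was trained on about 2 trillion tokens; neither the rule nor that token count is stated in the episode. The function name `training_flops` is made up for illustration.

```python
# Back-of-the-envelope training compute, using the rule of thumb
# C ≈ 6 * N * D (N = parameters, D = training tokens).
# This is an approximation, not a figure reported by TII or Meta.

def training_flops(params: float, tokens: float) -> float:
    """Estimate total training FLOPs from parameter and token counts."""
    return 6.0 * params * tokens

# Falcon 180B: 180 billion parameters, 3.5 trillion tokens (from the blog post).
falcon_180b = training_flops(180e9, 3.5e12)

# Llama 2 70B: 70 billion parameters; the 2 trillion token count is an
# assumption brought in from outside the episode.
llama2_70b = training_flops(70e9, 2.0e12)

ratio = falcon_180b / llama2_70b
print(f"Falcon 180B vs Llama 2 70B training compute: {ratio:.1f}x")
```

Under those assumptions the ratio comes out at about 4.5, the same ballpark as the quoted 4x. It is an estimate only; real training runs differ in hardware efficiency and architecture.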
Starting point is 00:12:19 Now, the competition between closed-source models and open-source models has been a key theme of the entire year. One of the most read and referenced documents of the year has to be the internal memo that was leaked from Google called We Have No Moat, and Neither Does OpenAI. The document, which was published on SemiAnalysis, basically argued that what companies like Google and OpenAI hadn't anticipated is the extent to which people would be able to make advances with publicly available open source models. Now, they attributed a lot of that to the full leak of Facebook's LLaMA model, but regardless of the reason, the author said, things we considered major open problems are solved and in people's hands today. Plainly put,
Starting point is 00:12:56 they are lapping us. What's more, that was before Llama 2 was released with a commercially available version. And rumors are that Meta is also speeding ahead with their next models. A number of times I've quoted a tweet from Jason at AGI Kowala who said, overheard at a Meta GenAI social. We have compute to train Llama 3 and 4. The plan is for Llama 3 to be as good as GPT-4. When asked if they would still open-source it, if it was that good, the Meta person said, yeah, we will. Sorry, alignment people. Now, this is where we intersect with this debate around whether these frontier models should be released in open source. More recent commentary on that came from Mustafa Suleyman, again from the 80,000
Starting point is 00:13:33 Hours podcast that I quoted recently. On that show, he said, I think I've come out quite clearly pointing out the risks of large-scale access. I think I called it naive open source in 20 years' time. So what that means is if we just continue to open-source absolutely everything for every new generation of frontier models, then it's quite likely that we're going to see a rapid proliferation of power. These are state-like powers which enable small groups of actors or maybe even individuals to have an unprecedented one-to-many impact in the world. Just as the last wave of social media enabled anyone to have broadcast powers, anybody to essentially function as an entire newspaper from the 90s, by the 2000s you could have millions of followers on Twitter or Instagram or whatever, and you're
Starting point is 00:14:12 really influencing the world, in a way that was previously the preserve of a publisher that in most cases was licensed and regulated, that was an authority that could be held accountable if it did something really egregious. And all of that has now kind of fallen away. For good reasons, by the way, and in some cases with bad consequences. We're going to see the same trajectory with respect to access to the ability to influence the world. You can think of it as related to my modern Turing test that I proposed around artificial capable AI. Like machines that go from being evaluated on the basis of what they say, you know, the imitation test of the original Turing test, to evaluating machines on the basis of what they can do. Can they use APIs? How persuasive are they of other humans?
Starting point is 00:14:48 Can they interact with other AIs to get them to do things? So if everybody gets that power, that starts to look like individuals having the power of organizations or even states. I'm talking about models that are two or three or maybe four orders of magnitude on from where we are. And we're not far away from that. We're going to be training models that are 1,000x larger than they currently are in the next three years. Even at Inflection, with the compute that we have, we'll be 100x larger than the current frontier models in the next 18 months. Although I took a lot of heat on the open source thing, I clearly wasn't talking about today's models. I was talking about future generations. And I still think it's right and I stand by that. Because I think that if we don't have the conversation,
Starting point is 00:15:21 then we end up basically putting massively chaotic destabilizing tools in the hands of absolutely everybody. How you do that in practice, somebody referred to it like trying to catch rainwater or trying to stop rain by catching it in your hands, which I think is a very good rebuttal. It's absolutely spot on. Of course, this is insanely hard. I'm not saying that it's not difficult. I'm saying that it's the conversation we need to be having. Now, interestingly, another former Googler, Eric Schmidt, shared similar concerns recently on CNN. Discussing his belief that within five years AIs will start self-improving entirely on their own, Schmidt said, that's a very, very big change in history. Until now, the tools we've built
Starting point is 00:15:55 have been under our control. What really worries me is diffusion from the very, very powerful models to the next tier open source models. You're building a system that is open source so anyone can get access to it, but you don't know what it can do. What happens if it builds a pathogen, which gets in the hands of an Osama bin Laden type person, and that pathogen can kill a million people. So you say, no problem, we'll put guardrails on it, alignment, to prevent it from being misused. But if you open source it and I'm evil, I can strip the restraints off. I'm concerned the AIs will have polymathic capabilities to allow someone who doesn't have a PhD in biology and is evil to really harm people. Imagine if one of these things learns how to get access to weapons. Open source AI will be too
Starting point is 00:16:32 dangerous, too powerful to be unmonitored. Now, I think that the problem with the state of this open source conversation right now is that on the one hand, you have a fundamental assumption, a nigh unchallengeable assumption, that the world is better off if all technology is open source. That open source is a bulwark against the concentration of power. It's understandable where this perspective comes from, because in many cases it is, and it has been. But to folks who are concerned about these issues, it can seem like a blind, unconsidered assumption that open source is right and open source is good a priori, no matter what the context. On the flip side, however, people who are broadly in support of open source models and who think that they
Starting point is 00:17:11 are part of the answer to the problems of AI are concerned that the safety folks, for their part, fail to recognize the threat of concentration of power when it comes to these incredibly advanced systems. Now, adding an additional dimension on top of all of this is the fact that anyone in a big tech company setting who is against open sourcing of advanced models is assumed to be simply about regulatory capture and pulling up the ladder, and taking that position only because they want to be the ones to economically benefit from AI rather than sharing the benefits far and wide. Now, I think many of you smart, nuanced, thoughtful, non-biased thinkers will see points to recommend all of these positions.
Starting point is 00:17:50 And so the question, of course, is how we reconcile them, how we sort it out. One path forward that I see is in getting more granular. In that interview with 80,000 Hours, Mustafa Suleyman says, I clearly wasn't talking about today's models. I was talking about future generations. But that wasn't clear to people. It wasn't clear at all. And what we might need is to get that type of clarity and stop talking vaguely about, for example, future generations and start trying to answer collectively questions like, what would be too powerful to release openly? What are the capacities that are too risky to be open source? And what does that say about the development of those models in general? Of course, you very quickly get to a conversation about how closed-source models are
Starting point is 00:18:30 developed and released as well, which is, of course, a part of the conversation that we need to have. The point is, and you will hear me harken back to this fairly frequently, good conversations about AI policy and just about societal norms and expectations around this new technology are going to come from moving from the very general to the much more specific. I think we are now at that time where instead of just vaguely pointing to futures, it behooves us to get specific about the implications of different capabilities and futures that we find concerning or not, and try to act accordingly based on that. Because as Falcon 180B once again reminds us, even when things feel quiet and calm and low ebb, they are moving faster than just about anyone would have imagined
Starting point is 00:19:12 even just a couple years ago. So friends, lots of food for thought. But for now, given that we aren't at those advanced models, I think it's pretty exciting to see the development of Falcon 180B, and I'm really excited to see what people build on it. If you want to come proffer a theory for at what level of capability models should no longer be open-sourced, or alternatively, an argument for why at any level of capability the world is better if they are released as open source, come join the breakers' discord.
Starting point is 00:19:39 bit.ly/aibreakdown. I'll see you there. Until next time, guys. Peace.
