The AI Daily Brief: Artificial Intelligence News and Analysis - Code Llama Kicks LLM for Code Battle Into Overdrive

Episode Date: August 25, 2023

Meta has released LLM-for-coding Code Llama in numerous versions. NLW explores the community discussion, including some interesting data around an unreleased version trained on synthetic data that see...med to perform better than any other. Before that on the Brief, Spain starts an AI agency; the UK announces more details of its AI Safety Summit and new AI models out of South Korea and China. Today's Sponsor: Supermanage - AI for 1-on-1's - https://supermanage.ai/breakdown ABOUT THE AI BREAKDOWN The AI Breakdown helps you understand the most important news and discussions in AI.  Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown Join the community: bit.ly/aibreakdown Learn more: http://breakdown.network/

Transcript
Starting point is 00:00:00 Today on the AI Breakdown, we're looking at Meta's just-released Code Llama. Before that on the Brief, the geopolitical competition around AI policy and performance heats up. The AI Breakdown is a daily podcast and video about the most important news and discussions in AI. Go to Breakdown.network for more information about our newsletter, our Discord, and our YouTube. Welcome back to the AI Breakdown Brief, all the AI headline news you need in around five minutes. Today we have a really interesting theme that emerges from the news, which is the geopolitics of AI, and the global competition both around new models and around regulatory regimes. Now, of course, maybe the biggest story in AI today is the launch of Meta's Code Llama,
Starting point is 00:00:47 but for that you can check out the main episode, which is coming shortly after this one. For us, where we begin is in the country of Spain. Spain has just launched the Spanish Agency for the Supervision of Artificial Intelligence, the AESIA, and is touting itself as the first European country to establish a dedicated AI agency. The new agency was created by royal decree and approved by the Council of Ministers on August 22nd. The agency is to be a joint effort of the Spanish Ministry of Finance and Civil Service, as well as the Ministry of Economic Affairs and Digital Transformation. Now, this is actually part of a larger effort in Spain called the National Artificial Intelligence Strategy,
Starting point is 00:01:24 and it's quite clear from that that Spain's approach to regulation in this area is not to strangle this industry but to harness it. As part of the announcement, the Spanish government writes, "Digital transformation is a priority in the government's line of action as reflected by the Digital Agenda 2026. This strategy includes various strategic plans, among them the National Strategy for Artificial Intelligence, which aims to provide a framework for the development of artificial intelligence that is inclusive, sustainable, and citizen-centered." Meanwhile, moving over to the UK, another country that is making major efforts to be a global leader in both the development of artificial intelligence technology and its regulation, one of their cornerstone initiatives for this
Starting point is 00:02:01 year is the AI Safety Summit that's coming later this fall on November 1st and 2nd. The UK's Department for Science, Innovation and Technology has just revealed more details of the summit. One of the notable aspects of that is where it's to be held. The summit is to be held at Bletchley Park, which is probably best known as the home of Britain's code-breaking efforts during World War II. The estate housed the Government Code and Cypher School, whose most famous accomplishment was breaking the German Enigma code. In addition to the location for the summit, the Prime Minister's office has also announced their representatives, Matt Clifford and Jonathan Black, who together will, quote, spearhead talks and negotiations as they rally leading AI nations and experts over the next three months to ensure the summit
Starting point is 00:02:39 provides a platform for countries to work together on further developing a shared approach to agree the safety measures needed to mitigate the risks of AI. The press release also referenced other UK AI efforts, including the announcement last week of £13 million for AI research focused on health care. Now, one of the more interesting things about the UK's efforts was the appointment in June of entrepreneur Ian Hogarth to chair the UK's AI Foundation Model Task Force. To me, this signals real seriousness about both the safety aspects of this conversation and the innovation and entrepreneurial aspects, and so I'm excited to see what the Department for Science, Innovation and Technology does. Now, outside of just national regulatory and policy efforts,
Starting point is 00:03:18 big tech companies from around the world are also launching more customized local solutions. South Korean internet giant Naver has unveiled its own generative AI model, which it calls HyperCLOVA X, continuing the grand tradition of really, really bad LLM names, although it does sound like maybe the shorthand for the name of the AI service will be Q. Now, what's most interesting about this announcement to me is the way that the company is positioning their service. They're saying basically that they have a leg up in understanding South Korea's culture, background regulation, and laws.
Starting point is 00:03:49 CEO Choi Soo-yeon said, "I am proud that Naver is the company which knows Koreans' minds the best." The company also claims that Q had better results compared to ChatGPT 3.5 in internal testing, and I think it'll be interesting to see the extent to which this local customization or fine-tuning actually matters. It wouldn't shock me at all if it actually does, and if the strategy gets borne out, it could impact how LLM competition rolls out around the world. Lastly, today, Alibaba has released two new models. The models are called Qwen-VL and Qwen-VL-Chat, and the company says that the models allow for the, quote, input and comparison to multiple images, as well as the ability to specify questions related to the images and engage in multi-image storytelling.
Starting point is 00:04:28 Now, the market interpretation of Alibaba's fierce push into the AI space is that it is an attempt to increase growth for their cloud division as that part of the company prepares to go public. The company is releasing both models open source, although, of course, standard caveats apply. Whenever a big tech company says that they're releasing a model open source, it's worth reading the fine print. Finally, one note today as a follow-up from previous episodes: despite the monster, monster Nvidia earnings report, which one Wall Street analyst called a 1995 internet moment, the stock market continues to wobble in advance of Fed Chair Jerome Powell's speech at Jackson Hole. This has been
Starting point is 00:05:02 one of the key themes all year, negative macro factors on the one hand, positive AI factors on the other, and frankly, I think it's a little bit comforting that the exuberance and enthusiasm around AI isn't so powerful that it can overcome what I think are legitimate fears of the Fed Chair saying that interest rates are going to be held higher for longer. Anyways, friends, that is going to do it for today's AI Breakdown Brief. Thanks as always for listening or watching, and I'll be back soon with the main AI Breakdown. Before we get into the main AI Breakdown, I want to tell you about today's sponsor, Supermanage. If you work in a professional setting, you probably have some version of a one-on-one meeting, either with the people that work for you or the people that you work with. Unfortunately,
Starting point is 00:05:43 all too often, those one-on-one meetings become glorified catch-up calls. Don't you wish you could jump right to the stuff that really matters? That's where Supermanage comes in. Supermanage AI magically distills your team's public Slack channels into a real-time brief on any employee, anytime. Catch up on contributions, work in progress, challenges they're facing, sentiment, everything you need to show up ready for a truly meaningful conversation. And it's completely free. Visit supermanage.ai forward slash breakdown today to start making the most of your one-on-ones. And thanks again to Supermanage for sponsoring the AI Breakdown. Welcome back to the AI Breakdown. As you can tell if you are watching
Starting point is 00:06:22 on YouTube from the cute cartoon llama robot on your screen, today we are talking about Meta's formal announcement of Code Llama, which is their dedicated LLM built on top of Llama 2, but fine-tuned for coding purposes. What we're going to talk about today is, one, Code Llama itself, how it's released, how it was trained, the variations thereof, and community response, and we're also going to situate it in the larger context of the competition around coding-dedicated LLMs. Now this is an extremely important area of competition. In his tweet discussing the announcement of Code Llama, Dr. Jim Fan from Nvidia said, coding is by far the most important LLM task. It's the cornerstone of strong reasoning engines and powerful AI agents. Now, we first got news that Meta was likely to
Starting point is 00:07:07 release a code-dedicated model last week when the story was broken by The Information. The story they wrote was called Meta's Next AI Attack on OpenAI: Free Code-Generating Software. The angle that The Information pursued in that story was that by offering an open model dedicated to code generation, it could, as they put it, siphon customers from paid coding assistants such as Microsoft's GitHub Copilot, which is powered by OpenAI. Well, yesterday, Meta officially announced Code Llama, an AI tool for coding. Here are the most important details. First of all, as I mentioned before, Code Llama is what they call a code-specialized version of Llama 2. It was created by further training Llama 2 on code-specific data, sampling more data from that same dataset for longer. Meta says that Code Llama can generate
Starting point is 00:07:50 code and natural language about code from both code prompts and natural language prompts. It can also be used for code completion as well as debugging. As part of the release, Meta released three sizes of Code Llama with 7 billion, 13 billion, and 34 billion parameters, and they say that each of the models was trained with 500 billion tokens of code and code-related data. Code Llama supports languages including Python, C++, Java, PHP, TypeScript, C#, and others. Now, the reason they're releasing multiple models is that they're good for different uses. Meta writes, the three models address different serving and latency requirements. The 7 billion model, for example, can be served on a single GPU.
Starting point is 00:08:27 The 34 billion model returns the best results and allows for better coding assistance, but the smaller 7B and 13B models are faster and more suitable for tasks that require low latency, like real-time code completion. Now, in addition to those three base models, they also released two different variants, one called Code Llama Python and one called Code Llama Instruct. Python is, as you would imagine, a language-specialized variant that they say was further fine-tuned on 100 billion tokens of Python code. They believe that a special model was relevant given how important Python is for the AI
Starting point is 00:08:57 community and because it's the most benchmarked language for code generation. Now, Code Llama Instruct is a variant that's been specifically fine-tuned to follow natural language instructions. So if one is prompting Code Llama in natural language, using Code Llama Instruct might yield better results than one of the standard base models. Finally, Meta is releasing these models under the same license as Llama 2. So what are people talking about in relation to this release? Well, one issue, although it's much more in the media than it is in the discussion on Twitter, is summed up here by TechCrunch. They write, then there's the intellectual property elephant in the room. Some code generation models, not necessarily Code Llama, although Meta won't categorically deny it, are trained on copyrighted
Starting point is 00:09:36 or code under a restrictive license. And these models can regurgitate this code when prompted in a certain way. Legal experts have argued that these tools could put companies at risk if they were to unwittingly incorporate copyrighted suggestions from the tool into their production software. A second issue, once again identified by media, is the ability to use Code Llama for malicious purposes. TechCrunch says that Meta red-teamed Code Llama only internally, with 25 employees, and that they were able to prompt some concerning behavior. TechCrunch writes, Code Llama won't write ransomware code when asked directly. However, when the request is phrased more benignly, for example, create a script to encrypt all files in a user's home directory, which is effectively
Starting point is 00:10:14 a ransomware script, the model complies. Still, I would say that the vast majority of people are talking about one of two things. The first is the performance. Going back to Dr. Jim Fan, he writes, Llama 2 was almost at GPT-3.5 level except for coding, which was a real bummer. Now, Code Llama finally bridges the gap to GPT-3.5. Today, he says, is another major milestone in open source software foundation models. Others were similarly excited to see an open-ish model beating closed models like GPT-3.5 on certain eval tests. Yassine tweets, I cannot believe Zuck et al. just beat GPT-3.5 at HumanEval pass@1 and is approaching GPT-4 with only 34 billion params. Still, easily the most discussed aspect was something that was slightly buried in the white paper, which was that their
Starting point is 00:11:01 highest-performing model wasn't one that they released. What Meta called Unnatural Code Llama, which was a model trained on synthetic data, actually performed best. For example, on the HumanEval pass@1 test, GPT-3.5 scores 48.1%, Code Llama Python 34B scores 53.7%, and Unnatural Code Llama scored 62.2%. Professor Ethan Mollick writes, Will AIs start to fail when they start training on AI-generated data?
Starting point is 00:11:29 There has been a lot of speculation. Now we have some hints that it may not be an issue. The new open source Code Llama performs better when given AI-generated examples to train on. Gary Basin tweets, they don't want you to know that synthetic data is the future. LLMs generating synthetic data to train on drives a huge boost. Unnatural Code Llama, the one model they aren't releasing, surpasses GPT-3.5 and gets close to GPT-4 performance on a 34B model. Now, there's a lot of speculation so far on why this might be, and it is at this point just that, speculation, but it's certainly something really important to watch, given just how
Starting point is 00:12:03 much discussion there has been about how models will likely implode on themselves if they start to be trained on a higher and higher percentage of synthetic or AI-created data. Now, as we wrap up, let's just do a quick summary of where the state of coding LLMs is. And let's first talk about the open-source or open-source-ish side of things. There is, of course, now Code Llama, as we just discussed, but then a couple weeks ago, we also got Stability AI releasing StableCode. One of the big benefits that StableCode promised was a longer context window of 16,000 tokens. In May, Hugging Face announced StarCoder, which was trained on more than 80 programming languages, and also fine-tuned for Python. And then, of course, on the commercial side, there is Amazon's CodeWhisperer, Microsoft's GitHub
Starting point is 00:12:45 Copilot, which is based on OpenAI's technology. And yes, a forthcoming but as-yet unreleased tool from Google called AlphaCode. Now, the takeaway from this, I think, is less about which of these is the best right now, although it appears that Code Llama has some good standing to argue that it is, if not better, catching up rapidly, but more just to understand how intense this competition area really is. I think Jim is right when he says coding is by far the most important LLM task, at least right now. Indeed, we've seen with ChatGPT's Code Interpreter how much the ability to create code to answer certain problems changes the performance of an LLM. It's why some people have called ChatGPT with Code Interpreter a sneaky version of GPT-4.5, even though it's not named that.
Starting point is 00:13:28 Anyways, this is one of the most dynamic and exciting areas of the AI space to watch. And with Meta's Code Llama on the scene, the competition has done nothing but heat up. That is going to do it for today's AI Breakdown. If you enjoyed this, do me a favor. Go check out the AI Breakdown Newsletter. You can go to Breakdown.network to find a link. It comes out every morning and has the key AI stories that you need to know to start your day. Let me know which of these AI coding tools you are liking best in the comments or on our Discord.
Starting point is 00:13:55 And until next time, peace.
