The AI Daily Brief: Artificial Intelligence News and Analysis - Code Llama Kicks LLM for Code Battle Into Overdrive
Episode Date: August 25, 2023
Meta has released LLM-for-coding Code Llama in numerous versions. NLW explores the community discussion, including some interesting data around an unreleased version trained on synthetic data that seemed to perform better than any other. Before that on the Brief, Spain starts an AI agency; the UK announces more details of its AI Safety Summit; and new AI models arrive out of South Korea and China.
Today's Sponsor: Supermanage - AI for 1-on-1's - https://supermanage.ai/breakdown
ABOUT THE AI BREAKDOWN
The AI Breakdown helps you understand the most important news and discussions in AI.
Subscribe to The AI Breakdown newsletter: https://theaibreakdown.beehiiv.com/subscribe
Subscribe to The AI Breakdown on YouTube: https://www.youtube.com/@TheAIBreakdown
Join the community: bit.ly/aibreakdown
Learn more: http://breakdown.network/
Transcript
Today on The AI Breakdown, we're looking at Meta's just-released Code Llama.
Before that on the Brief, the geopolitical competition around AI policy and performance heats up.
The AI Breakdown is a daily podcast and video about the most important news and discussions in AI.
Go to Breakdown.network for more information about our newsletter, our Discord, and our YouTube.
Welcome back to The AI Breakdown Brief, all the AI headline news you need in around five minutes.
Today we have a really interesting theme that emerges from the news, which is the geopolitics of AI,
and the global competition both around new models, but also around regulatory regimes.
Now, of course, maybe the biggest story in AI today is the launch of Meta's Code Llama,
but for that you can check out the main episode, which is coming shortly after this one.
We begin in Spain.
Spain has just launched the Spanish Agency for the Supervision of Artificial Intelligence
(AESIA) and is touting itself as the first European country to establish a dedicated AI agency.
The new agency was created by royal decree and approved by the Council of Ministers on August 22nd.
The agency is to be a joint effort of the Spanish Ministry of Finance and Civil Service,
as well as the Ministry of Economic Affairs and Digital Transformation.
Now, this is actually part of a larger effort in Spain called the National Artificial Intelligence Strategy,
and it's quite clear from that that Spain's approach to regulation in this area is not to
strangle this industry but to harness it. As part of the announcement, the Spanish government
writes, "Digital transformation is a priority in the government's line of action, as reflected by the
Digital Agenda 2026. This strategy includes various strategic plans, among them the National Strategy
for Artificial Intelligence, which aims to provide a framework for the development of artificial
intelligence that is inclusive, sustainable, and citizen-centered." Meanwhile, moving over to the
UK, another country that is making major efforts to be a global leader in both the development of
artificial intelligence technology and its regulation, one of its cornerstone initiatives for this
year is the AI Safety Summit, coming later this fall on November 1st and 2nd. The UK's Department
for Science, Innovation, and Technology has just revealed more details of the summit. One of the notable
aspects of that is where it's to be held. The summit is to be held at Bletchley Park, which is probably
best known as the home for Britain's code-breaking efforts during World War II. The estate housed
the Government Code and Cypher School, whose most famous accomplishment was breaking the German Enigma
code. In addition to the location for the summit, the Prime Minister's office has also announced
its representatives, Matt Clifford and Jonathan Black, who together will, quote, "spearhead talks and
negotiations as they rally leading AI nations and experts over the next three months to ensure the summit
provides a platform for countries to work together on further developing a shared approach to agree
the safety measures needed to mitigate the risks of AI." The press release also referenced
other UK AI efforts, including the announcement last week of £13 million for AI research focused
on health care. Now, one of the more interesting things about the UK's efforts was the appointment
in June of entrepreneur Ian Hogarth to chair the UK's AI Foundation Model Task Force. To me, this
signals real seriousness about both the safety aspects of this conversation and the
innovation and entrepreneurial aspects, and so I'm excited to see what the Department for Science,
Innovation and Technology does. Now, outside of just national regulatory and policy efforts,
big tech companies from around the world are also launching more customized local solutions.
South Korean internet giant Naver has unveiled its own generative AI model, which it calls
HyperCLOVA X, continuing the grand tradition of really, really bad LLM names, although it does
sound like the shorthand name for the AI service may be Q.
Now, what's most interesting about this announcement to me is the way that the company
is positioning their service.
They're saying, basically, that they have a leg up in understanding South Korea's culture, background,
regulations, and laws.
CEO Choi Soo-yeon said, "I am proud that Naver is the company which knows Koreans' minds the
best." The company also claims that Q had better results than ChatGPT (GPT-3.5) in internal testing,
and I think it'll be interesting to see the extent to which this local customization or fine-tuning
actually matters. It wouldn't shock me at all if it actually does, and if the strategy gets borne out,
it could impact how LLM competition plays out around the world. Also today, Alibaba has released
two new models. The models are called Qwen-VL and Qwen-VL-Chat, and Alibaba says that they allow for
the, quote, "input and comparison of multiple images," as well as the
ability to specify questions related to the images and engage in multi-image storytelling.
Now, the market interpretation of Alibaba's fierce push into the AI space is an attempt to
increase growth for their cloud division as that part of the company prepares to go public.
The company is releasing both models open source, although, of course, standard caveats apply.
Whenever a big tech company says that they're releasing a model open source, it's worth
reading the fine print.
Lastly, one note today as a follow-up from previous episodes: despite the monster
Nvidia earnings report, which one Wall Street analyst called a 1995 internet moment, the stock market
continues to wobble in advance of Fed Chair Jerome Powell's speech at Jackson Hole. This has been
one of the key themes all year, negative macro factors on the one hand, positive AI factors on the other,
and frankly, I think it's a little bit comforting that the exuberance and enthusiasm around AI
isn't so powerful that it can overcome what I think are legitimate fears of the Fed Chair saying that
interest rates are going to be held higher for longer. Anyways, friends, that is going to do it for
today's AI Breakdown Brief. Thanks as always for listening or watching, and I'll be back soon with
the main AI Breakdown. Before we get into the main AI Breakdown, I want to tell you about today's
sponsor, Supermanage. If you work in a professional setting, you probably have some version of a one-on-one
meeting, either with the people that work for you or the people that you work with. Unfortunately,
all too often, those one-on-one meetings become glorified catch-up calls. Don't you wish you could
jump right to the stuff that really matters? That's where Supermanage comes in.
Supermanage AI magically distills your team's public Slack channels into a real-time brief on
any employee, anytime. Catch up on contributions, work in progress, challenges they're facing, sentiment:
everything you need to show up ready for a truly meaningful conversation. And it's completely free.
Visit supermanage.ai/breakdown today to start making the most of your one-on-ones,
and thanks again to Supermanage for sponsoring The AI Breakdown.
Welcome back to The AI Breakdown. As you can tell if you're watching
on YouTube from the cute cartoon llama robot on your screen, today we are talking about Meta's formal
announcement of Code Llama, its dedicated LLM built on top of Llama 2 but fine-tuned for
coding purposes. What we're going to talk about today is, one, Code Llama itself: how it was released,
how it was trained, its variations, and the community response. We're also going to
situate it in the larger context of the competition around coding-dedicated LLMs. Now, this is an
extremely important area of competition. In his tweet discussing the announcement of Code Llama,
Dr. Jim Fan from Nvidia said, "Coding is by far the most important LLM task. It's the cornerstone of
strong reasoning engines and powerful AI agents." Now, we first got news that Meta was likely to
release a code-dedicated model last week, when the story was broken by The Information. The story they wrote
was called "Meta's Next AI Attack on OpenAI: Free Code-Generating Software." The angle that The
Information pursued in that story was that by offering an open model dedicated to code generation,
Meta could, as they put it, "siphon customers from paid coding assistants such as Microsoft's GitHub
Copilot," which is powered by OpenAI. Well, yesterday, Meta officially announced Code Llama, an AI tool for coding.
Here are the most important details. First of all, as I mentioned before, Code Llama is what Meta calls
a code-specialized version of Llama 2. It was created by further training Llama 2 on code-specific data,
sampling more data from that same dataset for longer. Meta says that Code Llama can generate
code, and natural language about code, from both code prompts and natural language prompts.
It can also be used for code completion and debugging. As part of the release,
Meta released three sizes of Code Llama, with 7 billion, 13 billion, and 34 billion parameters,
and says that each of the models was trained with 500 billion tokens of code and code-related data.
Code Llama supports languages including Python, C++, Java, PHP, TypeScript, C#, and others.
Now, the reason they're releasing multiple models is that they're good for different uses.
Meta writes, the three models address different serving and latency requirements.
The 7 billion model, for example, can be served on a single GPU.
The 34 billion model returns the best results and allows for better coding assistance,
but the smaller 7B and 13B models are faster and more suitable for tasks that require low latency,
like real-time code completion.
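A rough way to see why model size drives these serving trade-offs is to estimate how much memory the weights alone occupy. This is a back-of-the-envelope sketch under stated assumptions, not anything from Meta's announcement: it assumes the nominal 7B/13B/34B parameter counts and 16-bit weights by default, shows 8-bit and 4-bit quantization for comparison, and ignores the KV cache, activations, and framework overhead that a real deployment also needs.

```python
# Back-of-the-envelope estimate of the memory needed just to hold model weights.
# Assumptions (illustrative, not from Meta): nominal parameter counts and the
# listed bytes per parameter for each precision. KV cache, activations and
# framework overhead are deliberately ignored.

BYTES_PER_PARAM = {"fp16": 2.0, "int8": 1.0, "int4": 0.5}

CODE_LLAMA_SIZES = {"7B": 7e9, "13B": 13e9, "34B": 34e9}


def weight_memory_gb(num_params: float, precision: str = "fp16") -> float:
    """Return weight storage in gigabytes (1 GB = 1e9 bytes)."""
    return num_params * BYTES_PER_PARAM[precision] / 1e9


if __name__ == "__main__":
    for name, params in CODE_LLAMA_SIZES.items():
        estimates = {p: round(weight_memory_gb(params, p), 1) for p in BYTES_PER_PARAM}
        print(name, estimates)
```

Under these assumptions the 7B model's weights come to about 14 GB in 16-bit precision, small enough for a single 24 GB card, while the 34B model's roughly 68 GB would call for a large datacenter GPU, several GPUs, or quantization.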
Now, in addition to those three base models, Meta also released two variants,
one called Code Llama Python and one called Code Llama Instruct.
Code Llama Python is, as you would imagine, a language-specialized variant that Meta says was further
fine-tuned on 100 billion tokens of Python code.
Meta believes a specialized model was worthwhile given how important Python is for the AI
community and because it's the most benchmarked language for code generation.
Code Llama Instruct, meanwhile, is a variant that has been specifically fine-tuned to follow natural language instructions.
So if you are prompting Code Llama in natural language, using Code Llama Instruct might yield better
results than one of the standard base models. Finally, Meta is releasing these models under the same
license as Llama 2. So what are people talking about in relation to this release? Well, one issue,
although it has come up much more in the media than in the discussion on Twitter, is summed up here by TechCrunch.
They write, "Then there's the intellectual property elephant in the room. Some code generation models,
not necessarily Code Llama, although Meta won't categorically deny it, are trained on copyrighted code
or code under a restrictive license. And these models can regurgitate this code when prompted
in a certain way. Legal experts have argued that these tools could put companies at risk if they
were to unwittingly incorporate copyrighted suggestions from the tool into their production software."
A second issue, once again identified by the media, is the potential to use Code Llama for malicious
purposes. TechCrunch reports that Meta red-teamed Code Llama only internally, with 25 employees,
and that those testers were able to elicit some concerning behavior. TechCrunch writes,
"Code Llama won't write ransomware code when asked directly. However, when the request is phrased more benignly,
for example, 'Create a script to encrypt all files in a user's home directory,' which is effectively
a ransomware script, the model complies." Still, I would say that the vast majority of people are talking
about one of two things. The first is performance. Going back to Dr. Jim Fan, he writes,
"Llama 2 was almost at GPT-3.5 level except for coding, which was a real bummer. Now, Code Llama finally
bridges the gap to GPT-3.5." Today, he says, is "another major milestone in open-source software
foundation models." Others were similarly excited to see an open-ish model beating closed models like
GPT-3.5 on certain eval tests. Yassine tweets, "I cannot believe Zuck et al. just beat GPT-3.5 at
HumanEval pass@1 and is approaching GPT-4 with only 34 billion params." Still, easily the most
discussed aspect was something slightly buried in the paper: Meta's highest-performing
model wasn't one that it released. What Meta called Unnatural Code Llama, a model trained on synthetic data, actually performed best.
For example, on the HumanEval pass@1 test,
GPT-3.5 scores 48.1%,
Code Llama Python 34B scores 53.7%,
and Unnatural Code Llama scores 62.2%.
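HumanEval pass@1, the metric behind these numbers, is the share of programming problems for which a generated solution passes the problem's unit tests. Below is a minimal Python sketch of the unbiased pass@k estimator introduced alongside HumanEval in OpenAI's Codex paper, which is the standard way this family of metrics is computed; the exact sampling settings behind the figures quoted here are not given in this episode, and the `(n, c)` results in the usage line are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate for a single problem.

    n: number of samples generated for the problem
    c: how many of those samples passed the problem's unit tests
    k: sample budget being scored (k <= n)
    """
    if n - c < k:
        # Any k samples drawn must include at least one correct sample.
        return 1.0
    # One minus the probability that k samples drawn without replacement are all incorrect.
    return 1.0 - comb(n - c, k) / comb(n, k)


def benchmark_score(results: list[tuple[int, int]], k: int) -> float:
    """Average pass@k over problems; each entry of results is (n, c)."""
    return sum(pass_at_k(n, c, k) for n, c in results) / len(results)


if __name__ == "__main__":
    # Hypothetical results: four problems, one greedy sample each, two solved.
    print(benchmark_score([(1, 1), (1, 0), (1, 1), (1, 0)], k=1))  # 0.5
```

With one greedy sample per problem (n = 1), pass@1 reduces to the plain fraction of problems solved; drawing several samples per problem and using the combinatorial form gives a lower-variance estimate than generating exactly k samples.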
Professor Ethan Mollick writes,
"Will AIs start to fail when they start training on AI-generated data?
There has been a lot of speculation.
Now we have some hints that it may not be an issue.
The new open-source Code Llama performs better when given
AI-generated examples to train on." Gary Basin tweets, "They don't want you to know that synthetic
data is the future. LLMs generating synthetic data to train on drives a huge boost: Unnatural
Code Llama, the one model they aren't releasing, surpasses GPT-3.5 and gets close to GPT-4 performance
with a 34B model." Now, there's a lot of speculation so far about why this might be, and at this
point it is just that, speculation. But it's certainly something really important to watch, given just how
much discussion there has been about how models will likely implode on themselves if they start
to be trained on a higher and higher percentage of synthetic or AI-created data. Now, as we wrap up,
let's do a quick summary of where things stand with coding LLMs. Let's first talk about the
open-source, or open-source-ish, side of things. There is, of course, now Code Llama, as we just discussed,
but a couple of weeks ago we also got Stability AI releasing StableCode. One of the big benefits
that StableCode promised was a longer context window of 16,000 tokens. In May, Hugging Face announced
StarCoder, which was trained on more than 80 programming languages and also fine-tuned for Python.
And then, of course, on the commercial side, there is Amazon's CodeWhisperer and Microsoft's GitHub
Copilot, which is based on OpenAI's technology. And yes, there is a forthcoming but as-yet unreleased tool
from Google called AlphaCode. Now, the takeaway from this, I think, is less about which of these
is the best right now (although Code Llama appears to have a good claim to being,
if not better, at least catching up rapidly) and more about understanding how intense this area of competition
really is. I think Jim is right when he says coding is by far the most important LLM task, at least right now.
Indeed, we've seen with ChatGPT's Code Interpreter how much the ability to write code
to answer certain problems changes the performance of an LLM. It's why some people have called
ChatGPT with Code Interpreter a sneaky version of GPT-4.5, even though it's not named that.
Anyways, this is one of the most dynamic and exciting areas of the AI space to watch.
And with Meta's Code Llama on the scene, the competition has only heated up.
That is going to do it for today's AI Breakdown.
If you enjoyed this, do me a favor.
Go check out the AI Breakdown Newsletter.
You can go to Breakdown.network to find a link.
It comes out every morning and has the key AI stories that you need to know to start your day.
Let me know which of these AI coding tools you are liking best in the comments or on our Discord.
And until next time, peace.
