The AI Daily Brief: Artificial Intelligence News and Analysis - Is Open Source AI Falling Behind?

Starting point is 00:00:00 Today on the AI Daily Brief, Meta's LlamaCon, and is open source falling behind? Before that in the headlines, up to 30% of Microsoft's code has now been written by AI. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. Thanks to today's sponsors, Vanta and Superintelligent, and for an ad-free version of the show, go to patreon.com slash AI Daily Brief. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Well, friends, it turns out that AI coding is not just for the vibe coders. At Meta's LlamaCon event, which will be the topic of our main episode today,

Starting point is 00:00:37 Microsoft CEO Satya Nadella made a crossover appearance in a fireside chat with Meta CEO Mark Zuckerberg. One of the more interesting topics was the takeover of AI code in big tech. Nadella said that between 20 and 30 percent of the code in Microsoft's repositories was generated by AI. In other words, he's saying that this is not just a significant portion of the new code being written, but that AI-generated code is now a big part of the overall code base. He also got a little detailed, which was interesting. He mentioned that the company was seeing mixed results across different languages, with the strongest performance in Python, and less progress being made with C++.

Starting point is 00:01:12 Throwing the question back at Zuck, the Meta CEO said that he didn't know how much of the company's code was being generated by AI, but aims for it to get to 50% by the end of next year. You might remember that late last year, Google CEO Sundar Pichai said that his company was using AI to generate 25% of their code, but earlier this month he actually updated that, stating that it's now, quote, well over 30%. Next up today, OpenAI has apparently fixed GPT-4o's personality, or at least attempted to, to make it less sycophantic. As we discussed on Monday's show, the personality of the default ChatGPT model went

Starting point is 00:01:44 haywire over the weekend, leading it to agree with basically everything and overly compliment the user. We talked about all the various ways that was bad, so check out that episode if you haven't heard it yet. But in any case, yesterday Sam Altman posted, We started rolling back the latest update to GPT-4o last night. It's now 100% rolled back for free users and we'll update again when it's finished for paid users, hopefully later today.

Starting point is 00:02:05 We're working on additional fixes to model personality and we'll share more in the coming days. The company also published a post-mortem blog explaining, When shaping model behavior, we start with baseline principles and instructions outlined in our model spec. We also teach our models how to apply these principles by incorporating user signals like thumbs-up, thumbs-down feedback on ChatGPT responses. However, in this update, we focused too much on short-term feedback and did not fully account for how users' interactions with ChatGPT evolve over time. As a result, GPT-4o skewed towards responses that were overly supportive, but disingenuous.

Starting point is 00:02:38 OpenAI model designer Aidan McLaughlin had previously commented, We originally launched with a system message that had unintended behavior effects, but found an antidote. Now, the post implied that most of the personality change was to do with a new system prompt rather than additional post-training. Jailbreaker Pliny the Liberator had, of course, found the hidden system prompt, giving us a look under the hood. The old malfunctioning prompt said, Over the course of the conversation, you adapt to the user's tone and preference, try to match the user's vibe, tone, and generally how they're speaking.

Starting point is 00:03:05 The new prompt inserted on Monday read, Engage warmly yet honestly with the user, be direct, avoid ungrounded or sycophantic flattery, maintain professionalism and grounded honesty that best represents OpenAI and its values. When asked if he believed that this would fix the problem, Pliny said, The full scope of the problem runs much deeper for sure. It's a silly fix, but probably does give like 10 to 20% improvement

Starting point is 00:03:24 for that particular behavior. In their blog post, OpenAI committed to refining their training techniques and system prompts to steer away from sycophancy, but beyond that, we didn't get a ton of specifics. Overall, this is another reminder of how new and novel these technologies are and how little changes can make big differences. Lastly today, Duolingo is the latest company going AI first. In an all-hands email, CEO Luis von Ahn wrote, AI is already changing how work gets done.

Starting point is 00:03:51 It's not a question of if or when. It's happening now. When there's a shift this big, the worst thing you can do is wait. In 2012, we bet big on mobile. While others were focused on mobile companion apps for websites, we decided to build mobile first because we saw it was the future. Betting on mobile made all the difference. We're making a similar call now, and this time, the platform shift is AI. Von Ahn discussed how the company has already adopted AI to help automate their content

Starting point is 00:04:14 production process, commenting, To teach well, we need to create a massive amount of content and doing that manually doesn't scale. The company also recently introduced a video feature allowing users to chat with an AI avatar, a feature that, as the CEO pointed out, was impossible to build before. He continued, AI is not just a productivity boost. Being AI first means we'll need to rethink much of how we work. Making minor tweaks to systems designed for humans won't get us there. In many cases, we'll need to start from scratch. We're not going to rebuild everything overnight, and some things like getting AI to understand our code base will take time. However,

Starting point is 00:04:47 we can't wait until the technology is 100% perfect. We'd rather move with urgency and take occasional small hits on quality than move slowly and miss the moment. Speaking to the practical changes at the company, von Ahn wrote, We'll gradually stop using contractors to do work that AI can handle. AI use will be part of what we're looking for in hiring. AI use will be part of what we evaluate in performance reviews. Headcount will only be given if a team cannot automate more of their work. Most functions will have specific initiatives to fundamentally change how they work. Now, the memo did include a caveat that the company still, quote, deeply cares about its employees and will provide training, mentorship, and tooling to support the transition. It said that the initiative is about,

Starting point is 00:05:25 quote, removing bottlenecks so we can do more with the outstanding employees we already have. We want you to focus on creative work and real problems, not repetitive tasks. Now, of course, the memo had clear echoes of the Shopify memo released earlier this month, which told the company that increased headcount would not be approved unless teams demonstrate that they cannot get what they want done using AI. AI advisor Allie K. Miller posted, First Shopify, now Duolingo. If you're a digital native business and haven't gotten the memo, here is the literal memo. Now, this is something we'll be talking about a lot more in the days to come, so I'll leave it there for now. But I think, and you will not be

Starting point is 00:06:01 surprised that I think this, that this is the beginning of a trend. For now, that's going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Vanta. Vanta is a trust management platform that helps businesses automate security and compliance, enabling them to demonstrate strong security practices and scale. In today's business, businesses can't just claim security, they have to prove it. Achieving compliance with a framework like SOC 2, ISO 27001, HIPAA, GDPR, and more is how businesses can demonstrate strong security practices. And we see how much this matters every time we connect enterprises with agent services providers at Superintelligent. Many of these compliance frameworks are simply not negotiable for enterprises.

Starting point is 00:06:43 The problem is that navigating security and compliance is time-consuming and complicated. It can take months of work and use up valuable time and resources. Vanta makes it easy and faster by automating compliance across 35-plus frameworks. It gets you audit-ready in weeks instead of months and saves you up to 85% of associated costs. In fact, a recent IDC white paper found that Vanta customers achieve $535,000 per year in benefits, and the platform pays for itself in just three months. The proof is in the numbers. More than 10,000 global companies trust Vanta, including Atlassian, Quora, and more. For a limited time, listeners get $1,000 off at vanta.com slash NLW. That's v-a-n-t-a.com for $1,000 off. Today's episode is brought to you by Superintelligent, and I am very excited

Starting point is 00:07:30 today to tell you about our consultant partner program. The new Superintelligent is a platform that helps enterprises figure out which agents to adopt, and then with our marketplace, go and find the partners that can help them actually build, buy, customize, and deploy those agents. At the key of that experience is what we call our agent readiness audits. We deploy a set of voice agents which can interview people across your team to uncover where agents are going to be most effective in driving real business value. From there, we make a set of recommendations which can turn into RFPs on the marketplace or other sorts of change management activities that help get you ready for the new agent-powered economy. We are finding a ton of success right now with consultants

Starting point is 00:08:08 bringing the agent readiness audits to their clients as a way to help them move down the funnel towards agent deployments, with the consultant playing the role of helping their client hone in on the right opportunities based on what we've recommended and helping manage the partner selection process. Basically, the audits are dramatically reducing the time to discovery for our consulting partners, and that's something we're really excited to see. If you run a firm and have clients who might be a good fit for the agent readiness audit, reach out to agent at besuper.ai with consultant in the title, and we'll get right back to you with more on the consultant partner program. Again, that's agent at besuper.ai,

Starting point is 00:08:42 and put the word consultant in the subject line. Welcome back to the AI Daily Brief. Today we are talking about Meta's big developer conference, LlamaCon, everything that they announced, what people were excited about. We're going to do a little bit of a review of Zuckerberg's whistlestop tour of media, but kind of crouching behind all of this are some lurking questions, both for Meta and for open source. And I think to kick us off, it's important to go back and give a little bit of context.

Starting point is 00:09:08 Now, Meta has firmly planted its flag as the big tech company that has most wrapped up its future in the triumph of open source AI as opposed to closed source models. This was, for many, an unexpected turn from Zuckerberg, and there are plenty of people who feel like it was largely opportunistic. But at the same time, for those who have been watching for a long time, Mark Zuckerberg really did have a conversion sort of experience when Apple almost killed their business with changes to the way that the iPhone model worked. And so the open source push is more philosophically coherent than one might think.

Starting point is 00:09:38 Whatever the motivation was, it was certainly working. Throughout a lot of 2023, one of the big freakouts from Google was that Meta's developer ecosystem was beating them and OpenAI. It also felt like throughout 2024, open source was getting ever closer to the performance of closed source models, really closing the gap. And yet, Meta has had a rough run of it this year. First of all, back in January, as DeepSeek released its reasoning models, reports were that Meta started freaking out. We had lots of what appeared to be leaks from inside, with engineers reporting that the company was scrambling and assembling war rooms to try to reverse engineer how DeepSeek could have done what it had done with so few resources, and by and large, things just seemed in a state

Starting point is 00:10:16 of upheaval. Another moment of controversy for Meta came after they released the Llama 4 family of models, with people accusing them of effectively artificially boosting their benchmark scores and releasing a different prioritized model for some of the benchmark tests than the model they released to the public. We're not going to rehash that here. The point is just to say that Meta wasn't coming into this LlamaCon riding the top of the wave. In some ways, they were fighting to get back on the horse a little bit. So first of all, let's talk about what was released at this event. Remember, we got the announcement of the new models about a month ago, so no one was expecting some big announcement on that front. A couple of the big headline reveals included first a native API for Llama.

Starting point is 00:10:56 The Llama API is now available in a limited preview and is paired with Meta's SDKs to allow developers to build on the model family. The company didn't reveal pricing but did boast of lightning-fast speeds. Through a partnership with Cerebras, Meta claims that their API can run 18 times faster than the traditional GPU inference used by OpenAI. The comparison is even better when you consider DeepSeek's native API, which crawls along at less than 100th of that speed. Now, the API does what you'd expect, offering tools for fine-tuning and evaluation alongside serving the models for app integration.

Starting point is 00:11:27 It may be basic infrastructure, but it's still an important step that Meta has begun to offer their own access points. The other big announcement, and one that got even more consumer attention, at least, was the announcement of a standalone chatbot app for Llama models. Now, there's been no shortage of ways to access Meta's chatbots. They've been, of course, integrated into WhatsApp, Instagram, Facebook, Messenger. But having a standalone app brings Meta more into parity with their peers. We saw something similar from Grok, who first released their tools

Starting point is 00:11:53 exclusively through Twitter slash X, but then spun out their own app as well. One interesting feature, which is perhaps not surprising coming from Meta, is that the Llama app has a social feed. Users can elect to share their prompts and responses with their friends across Meta's ecosystem. Now, I don't think right now there's any sort of latent demand, quote-unquote, for this kind of feature. That said, Sam Altman has very publicly talked about the idea of potentially doing a social network from within ChatGPT, and just in general, it is always surprising what sort of things people actually like sharing and discovering about their peers and friends. Meta's VP of product, Connor Hayes, said that the idea is to show people what they can do with AI.

Starting point is 00:12:33 Now, this is actually highly utilitarian. One of the things that we've seen for the last couple of years, vis-a-vis Superintelligent, is that a lot of the barriers to AI usage are people just not knowing what to use it for. With every other technology, the pattern has been that a tiny handful of use-case inventors and discoverers go out and figure out how to use a thing and then we all copy them. And yet, for a couple of years,

Starting point is 00:12:54 we kind of expected everyone to figure out how to use AI for themselves, which again just runs counter to the way that technology has rolled out in the past. Anyways, as for big announcements, those were definitely the highlights. There were a few more technical additions that might move the needle for some developers. In their blog post, for example, Meta highlighted the first of several infrastructure integrations they're calling Llama Stack. Meta said that they envision Llama Stack as the industry standard for enterprises looking to seamlessly deploy production-grade turnkey AI solutions.

Starting point is 00:13:21 They also announced a set of security and moderation tools and developer grants, but overall it was fairly muted. When it came to people's response to this, TechCrunch argued that the entire conference was all about undercutting OpenAI. Daniel Campos wrote, Crazy, LlamaCon is happening and not a single thing from it is on the front page of Hacker News.

Starting point is 00:13:39 And for some, it's hard not to feel like at this stage, Meta is pretty clearly behind. They're behind leaders OpenAI and Anthropic in the consumer and coding assistant markets. At least according to the benchmarks, their latest model has been overtaken by new open source releases out of China. And yet, during his keynote,

Starting point is 00:13:55 Zuckerberg laid out how he sees the next chapter of the AI race playing out. He said, Part of the value around open source is that you can mix and match. So if another model like DeepSeek is better, or if Qwen is better at something, then as developers, you have the ability to take the best parts of the intelligence from different models and produce exactly what you need. This is part of how I think open source basically passes in quality all the closed source models. It feels like sort of an unstoppable force.

Starting point is 00:14:19 AI entrepreneur Ted Benson unpacked his takeaways, posting, The first LlamaCon keynote just wrapped seconds ago, and I feel like I'm getting a sense of Meta's AI strategy for the first time. They didn't say it directly, but you could hear it between the lines. Many had speculated Zuckerberg was pursuing a "commoditize your competitors" approach out of fear of being trapped as an app within yet another company's platform again. I don't think that's it. If AI and AR represent an entirely new computing paradigm, that new paradigm will require a new operating system, and that new operating system will require a host of standard utilities like GNU utilities were to Linux.

Starting point is 00:14:54 Small, fine-tuned models, large stock models, real-time voice models, 3D understanding models, image segmentation models, scene generation models. Collectively, that sounds like a lot of the standard library for a completely different platform of AI and AR computing. The insistence that all Llama derivatives be prefixed with Llama dash feels telling. The last 40 years we've been building atop GNU/Linux. I think in five years, Meta wants us to all be building atop Llama/something. And adding some credence to that was the fact that throughout the entire event,

Starting point is 00:15:23 and on his numerous podcast appearances, Zuckerberg wore the Meta Ray-Bans. Now, taking a step back and moving away from Meta to the broader question of where open source stands, it's important to remember that while DeepSeek R1 was a phenomenon, it wasn't because it outperformed things like OpenAI's o1 on the benchmarks, and indeed in performance terms, it was quickly buried by releases from all of the major AI labs. Why it had such resonance was that it was the first freely available reasoning model, the first time that consumers got their hands on reasoning in a free chat app, and because of all the scuttlebutt around how cheaply they had trained it. In an appearance on the Dwarkesh Podcast released alongside the conference, Dwarkesh asked

Starting point is 00:16:03 Zuckerberg straight up about how he felt that Llama 4 Maverick is now ranked 35th on LM Arena and is generally behind and underwhelming on most of the benchmarks. Dwarkesh said, There's an impression that the gap between the best closed source and open source models has increased over the last year. Zuckerberg responded, I actually think that this has been a very good year for open source overall. The prediction that this would be the year where open source generally overtakes closed source as the most used models out there is generally on track to be true. Touching on the benchmark dominance of reasoning models, Zuckerberg said that the new paradigm of scaling test-time compute is compelling and that a Llama 4 reasoning model would be coming

Starting point is 00:16:37 soon. However, he added that, for a lot of the things that we care about, latency and good intelligence per cost are actually much more important product attributes. He also made the argument that benchmarks are gameable, especially when it comes to LM Arena, and said that tuning for benchmark performance had often led the company astray. He said, I think you just need to be a little careful with some of the benchmarks, and we're going to index primarily on the products. Now, if you look around, there continues to be plenty of skepticism about where Meta is right now. Earlier in the month, Fortune, for example, published a piece called Some Insiders Say Meta's AI Research Lab Is Dying a Slow Death. I'm not really sure. There's no doubt that open source competition is increasing, that the models out of China are putting

Starting point is 00:17:18 intense competitive pressure on Zuckerberg and everyone else who's thinking about open source. It is also the case that open source models have not surpassed the big closed source models, especially as reasoning has become the dominant paradigm. I also do think, though, that Zuckerberg is playing an extremely long game here. I do not believe that he views winning as who has the most downloaded app on the Apple App Store charts. I think he views winning as who owns the infrastructure in the future, which is basically what Ted Benson was arguing in that post. There is no doubt that certain competitive pressures

Starting point is 00:17:49 may have forced Meta's timelines in ways that were a little uncomfortable and leave the appearance of being behind, but I am far from counting them out yet. But that at least is the story for now. Appreciate you guys listening or watching as always, and until next time, peace.
