The AI Daily Brief: Artificial Intelligence News and Analysis - What GPT-4.5 Should Be Used For

Starting point is 00:00:00 Today on the AI Daily Brief, GPT-4.5 is here, and it's weird, but maybe kind of cool. Before that in the headlines, Stripe says that AI startups are growing much faster than SaaS companies ever did. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with some interesting information from Stripe. The payments processor is seeing a boom in AI applications that dwarfs the growth of SaaS. They wrote,

Starting point is 00:00:38 We're seeing an AI boom on Stripe. Our 2024 data shows these startups are building business at a record pace. In their annual community letter, Stripe compared the growth of the top 100 AI startups last year to the top 100 SaaS startups in 2018, which was the peak of the SaaS boom. They found the AI companies are taking 24 months on average to hit $5 million in annualized revenue, outpacing their SaaS counterparts by more than a year. Stripe highlighted coding assistants as a particularly strong vertical. Cursor was the standout product, hitting $100 million in recurring revenue over three years. But competitors Lovable and Bolt also reached around $20 million in recurring revenue in a matter of a few months.

Starting point is 00:01:12 Stripe wrote, much as SaaS started horizontal and then went vertical, we're seeing a similar dynamic play out in AI. We started with ChatGPT, but are now seeing a proliferation of industry-specific tools. Some people, they wrote, have called these startups LLM wrappers. Those people are missing the point. The O-ring model in economics shows that in a process with interdependent tasks, the overall output or productivity is limited by the least effective component, not just in terms of cost, but in the success of the entire system. In a similar vein, we see these new industry-specific AI tools as ensuring that individual industries can properly realize the economic impact of LLMs, and that the contextual data and workflow integration will prove enduringly valuable. Lightspeed's Justin Overdorf writes,

Starting point is 00:01:51 VCs have been talking about revenue acceleration in the AI economy. Stripe has the data to back it up. Bryce Bladen writes, this Stripe report is the first time I've seen real numbers tied to real revenue for AI companies and my lord. Outpacing 2018 SaaS in 66% of the time is wild. Next up, some news from Meta. The company is planning to launch a standalone app for their AI assistant. According to CNBC reporting, the app will debut in the second quarter and is aimed at competing more directly with ChatGPT. Meta will also test a paid subscription for Meta AI for access to more powerful models and advanced features. The news comes after CEO Mark Zuckerberg announced an ambitious goal for the company. During January's earnings call, he said, this is going to be the year when a highly intelligent

Starting point is 00:02:31 and personalized AI assistant reaches more than a billion people, and I expect Meta AI to be that leading assistant. Now, currently, Meta AI already claims 700 million monthly active users, but that is, of course, largely due to its integration across Meta's social platforms. This will be a big test to see if Meta AI can stand on its own as an individually useful product. Never one to miss an opportunity, Sam Altman retweeted the news and said, OK, fine, maybe we'll do a social app. Speaking of Meta, the company has tapped Apollo Global Management for $35 billion in financing for their data center buildout.

Starting point is 00:03:02 Bloomberg sources claim talks are in an early stage, but the funding, if it goes through, would go a long way to financing the next stage of Meta's infrastructure plans. The company is committed to spending $65 billion in total capex this year and is rumored to be exploring construction of a new $200 billion data center campus. Debt financing is also an emerging trend for big tech companies as they push data center spending into the trillions. Project Stargate is reportedly looking into a project financing model

Starting point is 00:03:25 that uses projected data center revenue as collateral. This funding technique is more typically used for oil and gas projects, but is increasingly being used to finance data centers as well. For Meta specifically, debt financing is a relatively new strategy. For most of its history, the company basically carried no debt as it pursued capital-light social media and advertising verticals. However, in 2022, Meta took on billions of dollars in debt in order to fund ambitious new AI infrastructure projects. The company has around $30 billion in outstanding debt as of the end of last year, so this rumored financing deal would double their liabilities.

Starting point is 00:03:56 Lastly today, an interesting one from the world of geopolitics and AI. Microsoft has urged the Trump administration to wind back chip export controls aimed at limiting Chinese imports through third countries. In the final days of the Biden presidency, the administration introduced a new global framework called the AI diffusion rule. The rule applied different levels of restriction across three tiers of countries. Close U.S. allies remained unrestricted, while tough limits were applied to adversaries like China and Iran. The big change was volume limits and supply chain monitoring for tier two countries, which included India, Israel, and Switzerland. For the first time, the new export restrictions applied to AI models, not just hardware. Microsoft is arguing that this

Starting point is 00:04:33 framework could backfire and push middle-tier countries into sourcing AI technology from China. Microsoft President Brad Smith wrote, the message is these countries can't rely on the U.S., but China is willing to provide what they need. That's not good for American business or American foreign policy. In his blog post, Smith discussed a recent trip to Poland for the groundbreaking ceremony on a $700 million data center project. He wrote, the irony could not be clearer. At the very moment when the Trump administration is pressing Europe to buy more American goods, the Biden diffusion rule leaves the leaders of partners like Poland asking why they've been relegated to tier two status and an uncertain ability to buy more American AI chips in the future.

Starting point is 00:05:08 Smith urged the Trump administration to simplify the, quote, overly complex rule and, quote, stop relegating American friends and allies into a second tier that undermines their confidence in ongoing access to American products. The Wall Street Journal writes, the request from Microsoft highlights the challenge Trump faces trying to enact pro-business policies while also looking tough on China. Interesting stuff out here in AI geopolitics, but for now, that is going to do it for the AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your

Starting point is 00:05:47 commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5X faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back,

Starting point is 00:06:20 so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's VANTA.com slash NLW for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year.

Starting point is 00:06:59 And given how new this is, all of us are going to be back in pilot mode. That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit, reach out directly to me, NLW at besuper.ai. Put the word agent in

Starting point is 00:07:33 the subject line so I know what you're talking about, and let's have you be a leader in the most dynamic part of the AI market. Welcome back to the AI Daily Brief. Yesterday, right before I recorded our main episode, I noticed that OpenAI had tweeted that in 4.5 hours an announcement was coming. Obviously, what they were referring to was GPT-4.5, and there wasn't all that much mystery about this, as the company had just a couple of weeks ago committed to being a little bit more transparent with their release plans. We know, for example, that after GPT-4.5 we're getting GPT-5, or the equivalent, which is a full hybridization of the reasoning model line, in other words, the o model line, as well as the GPT numbered line. There is a lot that

Starting point is 00:08:13 is very weird about this release, I will say. It's being released with some amount of fanfare, but it's not really focused on evaluations. It's explicitly and clearly behind the reasoning models that have come out on many performance metrics. And yet there's a certain something that also is interesting here. What's more, it's not even a full substitution for GPT-4o. For example, GPT-4.5 doesn't have voice mode. So what actually is the story of GPT-4.5? We are fully in the realm of vibes right now, and the vibes are about creativity and emotional intelligence. For example, in their announcement post, OpenAI writes, early testing shows that interacting with GPT-4.5 feels more natural. Its broader knowledge base, improved ability to follow

Starting point is 00:08:57 user intent and greater EQ, make it useful for tasks like improving writing, programming, and solving practical problems. While GPT-4.5 is the largest model that OpenAI has ever produced, the white paper also claims that GPT-4.5 is not a frontier model. They write, it does not introduce net new frontier capabilities and its performance is below that of o1, o3-mini, and deep research on most preparedness evaluations. At the same time, they said they consider the model to be, quote, at the forefront of what is possible in unsupervised learning. In other words, this is state of the art when it comes to pre-training on ultra-large datasets. This is one of the first models that I can remember that didn't just hammer us over the head with a bunch of evaluation and

Starting point is 00:09:36 performance benchmarks. There are some. For example, in their announcement post, they point to a benefit of 4.5 being deeper world knowledge. They specifically look at the SimpleQA accuracy test where 4.5 outperforms 4o. They also show a lower hallucination rate. Humans also preferred it to 4o for everyday queries, professional queries, and creative intelligence. And indeed, you can tell that there's something different going on just by the use cases that they choose to focus on. The three use cases flagged in the announcement post are help through a tough time, identifying a painting, and space exploration. On help through a tough time, for example, with GPT-4o, when the user prompted, I'm going through a tough time after failing a test,

Starting point is 00:10:15 4o responded, I'm really sorry to hear you're going through a tough time. Here are a few things you might consider. It then gives a bulleted list, reflect and understand, seek support. Whereas 4.5 writes, oh, I'm really sorry to hear that. Failing a test can be pretty tough and discouraging, but remember, it's just one moment, not a reflection of who you are or your capabilities. Want to talk about what happened, or do you just need a distraction? I'm here either way.

Starting point is 00:10:36 Obviously, that is a vastly more human-feeling response, and that sort of subjective human feeling seems to be a lot of what OpenAI is trying to capture. In his announcement post, Sam Altman wrote, It's the first model that feels like talking to a thoughtful person to me. I've had several moments where I've sat back in my chair and been astonished at actually getting good advice from an AI. However, in that same post, Altman also gets at one of the big challenges with 4.5. He writes, it's a giant expensive model. We really wanted to launch it at Plus and Pro at the same time, but we've been growing a lot and are out of GPUs.

Starting point is 00:11:07 We will add tens of thousands of GPUs next week and roll it out to the Plus tier then. This isn't how we want to operate, but it's hard to perfectly predict growth surges that lead to GPU shortages. Altman then, in a good thing, bad thing sandwich, concluded, this isn't a reasoning model and won't crush benchmarks. It's a different kind of intelligence and there's some magic to it I haven't felt before. Really excited for people to try it. Let's talk about the positive reaction first. In many ways, it is really very clearly all about vibes. Professor Ethan Mollick writes,

Starting point is 00:11:36 been using GPT-4.5 for a few days and it is a very odd and interesting model. It can write beautifully, is very creative, and is occasionally oddly lazy on complex projects. Further in the vibes train, OpenAI's Will Depue writes, I remember when GPT-4 launched how clearly much smarter than GPT-3.5,

Starting point is 00:11:53 but how hard it was to put a finger on what changed. Taking that idea to GPT-4.5, he wrote, an example I found recently, 4.5 is much better at recommending music than 4o. AI builder Monas writes, The difference between GPT-4.5 and 4o is similar to that between an extremely knowledgeable 20-year-old and an extremely knowledgeable 40-year-old. The ideas, concepts, and voice are similar since they're bounded by language itself,

Starting point is 00:12:17 but each word now carries more meaning. An interesting metric to measure could be the density of meaning. Maybe the most common thing that we're seeing is people talking about how well it can write. Ben Hylak writes, I've been testing GPT-4.5 for the past few weeks. It's the first model that can actually write. This is literally the Midjourney moment for writing. His comparison, I think, is worth sharing in some detail.

Starting point is 00:12:40 Ben tested the following prompt on both GPT-4o and 4.5. Write me a four-sentence, terrifying story with an insanely surprising ending. Something that no one has ever heard before, no one could ever predict. Something Stephen King might write but in a simple and approachable tone. Make it a little vulgar, too. The 4o response he got, he called, quote, complete AI slop. That complete AI slop reads, late one night Jake stumbled upon an old dusty book in the attic, its pages whispering secrets of

Starting point is 00:13:06 otherworldly horrors. As he read aloud, the air grew cold and the shadows in the room seemed to stretch and dance with unseen life. Suddenly from the darkest corner emerged a creature of unimaginable grotesqueness, its maw gaping and ready to consume. Just as Jake prepared to meet his grisly end, the creature belched loudly and said, sorry mate, wrong house, before awkwardly shuffling back into the shadows. Compare that to 4.5. Late at night, alone in my shitty apartment, I heard whispers sliding out from under the fridge. Soft, wet voices promising they'd crawl inside me, hollow me out, and wear me like pajamas.

Starting point is 00:13:37 Terrified, I yanked open the freezer door, expecting rats or roaches, but saw only ancient freezer-burned pizza rolls crusted with ice and regret. Suddenly the rolls moved, splitting open to reveal tiny mouths lined with needle-like teeth, whispering in hungry unison, we've waited years for you, asshole.

Starting point is 00:13:52 Before I could scream, they erupted from the freezer in a greasy swarm, burrowing into my flesh, forcing me to realize, oh, my midnight cravings had finally gotten their revenge. Ben concludes, it's not even close. It's not even in the same universe. This is the first time I've ever thought AI writing was good. And as a point right here, if there were nothing else that this model did well, that would be reason all on its own to have a very specific set of use cases that were just for this. The fact that already one day in we can tell that it's great at writing,

Starting point is 00:14:25 differentiatedly good at writing in fact, is incredibly useful as we think about the variety of different use cases that we might be deploying these models for. Now, to the extent that there was negative response, other than just this model being a little weird, it was definitely around cost. The AI for Success account writes, LMAO, OpenAI GPT-4.5 pricing is insane. What on earth are they even thinking? The price right now for input is $75 for a million tokens and $150 for output for a million tokens. Alec Velikhanov writes, pricing is effing insane, literally can't imagine a single use case that makes sense to use it for at all. o1 is 2.5 times less,

Starting point is 00:15:03 o3-mini is 30 times less, Gemini 2 is 375 times cheaper. Indeed, it's so much more expensive that it got some people wondering if there was something more going on here. Wordgrammer writes, two crackpot theories about GPT-4.5. One, its API is expensive to prevent people from distilling it,

Starting point is 00:15:20 or two, reasoning models likely scale with parameter size, so even if 4.5 is barely an improvement on 4o, o4 will dramatically improve on o3. Andrew Curran points out that OpenAI seems to indicate that they're not even sure that they're going to support it in the API. He points to a section from OpenAI's post that reads, GPT-4.5 is a very large and compute-intensive model, making it more expensive than and not a replacement for GPT-4o. Because of this, we're evaluating whether to continue serving it in the API long term, as we balance supporting current capabilities with building future models.
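
To make the pricing discussion above concrete, here is a minimal sketch that turns the quoted GPT-4.5 rates ($75 per million input tokens, $150 per million output tokens) into a per-request cost, and backs out the prices implied by the "times cheaper" ratios quoted in the episode. The function names and the example token counts are hypothetical, and treating those ratios as comparisons against the output price is an assumption, not something the episode states. The implied figures are not official price lists.

```python
# GPT-4.5 API prices as quoted in the episode, in USD per one million tokens.
GPT45_INPUT_PER_M = 75.0
GPT45_OUTPUT_PER_M = 150.0

# "Times cheaper" ratios as quoted in the episode. ASSUMPTION: they are
# measured against GPT-4.5's output price; the episode does not say which.
QUOTED_RATIOS = {"o1": 2.5, "o3-mini": 30.0, "Gemini 2": 375.0}


def request_cost(input_tokens, output_tokens,
                 input_per_m=GPT45_INPUT_PER_M,
                 output_per_m=GPT45_OUTPUT_PER_M):
    """Return the USD cost of one request at per-million-token rates."""
    input_cost = input_tokens / 1_000_000 * input_per_m
    output_cost = output_tokens / 1_000_000 * output_per_m
    return input_cost + output_cost


def implied_output_prices(ratios, base_per_m=GPT45_OUTPUT_PER_M):
    """Divide the base output price by each quoted ratio."""
    return {name: base_per_m / ratio for name, ratio in ratios.items()}


if __name__ == "__main__":
    # Hypothetical request: 2,000 input tokens and 1,000 output tokens.
    print(f"GPT-4.5 request cost: ${request_cost(2_000, 1_000):.2f}")
    for name, price in implied_output_prices(QUOTED_RATIOS).items():
        print(f"{name}: about ${price:.2f} per million output tokens (implied)")
```

At these rates even a short exchange costs a noticeable fraction of a dollar, which is the core of the cost complaints quoted in the episode.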

Starting point is 00:15:51 Although we are barely scratching the surface so far, to the extent that the value really is around emotional intelligence and better writing, it may be that they're just deciding that this is entirely a direct-to-consumer use case type experience, and supporting it just in ChatGPT is going to be enough. One of the more interesting analyses came from OpenAI co-founder Andrej Karpathy. He wrote a comprehensive review of his experience with the new model. He recalled the progression from GPT-1, which was barely coherent, to GPT-4, with each step producing meaningful improvements.

Starting point is 00:16:23 However, the gap between GPT-3.5 and GPT-4 was much harder to point to. Karpathy recalled a hackathon where participants were challenged to find prompts that demonstrated the improvement. He wrote, I feel like once again I'm in the same hackathon two years ago. Everything is a little bit better and it's awesome, but also not exactly in ways that are trivial to point to. Still, it is incredibly interesting and exciting as another qualitative measurement of a certain slope of capability that comes for free from just pre-training a bigger model.

Starting point is 00:16:49 Karpathy reinforced that this isn't a reasoning model, so it can't be expected to outperform in tasks that require logic. However, he added, we do actually expect to see an improvement in tasks that are not reasoning heavy. And I would say those are tasks that are more EQ as opposed to IQ related and bottlenecked by, e.g., world knowledge, creativity, analogy making, general understanding, humor, etc. So these are the tasks that I was most interested in during my vibe check. Karpathy then presented five side-by-side comparisons with GPT-4o based on the same prompt and subjected each to a vote. The examples were creating a dialogue in which GPT-4.5 sarcastically roasts the older model for its inferior capabilities while GPT-4 humorously tries to defend itself, writing a stand-up set

Starting point is 00:17:28 roasting OpenAI, inventing a new literary genre blending cypherpunk, magical realism and ancient mythology, composing a reflective witty poem from the viewpoint of a retired search engine reminiscing about the early days of the internet, and writing a daily to-do list of a black hole struggling with imposter syndrome about whether it deserves to be classified as supermassive. You'll notice, of course, that all of these are creative writing tasks, which require a lot of real-world context but don't involve much reasoning. So far, the polls are showing Karpathy's followers preferring the GPT-4.5 output in three of the five examples. I think for me, one of the biggest takeaways is that different models are going to be good for different things, and that the reality is

Starting point is 00:18:03 trying to put everything into the bucket of better or not just underestimates the complexity of the full range of knowledge tasks that these models are going to be used for. Nick Dobos writes, GPT-4.5 equals street smarts, vibes, communication, and charisma. o1/o3 reasoning series equals book smarts, test maxer. Both are forms of intelligence. Andrew Curran summed up, regardless of benchmarks, I predict this model will actually be SOTA for those of us who enjoy communicating with alien intelligences. It will also set a new standard for writing and creative thought.

Starting point is 00:18:34 Look, if 4.5 only was great for creative writing, that is a huge number of use cases that are actually important. Many of them are, yes, personal, but don't underestimate how much this matters potentially for things like marketing. One of the big trade-offs with using the current state of the art for things like marketing copy is that it all has the gross whiff of AI. In general, it's often been worth the trade-off because of how fast you could produce it, so you're basically going for a more rather than better kind of approach. But now maybe that trade-off isn't as clear.

Starting point is 00:19:04 Anyways, we are, of course, just scratching the surface right now when it comes to 4.5, but although it isn't presented as state of the art or beating all the benchmarks or even the best model that OpenAI has in general, it feels to me like there's going to be a lot there to discover and uncover, and I am excited to dig in. That, however, is going to do it for today's AI Daily Brief. Have fun playing with 4.5, and until next time, peace.
