The AI Daily Brief: Artificial Intelligence News and Analysis - 5 Ways Companies Are Using AI Agents Today

Starting point is 00:00:00 Today on the AI Daily Brief, five ways that companies are using AI agents right now. Before that on the headlines, Google forms a new team to build world models. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. Perhaps the biggest theme of Q4 of last year was this question around whether the pre-training model for scaling AI had started to run into serious limits. We obviously got the rise of reasoning models like o1 and o3. We had CEOs

Starting point is 00:00:42 like Satya Nadella from Microsoft talking about how new architectures were needed, but we also got some interesting alternatives. One of the approaches that some are interested in is models that can simulate the physical world. Google is forming a new team within DeepMind to work on scaling these types of models. The team will be led by Tim Brooks, one of the co-leads of OpenAI's Sora video model who left that company back in October. Yesterday, Brooks posted, DeepMind has ambitious plans to make massive generative models that simulate the world. I'm hiring for a new team with this mission. Come build with us. So far, what we've seen from labs are functional if limited demos. Basically, these are AI models that have a better understanding of the physics

Starting point is 00:01:21 and appearance of the real world, understanding it in a similar way to how LLMs understand the structure of language. So far, a lot of what we've seen from world model labs is based on training data from video games or movies, and so is really only a proof of concept. One of the few projects to move past this stage was Genesis, first shown off last month. That project was able to generate groundbreaking video and extremely accurate robotics training modules using a 4D world simulation. Genesis claimed they were able to train robots 430 times faster than the previous leading physics simulator, cutting the time below a minute. Now, DeepMind is one of the labs that published a brief demo of a model that understands video game physics last year. That model was called Genie 2,

Starting point is 00:01:59 and I actually think that the announcement went a little under the radar. Establishing this new team suggests that they want to push the technology even harder. Job postings for the new team invited applicants to, quote, join an ambitious project to build generative models that simulate the physical world. We believe scaling pre-training on video and multimodal data is on the critical path to artificial general intelligence. World models will power numerous domains, such as visual reasoning and simulation, planning for embodied agents and real-time interactive entertainment. The team will collaborate with and build on work from Gemini, Veo, and Genie teams and tackle critical new problems to scale world models to the highest level of compute. One of the people who has

Starting point is 00:02:33 talked most explicitly about this view of the importance of these types of models for achieving AGI is Meta chief AI scientist Yann LeCun. Indeed, he has gone so far as to hypothesize, loudly, on Twitter, that standard GPT architecture has no pathway to AGI. This project sounds as though it will be one of the first to attempt to build a world model using the full scale of the training data and compute that can be mustered by a big tech firm. Nvidia, meanwhile, is also pushing the frontier of world models, releasing a family of models called Cosmos. During his keynote address at CES, which we will cover in more depth later in the week, Nvidia CEO Jensen Huang announced, the ChatGPT moment for robotics is coming. Like large language models, world foundation

Starting point is 00:03:10 models are fundamental to advancing robot and AV development, yet not all developers have the expertise and resources to train their own. He demonstrated the model being used to simulate warehouses and roadways commenting, it's not about generating creative content, but teaching the AI to understand the physical world. The models were trained on 20 million hours of video, with a particular focus on human movements like walking, hand movements, and manipulating objects. They can be fine-tuned for specific tasks and customized for external data. The family includes three models ranging from 4 billion to 14 billion parameters. The smallest model is optimized for low latency in real-time applications, while the largest model is intended to deliver high-fidelity outputs.

Starting point is 00:03:47 And what's more, the models are available as open source for commercial use, allowing robotics and autonomous vehicle developers to use them in production. Diego Odd posted, this is huge for AI democratization, a powerful open source video world model trained on 20 million hours. Not just the model itself, but its application to synthetic data generation could be a game changer for robotics training. One more quick story before we close out the headlines. One of the big questions surrounding the AI industry is whether it can actually make money. You'll remember that this was a huge point of conversation last summer.

Starting point is 00:04:18 We had that Sequoia blog post AI $600 billion problem. And now we've learned that ChatGPT Pro, the $200 per month tier, is not only not a cash grab, but is actually not even paying for itself. A couple of days ago, Sam Altman tweeted, insane thing. We are currently losing money on OpenAI Pro subscriptions. People use it much more than we expected. In the replies, he added, I personally chose the price and thought we would make money. Now, of course, OpenAI is making a ton of money but losing more. The company reportedly expected losses of around $5 billion last year on revenues of $3.7 billion. The pricing of all of this stuff at any point has been pretty arbitrary. In a recent interview, Sam Altman said

Starting point is 00:04:58 that when it came to the main ChatGPT subscription, the company was tossing it up between $20 and $42. They eventually went with $20 because, quote, people thought $42 was a little too much. They were happy to pay $20. Altman continued, it was not a rigorous hire someone and do a pricing study thing. Now, what makes this interesting isn't anything really about OpenAI itself. It's much more about the question of the long-term profitability of AI. Mojo Flynn writes, OpenAI losing money is no big surprise, but when they're losing money on a $200 monthly subscription

Starting point is 00:05:29 should tell you there's no viable at-scale consumer business model. Even Microsoft with a $30 Copilot subscription is forced to offer discounted pricing. I don't think it's an unreasonable concern. However, I do have a very different take. I think that we are extremely early in the life cycle of AI, and the simple reality is, the cost of delivering the service hasn't come down

Starting point is 00:05:49 as fast as the demand for using the service has increased. That's an unsustainable state, but unsustainable doesn't mean an inevitable failure, it means that there's going to need to be a recalibration. Already, the cost of AI has come down spectacularly from where it was a few years ago, at least in terms of what you can do with the same amount. I would expect that to continue, and I think that we're going to figure out a lot more use case by use case what sort of business models different performance levels of AI can

Starting point is 00:06:16 support. Frankly, I think this is exactly what venture capital and risk capital is designed to do. It's designed to allow incredibly promising innovations the ability to build and get through these complicated early stages before these markets get rationalized. I think the speed of adoption of these tools has taken basically everyone by surprise and puts additional pressure on this even relative to other industries. Anyways, still an interesting story to watch, one that we will keep track of here. For now, though, that is going to do it for today's AI Daily Brief Headlines edition. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned,

Starting point is 00:06:52 it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5X faster, and proactively manage vendor risk. Vanta can help you start or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back, so you can focus

Starting point is 00:07:33 on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory, who use Vanta to manage risk and prove security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's V-A-N-T-A dot com slash N-L-W for $1,000 off. There is one thing that's clear about AI in 2025. It's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise, you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode. That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks,

Starting point is 00:08:24 we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and to ultimately come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw at besuper.ai. Put the word agent in the subject line so I know what you're talking about, and let's have you be a leader in the most dynamic part of the AI market.

Starting point is 00:08:51 Welcome back to the AI Daily Brief. Right now in Las Vegas, the annual CES Consumer Electronics Show is happening, and I anticipate that there will be some interesting AI announcements from that event that we will cover later in the week. However, for today's episode, as we let those announcements come in a little bit more, I noticed something interesting in the Wall Street Journal. Yesterday, that publication published a piece in their CIO Journal called How Are Companies Using AI Agents?

Starting point is 00:09:17 Here's a look at five early users of the bots. You can tell the language is a little bit stuck in the past. But what's interesting to me is that in a year where we really are talking about 2025 being the time that companies start experimenting with agents, mainstream media is already picking it up that this is a major theme. Part of why this matters is that most people in big companies, much to my chagrin, are not so up to speed that they're listening to something like the AI Daily Brief. They're getting their news from sources like the Wall Street Journal. And so when this style of publication starts taking this stuff seriously, it can have a pretty big impact. So what we're going to do today is briefly look through these.

Starting point is 00:09:52 five use cases that the WSJ covered, and I'm going to pair that with an overview of a recent paper from Google that I think might be a pretty useful resource as well. The Wall Street Journal piece basically points out that this is a big trend. They describe how many different companies have officially announced their own agents, and they point out one of the biggest reasons, frankly, that enterprises are so focused on agents. Quote, if these agents work as promised, they could also provide businesses with the return on investment they've been looking for out of generative AI. According to some corporate technology leaders, that means the ability to tie the technology to a reduction in the number of hours employees work, or even how many new people they need to

Starting point is 00:10:28 hire. Basically, there is an ROI built in if agents actually work. Agents necessarily replace certain amounts of human labor, and presumably do it at lower cost than the equivalent human time. Now, it's important to note that how companies use those cost savings and that increased productivity is going to dictate just how disruptive this is. If companies reinvest that human time into growing the business in other areas, I tend to think that this will be a phenomenal development for everyone. If, on the other hand, they just view it as a cost-cutting measure,

Starting point is 00:10:58 well, that's a whole different kettle of fish. But the real thrust of this Wall Street Journal piece is to try to figure out how agents are being used right now in reality. The first example they gave is from pharmaceutical giant Johnson and Johnson, who have been deploying drug discovery agents. Honing in on what agents can and can't do, the article points out that these agents aren't yet up to the task of coming up with new drugs all by themselves. Instead, they're deployed to optimize key points in the drug synthesis process.

Starting point is 00:11:24 Traditionally, drug manufacturing is refined by running a multitude of experiments, which often have multiple variables to adjust. Agents are able to take the data from a smaller number of experiments and extrapolate it out to arrive at an optimal method. At this stage, employees are still reviewing the output of agents, but they write, the company is still figuring out how that oversight can be done more systematically. Next up, we move over to the world of finance, where financial analysis firm Moody's has developed a team of agents to research public company filings and perform industry comparisons. In total, the firm has 35 different agent designs, all trained for different subtasks and linked up together in a multi-agent system. The system

Starting point is 00:11:59 even has agents as supervisors to check for hallucinations. The novel idea here is that each agent has its own set of instructions, personality, and data access. This means the agents within the system can come up with different conclusions in their analysis, which are then synthesized together. For example, one agent might be building their analysis based on industry competition data, while another might be focused on geopolitical risk. Nick Reed, the company's chief product officer, said, it's almost a bit like your ability as an individual person. What we worked out is that an agent is better at not multitasking. This is obviously a highly relevant conclusion, even if this just represents the current state of things, in terms of how enterprises think about deploying agents.

Starting point is 00:12:36 Rather than trying to have one agent do multiple things, companies might get better results by assigning multiple agents with narrow subtasks and finding ways to coordinate them, once again, possibly with agents. The thinking is not ultimately dissimilar to the way you would construct a team of humans to carry out a multidisciplinary task. eBay is engaged in one of the most popular agent use cases, writing code. Interestingly, eBay actually built its own agent framework that can take advantage of several different LLMs. In addition to writing code, eBay's agents are also creating marketing campaigns, and they're planning on rolling out another set of agents that can help buyers find items, as well as helping sellers list goods.

Starting point is 00:13:13 The Journal writes, eBay's agent framework functions as an orchestrator, dictating which AI models will be used for certain tasks like translating code and suggesting code snippets. Next up is Deutsche Telekom, and rather than facing outward, their agents are facing inward. The company employs roughly 80,000 workers across Germany. They've trained agents now to answer employee questions about internal policies and benefits. They also have an agent trained to assist service staff with questions about the company's products and services. In this case, we might be pushing the boundaries of the language of agent. This sort of sounds ultimately like a chatbot

Starting point is 00:13:46 that has access to internal databases. Still, call it what you want. It seems to be getting a lot of traction. The company's chief product and digital officer, Jonathan Abrahamson, said that about 10,000 employees are using it each week. That is dramatically more efficient than having an HR specialist or having employees search for policies on an internal website. Still, Deutsche Telekom is figuring out how to go farther. The company's next step is allowing the agent to execute requests on behalf of employees, further automating basic HR. The example given was allowing the agent to complete a request for leave and enter it into the HR system, all fully automated from a natural language text prompt. The final example is, I believe, at this stage, the most commonly deployed agent example.

Starting point is 00:14:26 In this case, it came from Spanish company Cosentino, which manufactures countertops and other stone materials for buildings. The company has brought on a team of agents to fill in gaps for their customer service staff. They refer to the agents as a digital workforce and are thinking about them in a very similar way to human workers. The agents are expected to have basic skills but receive training when they begin work. Agents are given instructions to follow a strict process and supervisors are present to ensure they don't go off the rails. The so-called digital staff have replaced the work of three to four team members who were previously involved in clearing customer orders. Those people have now been reassigned to more high-touch areas of customer service, liberated from their data entry tasks.

Starting point is 00:15:04 Now, like I said, all of these are fairly basic use cases, but that I think represents where we are. I do believe that 2025 is going to be a huge year for agent pilots, and many of them are going to fall into some of these areas described and articulated in this piece. Now, one useful resource for figuring out how to implement agents in your workforce is a white paper published by Google last September, simply titled, agents. The paper explains what agents are and what they require to function. But more importantly, suggests that companies shouldn't think about agents as an upgrade to existing technology. Instead, they should think about agents as a fundamental shift in the way organizations operate in order to see maximum gains in efficiency and productivity.

Starting point is 00:15:41 Basically, the first big idea in the paper is that agents are more than just smarter LLMs. The core agentic function is being able to access other systems. This could mean simply accessing a database to inform an output, but the possibilities go so much deeper. It's possible, for example, to integrate agents into real-time data feeds to inform autonomous decision-making. Agents have much greater ability to process data than a human. We will likely find agents are able to monitor and take actions based on multiple data sources that would have required an entire team of people to carry out. Google's paper discusses another big difference between LLMs and agents, the ability to reason through multi-step tasks. There are many different

Starting point is 00:16:15 architectures that can be used to achieve this. The agent could use chain of thought, an iterative process of reassessing the task as it progresses based on new information revealed at each step, or it could use a tree of thoughts, where multiple possible solutions are explored at the same time. Ultimately, according to the paper, this makes agents capable of managing uncertainty and complexity in ways that traditional models can't. There's a ton of really interesting information in here. I will link to it in the show notes. And of course, one quick shill here if you've made it this far. You've probably been hearing this ad, but one of the things that we're doing at Super this year is an agent readiness audit, where we are digging in with you to help you understand

Starting point is 00:16:46 what parts of your company or your workforce's activities are best suited for exploring agents. And we're also helping scope and even support pilots in that area. If that's something you're interested in, email me at nlw at besuper.ai. And join this 2025, the year of agents. For now, that is going to do it for today's AI Daily Brief. Until next time, peace.
