The AI Daily Brief: Artificial Intelligence News and Analysis - 7 Ways to Use OpenAI's Operator

Episode Date: January 25, 2025

OpenAI's new Operator agent is making waves. In this episode, explore seven real-world ways users are testing its capabilities. From handling routine tasks like grocery shopping and bill payments to more ambitious applications like sales outreach and app development, Operator is setting a new standard for automation. Discover its potential, limitations, and the innovative ideas shaping its use. Brought to you by: KPMG – Go to www.kpmg.us/ai to learn more about how KPMG can help you drive value with our AI solutions. Vanta - Simplify compliance - https://vanta.com/nlw The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score. The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614 Subscribe to the newsletter: https://aidailybrief.beehiiv.com/ Join our Discord: https://bit.ly/aibreakdown

Transcript
Starting point is 00:00:00 Today on the AI Daily Brief, OpenAI launches its agent, Operator, and before that in the headlines, the latest on President Trump's AI executive order. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. To join the conversation, follow the Discord link in our show notes. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. Today is one of those days where we basically have two main episodes crammed together as one. Later on in the main episode, I'll be talking about OpenAI's Operator, but for the headlines, we are talking about a set of Trump policies, starting off with a new executive order on artificial intelligence. Going back a few days, on Monday the Biden AI executive order was one of the
Starting point is 00:00:48 many rescinded by the incoming administration. To be fair, the substantive parts of that 2023 order had largely already played out, mostly to do with government departments filing reports. The major ongoing policy was a mandatory testing and disclosure regime conducted through the AI Safety Institute, which major labs could potentially continue on a voluntary basis. Anthropic CEO Dario Amodei even commented earlier in the week that the repeal was, quote, not a big deal. The big question, then, was what would take its place. We didn't have to wait very long to find out, with Trump outlining his AI agenda on Thursday. AI czar David Sacks explained the order to the president in the Oval Office, stating,
Starting point is 00:01:20 we're announcing the administration's policy to make America the world capital in artificial intelligence and dominate and lead the world in AI. Overall, the new executive order is primarily a vibe shift towards AI acceleration, with just a little touch of the culture war for good measure. It says, the United States has long been at the forefront of AI innovation, driven by the strength of our free markets, world-class research institutions, and entrepreneurial spirit. To maintain this leadership, we must develop AI systems that are free from ideological bias or engineered social agendas. That, of course, is the culture war part, as you can see from the media, who picked up on the "free from ideological bias" piece. Still, overall, the order sets
Starting point is 00:01:54 the new overarching policy direction for the U.S., quote, to sustain and enhance America's global AI dominance in order to promote human flourishing, economic competitiveness, and national security. Substantively, the order directs various White House advisors to submit an action plan to achieve this policy within 180 days. And so, while the substance of it might be different and the action plan they're looking for might be different, the function of the EO isn't all that dissimilar to what Biden put out: it's really more of a first step towards getting the various White House bodies aligned around a set of new policies. Sacks is, of course, playing a leading role in this process alongside the advisor for science and technology
Starting point is 00:02:30 and the advisor for national security. Input is also required from the advisor for domestic policy and the Office of Management and Budget. The order also directs a review of all agency actions taken as a result of the Biden executive order, to determine whether they are, quote, inconsistent with or present obstacles to the new policy directive. Within 60 days, the agencies are required to halt any initiatives deemed to be a problem. And that's just about it: a brief one-page document compared to the sprawling 111-page Biden EO. I think most people's sense in reviewing this is that the Trump administration knows it wants to accelerate AI development but isn't sure yet what steps it needs to take to do that.
Starting point is 00:03:06 The EO is basically a planting of the flag that says an important first step is removing the Biden guardrails, minimal as they were. As the companion fact sheet claimed, the Biden AI executive order established unnecessarily burdensome requirements for companies developing and deploying AI that would stifle private sector innovation and threaten American technological leadership. Appearing on Fox News, Sacks said that the core point was to make the U.S. the global leader in AI. Unsurprisingly, there are plenty of folks who are concerned about what will come next. Alondra Nelson, the former acting director of the White House Office of Science and Technology Policy under Biden, noted that agencies would be tasked with reviewing initiatives, quote, that are
Starting point is 00:03:41 already helping people, with an implicit intent to unwind them. She continued, quote, in 60 days, we'll know which Americans' rights and safety the Trump administration believes deserve to be protected in the age of AI, and if there will be a level playing field for every technologist, developer, and innovator, or just the tech billionaires. Still, on the flip side, when it comes to industry and the accelerationists, the attitude might be summed up by Based Beff Jezos, who writes, unfathomable levels of e/acc victory. Now, maybe a more substantive policy, which came through a virtual appearance at Davos, was President Trump announcing plans to accelerate energy policy for AI data centers.
Starting point is 00:04:14 He said, and I am clipping, editing, and paraphrasing because this is President Trump we're talking about here, we're going to give rapid approvals to build electric generating plants in the United States. We need double the energy we currently have in the U.S. for AI to really be as big as we want it. I'm going to give emergency declarations so they can start building them almost immediately. The National Energy Emergency was declared on day one of the presidency and directed government departments to use any tools they have at their disposal to expedite construction. The new part of the policy is that the administration has removed any climate targets that were binding the AI industry. For Trump, this seems to mean turning back to coal power.
Starting point is 00:04:47 He said there are some companies in the U.S. that have coal sitting right by the plant so that if there's an emergency, they can go back to that. Now, of course, this doesn't necessarily mean big tech companies are all of a sudden going to start building a ton of coal plants. Their climate goals are as much about internal pressure from employees and leadership, and about public perception, as they are about government policy. Instead, what many anticipate is the construction of gas-powered turbines, which can be built quickly and relatively cheaply, as well as the red tape around nuclear facilities being slashed so that new projects don't get stuck in the regulatory quagmire we've seen over the past few years. The other pillar of the policy is ensuring new data
Starting point is 00:05:19 centers are able to build exclusive on-premises power stations. Power utilities have lobbied against co-location in the past, warning it could lead to supply shortages. More realistically, however, co-location tends to just cut out these middlemen and reduce the need to wait for new infrastructure to be built. The optimistic take is that, with the average wait time for a grid connection ballooning out to multiple years, this policy change could speed up the deployment of new data centers significantly. Lastly, an update on Project Stargate. The participants are claiming that they have the money despite what Elon Musk says. Tuesday's announcement of the $500 billion Project Stargate shook up the AI industry,
Starting point is 00:05:52 implying an infrastructure buildout even more significant than the Manhattan Project. Not everyone was convinced, with Elon Musk saying that they didn't actually have the money. According to The Information, though, they do actually have the money, or at least enough to get started. Their report said that SoftBank and OpenAI have each committed $19 billion to the joint venture (although it's not exactly clear where OpenAI is getting its $19 billion), and that Oracle and Abu Dhabi-backed fund MGX are kicking in a further $7 billion apiece. Still a few pennies short of the $100 billion price tag for the first year of the project, but they're probably good for it.
Starting point is 00:06:23 At least that's what President Trump thinks. When asked about Elon's claims, Trump said, I don't know. They're putting up the money; the government isn't. They're very rich people. I hope they do. And then he pointed out that Elon just hates one of those people, and he understands because he hates people too. There you have it. In the meantime, more details are emerging about the scope of the project. A source speaking with the Financial Times said that Stargate wouldn't rent out its compute, commenting, the intent is not to become a data center provider for the world. It's for OpenAI. Another source said that details are still being worked out, stating, they haven't figured out the structure, they haven't figured out the financing, they don't
Starting point is 00:06:54 have the money committed. And yet the first data center is under construction in Abilene, Texas. Sam Altman posted a video of the sprawling site, commenting, big, beautiful buildings. With that, though, we will conclude the headlines. Next up, the main episode. Today's episode is brought to you by Vanta. Trust isn't just earned, it's demanded. Whether you're a startup founder navigating your first audit or a seasoned security professional scaling your GRC program, proving your commitment to security has never been more critical or more complex. That's where Vanta comes in. Businesses use Vanta to establish trust by automating compliance needs across over 35 frameworks like SOC 2 and ISO 27001. Centralize security workflows, complete questionnaires up to 5x faster, and proactively manage vendor risk. Vanta can help you start
Starting point is 00:07:42 or scale up your security program by connecting you with auditors and experts to conduct your audit and set up your security program quickly. Plus, with automation and AI throughout the platform, Vanta gives you time back so you can focus on building your company. Join over 9,000 global companies like Atlassian, Quora, and Factory who use Vanta to manage risk and improve security in real time. For a limited time, this audience gets $1,000 off Vanta at vanta.com slash NLW. That's V-A-N-T-A dot com slash NLW for $1,000 off. If there is one thing that's clear about AI in 2025, it's that the agents are coming. Vertical agents by industry, horizontal agent platforms, agents per function. If you are running a large enterprise,
Starting point is 00:08:29 you will be experimenting with agents next year. And given how new this is, all of us are going to be back in pilot mode. That's why Superintelligent is offering a new product for the beginning of this year. It's an agent readiness and opportunity audit. Over the course of a couple quick weeks, we dig in with your team to understand what type of agents make sense for you to test, what type of infrastructure support you need to be ready, and ultimately to come away with a set of actionable recommendations that get you prepared to figure out how agents can transform your business. If you are interested in the agent readiness and opportunity audit, reach out directly to me, nlw@besuper.ai. Put the word agent in the subject line so I know what
Starting point is 00:09:07 you're talking about, and let's have you be a leader in the most dynamic part of the AI market. Hello, AI Daily Brief listeners. Taking a quick break to share some very interesting findings from KPMG's latest AI Quarterly Pulse Survey. Did you know that 67% of business leaders expect AI to fundamentally transform their businesses within the next two years? And yet it's not all smooth sailing. The biggest challenges they face include things like data quality, risk management, and employee adoption. KPMG is at the forefront of helping organizations navigate these hurdles. They're not just talking about AI; they're leading the charge with practical solutions and real-world applications. For instance, over half of the
Starting point is 00:09:45 organizations surveyed are exploring AI agents to handle tasks like administrative duties and call center operations. So if you're looking to stay ahead in the AI game, keep an eye on KPMG. They're not just a part of the conversation; they're helping shape it. Learn more about how KPMG is driving AI innovation at KPMG.com slash US. Welcome back to the AI Daily Brief. Yesterday I had a classic thing happen. This show is a daily show, right? Six out of the seven days of the week, there is an AI Daily Brief talking to you about the latest AI news and discourse. And you would think that daily is a frequent enough cadence to actually capture and be up to date with all the news.
Starting point is 00:10:23 Alas, sometimes even that isn't enough, and yesterday we had one of those situations: the headlines part of the episode talked about how it appeared that Operator would be coming this week, and between the time that I finished recording and when the episode was actually published, Operator had come out. I had a feeling as I was recording that that was going to happen, but in any case, that means that today we get to actually look at Operator,
Starting point is 00:10:44 which is, of course, OpenAI's first true, or at least advertised to be true agent project. call it an agent that can use its own browser to perform tasks for you. So let's find out what it is, and then we're going to talk through seven ways that people are using it already. Operator has been long in the making. Indeed, even as recently as a couple of weeks ago, there were news articles coming out that said that were exploring things like why OpenAI hadn't released an agent yet. Their announcement post describes operator as an agent that can go to the web to perform tasks for you. Interestingly, it uses its own browser. And with that browser, it can look at a web page,
Starting point is 00:11:18 and interact with it through typing, clicking, or scrolling. OpenAI is to some extent planting a flag here around what an agent is, referring to agents as AIs capable of doing work for you independently: you give it a task and it will execute. They suggest that this research preview version of Operator is good at repetitive browser tasks such as filling out forms, ordering groceries, and creating memes. Now, in terms of how it actually works, there is some similarity to the way that Anthropic's computer use mode is designed. The agent takes constant screenshots to see what it's doing in the web browser and can take control using the mouse and keyboard. Unlike Anthropic, though, OpenAI has implemented this as a fully remote setup. After receiving instructions,
Starting point is 00:11:56 Operator opens its own virtual browser window in a cloud instance. You can watch it carry out its task, or you can click away and get on with other work while Operator works in the background. Users retain full control of their computer, with Operator running in its own fully contained browser. This, of course, limits the specific things it can do, but it also makes it more usable at the same time. OpenAI has worked with specific major websites like StubHub, DoorDash, and OpenTable to try to improve and smooth out the integration, but theoretically, Operator can access any website it needs to carry out its task. There is a lot of human in the loop here as well. OpenAI writes, if Operator encounters challenges or makes mistakes, it can leverage its reasoning
Starting point is 00:12:32 capabilities to self-correct. When it gets stuck and needs assistance, it simply hands control back to the user, ensuring a smooth and collaborative experience. Indeed, in addition to helping Operator deal with certain types of issues, taking over is also required to finalize certain tasks. For example, this version of Operator does not have access to credit card details, so if that's part of completing the task, it hands the system back over to the user to complete that particular step. Operator also asks for feedback at critical moments within its tasks. Under the hood, OpenAI has fine-tuned a version of GPT-4o to drive Operator, which they're calling their Computer-Using Agent, or CUA. As far as benchmarks go, CUA achieved an 87% success rate on WebVoyager, which is a live
Starting point is 00:13:11 website navigation test, and a 58.1% success rate on WebArena, which simulates e-commerce and content management situations. Much better than vanilla GPT-4o, but certainly not necessarily the level of reliability we'd want before these types of experiences become endemic. Speaking of which, as VentureBeat points out, TikTok parent ByteDance also launched its own AI agent for controlling web browsers yesterday, called UI-TARS. They write that it's totally open source and boasts similarly impressive benchmark performance, which makes them wonder if people will be willing to pay for ChatGPT Pro's $200 a month, which is the only way you can get access to Operator at the moment. As has been the custom with OpenAI releases lately, the feature is only available to
Starting point is 00:13:49 US Pro users, with Sam Altman saying that Europe will unfortunately take a while. So let's talk now about some of the ways that people are actually using Operator. Keep in mind, these are all very nascent, first-test kinds of use cases, and it always takes some time to really figure out the best ways to use new capabilities like the ones Operator offers. Certainly, when it comes to how OpenAI was positioning this, it's a lot of the very basic assistant tasks that I've often said on the show I don't think are going to be the real drivers of agent behavior when it comes to consumers. Ultimately, whether I'm right and these aren't the long-term drivers of agentic behavior, or I'm wrong and this is exactly what people end up wanting to use agents for, it's clear that
Starting point is 00:14:28 they're valuable as a test case and as a way to start training and giving agents capabilities. The first use case that many people shared was some version of grocery shopping. This was, in fact, one of the examples the OpenAI team used to demonstrate Operator's capabilities. They gave it a shopping list written down on a piece of paper, saying, can you buy these for me, please? Operator takes the list to Instacart and, after it's found the items and added them to the cart, asks whether it should finalize the order. In a week when crypto has been booming, it's appropriate that another experimental use case, this one from Rowan Cheung, who of course runs The Rundown, is crypto investment research to find tokens that are actually worth looking into.
Starting point is 00:15:04 Obviously, you could generalize this use case as research. The reason I thought this example was interesting to share was that it demonstrates one part of the human-agent interface. At one point, Operator got hit with an "are you human" CAPTCHA and pinged Rowan to take control to confirm and move forward. Number three is another very common demonstration use case, and once again one that I've railed on before: travel planning. Y Combinator President Garry Tan writes, OpenAI Operator is very impressive, planning an impromptu trip to Vegas. It's able to navigate JSX's website and handle unusual cases and basically figure out sold-out scenarios, change dates and times, and now it's figuring out where to eat for Friday night for two. I will say that when it comes to this type of
Starting point is 00:15:42 assistant use case, the more complex the travel is (in other words, the more details that need to be solved), the more I can see this particular type of interface, which just chatters at you to get the information it needs to execute, being an actually useful update. A fourth use case, this one once again from Rowan: researching a good birthday gift for my mom based on what she likes. A couple of things were interesting about this experiment. First of all, there were certain websites that it couldn't access, and it was capable of switching gears and finding another site that would do something similar. In addition to looking for specific items, it also took things a step further and actually helped compare and find the best price across the web. Number five, staying on the theme of rote regular tasks,
Starting point is 00:16:20 a16z partner Olivia Moore says, I just gave Operator a picture of a paper bill I got in the mail. From only the bill picture, it navigated to the website, pulled up my account, entered my info, and asked for my credit card number to complete payment. Once again, you see here that it is not going to take that final step of actually inputting the credit card number without human approval, although presumably in the long run, that might be something people get more comfortable allowing, and that various agent assistants actually enable as well. Sixth use case, and this is, I think, where it gets a little more interesting from a business standpoint: actually using the tool for sales. This comes from Pocketflow AI's Helena Zhang,
Starting point is 00:16:53 and let's just listen to 30 seconds of what she did. Hi, here's a list of powerful women at companies we would love to work with, and I want to reach out to their head of AI with such a message. So I have prompted Operator and am talking to Operator. This is just so cool. So basically what Operator did here was take a list of names, find their LinkedIn profiles, and add a message to connection requests, effectively doing prospecting. Lastly, our seventh use case, and again, I saw a number of different examples of this, was using the agent to build apps. BabyAGI creator and VC Yohei writes, I used OpenAI Operator to build, deploy, and open source a tool on GitHub using Replit Agent. Took about 30 minutes. He also gave
Starting point is 00:17:39 some feedback, writing, while working with Replit Agent, it actually deployed the app, tested it, and described the error back to Replit Agent for me. Operator asked me a few more questions than I wanted, but it was mostly for safety, e.g., filling forms, so I guess I'm okay with it. It had trouble with a few things around UI, like knowing it needs to scroll a page to see the rest of it, and it needed pointers to find the Git feature in Replit. Once it found the Git feature, it didn't need my assistance to create a repo and open source it after having the agent write a README. While a bit slower, this was even more automated than Replit Agent, especially testing features and working through errors, which is impressive. The app that Yohei built, by the way, was, quote, the classic to-do app with
Starting point is 00:18:13 the twist: it's for agents. An API for agents to create, read, update, and delete tasks, a web UI for users to manage tasks manually, a test UI for testing endpoints, and API performance metrics. Kishan also made an app, sharing a video and tweeting, used ChatGPT Operator to use Bolt to create a project management app. General agent using a coding agent, and it worked pretty well. I even deployed the app. This is insane. So basically we had here exactly what he described: a general agent, Operator, using a specific agent, Bolt, which is a web coding agent, to create something, and it worked. When you see things like this, which open up fundamentally new possibilities that were never possible before, that's why I'm more skeptical of the very basic, superficial,
Starting point is 00:18:52 do-my-grocery-shopping-for-me type of tasks. Sure, it could be that assistants get so good at those things that we no longer have to spend even the tiny handful of minutes they used to take. But certainly what gets me excited, and what I think is going to drive more uptake, are these never-before-possible things like building complete applications in this way. Ultimately, the way I would describe people's general attitude towards this is that while it isn't a lightning-bolt, ChatGPT-style moment, Operator is just good. It's not great at everything yet. It has some challenges, but it's definitely a preview of the future and where we're headed.
Starting point is 00:19:23 I anticipate over the next few weeks we are going to see a ton of different use cases thrown at this thing, and probably some that start to take off as really and regularly valuable. I will, of course, be back here to share those with you as they happen, but for now, that is going to do it for today's AI Daily Brief. And until next time, peace.
