The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Office Tools That Actually Work

Starting point is 00:00:00 Today on the AI Daily Brief, these are the AI office tools that actually work. Before that in the headlines, a deal with Microsoft appears to pave the way for an OpenAI for-profit conversion. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick announcements before we dive in. Thank you to today's sponsors, Superintelligent, KPMG, Robots and Pencils, and Blitzy. And to get an ad-free version of the show, go to patreon.com

Starting point is 00:00:34 slash AI Daily Brief. Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around five minutes. We have a bunch of stories today that, just based on their significance to the industry alone, could be mains, starting with the news that Microsoft and OpenAI have reached a deal that would extend their partnership and pave the way for OpenAI to convert into a for-profit. In a joint statement, the pair of companies said that they had signed a non-binding MOU to define the next phase of their partnership.

Starting point is 00:01:00 In a separate blog post, OpenAI chairman Bret Taylor explained the contours of the deal. Under the agreement, OpenAI's nonprofit would continue to exist and retain control over the operations of the company. Alongside a controlling interest, the nonprofit would also receive an equity stake in the for-profit public benefit company worth more than $100 billion. Taylor wrote that this would make it, quote, one of the most well-resourced philanthropic organizations in the world.

Starting point is 00:01:24 He reaffirmed the point that OpenAI has been making from the beginning that basically this is necessary to get the capital needed to actually do what they want to do. Taylor writes, This recapitalization would also enable us to raise the capital required to accomplish our mission and ensure that as OpenAI's PBC grows, so will the nonprofit's resources, allowing us to bring it to historic levels of community impact. Now, no other terms of the deal were disclosed, so we don't know how much longer Microsoft will have access to OpenAI's technology and IP.

Starting point is 00:01:51 The companies said that they are still, quote, actively working to finalize contractual terms in a definitive agreement. Sources said that Microsoft was expected to take around a 30% stake, which would be valued at around $165 billion. The agreement appears to put an end to the negotiations between OpenAI and Microsoft, which have been ongoing for the better part of a year. When OpenAI announced plans to convert into a for-profit corporation, Microsoft was reportedly the big holdout. And while this means that likely all investors are now on board, there is still the hurdle of gaining approval from the California and Delaware attorneys general. These folks haven't given any indication

Starting point is 00:02:23 that they're on board with the conversion. And recently, the AGs have expressed dismay at safety concerns regarding underage users. Earlier this month, California AG Rob Bonta said, My office, working alongside Delaware Attorney General Jennings, has been reviewing the proposed restructuring of OpenAI. Together, we are particularly concerned with ensuring that the stated safety mission of OpenAI as a nonprofit remains front and center. OpenAI purports to build AI to benefit all of humanity. Humanity includes children, and before we can even get to benefiting, we need to get to not harming. Now, remember, the stakes are incredibly high for OpenAI. Around $19 billion in funding is eligible to be clawed back if they can't close the for-profit

Starting point is 00:03:01 conversion by the end of the year. Failure to convert also puts future fundraising in jeopardy, would have implications for whether or not they could IPO, et cetera, et cetera, et cetera. Basically, for those keeping track, OpenAI has absolutely cleared a major hurdle with this agreement, but now has to go fight a very different type of legal battle as well. Meanwhile, moving back over to the Microsoft side of things, even if they are coming to terms with OpenAI, it is still very clear that they are trying to stand on their own two feet a lot more going forward. According to Bloomberg reporting, Microsoft's AI CEO Mustafa Suleyman told staff at an all-hands

Starting point is 00:03:32 on Thursday that Microsoft was planning to make, quote, significant investments into training clusters. He said that it was critical that a company of Microsoft's size had the ability to be self-sufficient in AI should they choose to be. Last month, when Microsoft released their first in-house models, Suleyman noted that they had been trained on a cluster containing 15,000 Nvidia H100s. That's smaller than the 50, 100, or even 200,000 chip clusters that we've seen from other companies. And while the OpenAI relationship does hang over the decision-making at Microsoft,

Starting point is 00:04:00 that camp continues to insist that they aren't binary paths. Suleyman claimed that Microsoft is simultaneously deepening their relationship with OpenAI while also building their own models. Of course, earlier in the week, The Information reported that Microsoft would begin using Anthropic's models to drive certain parts of their Copilot suite as well. One more little model story around enterprises: Anthropic has rolled out their memory feature for Team and Enterprise plans, allowing users more control over context in a business setting. The feature is structured around project-based memory, meaning that separate context silos can easily be maintained for different work teams. Memory can also be imported and exported

Starting point is 00:04:34 to allow it to be transported between different AI tools. They're also adding an incognito mode; when it's enabled, chats won't appear in the conversation history and won't be included in memory. That said, incognito chats are not automatically deleted from Anthropic's servers and will be stored for what they say are safety and legal reasons. I've said it before and I will say it again, the big watchword for 2026 enterprise AI is context, context engineering, context orchestration, and this is just more evidence of that. An interesting update on the chip side of things. Chinese big tech companies are switching away from Nvidia in favor of their own chips.

Starting point is 00:05:07 The Information reports that Alibaba and Baidu have started using internally designed chips for model training. Sources said that Alibaba has been using their own chips to train smaller models since earlier this year, while Baidu has started experimenting with training new versions of existing models with their chips. Now, neither company has fully abandoned Nvidia, and presumably large-scale inference will still rely on foreign chips until domestic manufacturing can scale. But this is obviously a highly strategic move for Chinese companies to try to get off of their dependence on foreign infrastructure. Key figures in Beijing are encouraging this shift.

Starting point is 00:05:37 Wei Shaojun, a professor at Tsinghua University and an advisor to the government, recently said that Asian nations need to reduce their dependence on Nvidia. At a forum in Singapore this week, he said, it's unfortunate to see that we in Asia, including China, are emulating the U.S. when it comes to developing algorithms and large models. He argued that continuing down that path could be, in his words, lethal for the region. CNBC meanwhile reports that Beijing discreetly launched an $8.5 billion national AI fund at the beginning of the year. Shanxi-Wang, a director of a CCP-affiliated think tank, the State Information Center, told reporters that China is, quote, consolidating its energy to do something big. And lastly today, speaking of big or at least interesting, Albania has become the first country to appoint an AI bot to a role in the government. Known as Diella,

Starting point is 00:06:21 the voice agent has been part of the government services portal since January, helping citizens navigate bureaucratic tasks. On Thursday, in announcing his new government, Prime Minister Edi Rama said, Diella, the first cabinet member who's not physically present but has been virtually created by AI, will help make Albania a country where public tenders are 100% free of corruption. Officially, the bot will focus on public procurement and will have control over deciding who will receive government contracts. Albania has a long and persistent history of political corruption, which has been one of the major economic issues for the nation. Prime Minister Rama previously expressed interest in using AI as an anti-corruption tool that could eliminate bribes, threats, and conflicts of interest.

Starting point is 00:06:58 Now, decision-making power will be transitioned away from the government ministry and handed to AI in order to enhance transparency in government spending. Now, what's interesting about this is that this kind of doesn't strike me as just for the headlines. It seems like they are actually trying to use it as an unbiased, incorruptible third party to make decisions to actually solve a key problem. Still, whether it actually takes place and whether it becomes more than novelty, we'll have to wait and see. But I'll tell you, stuff that seems weird today is going to seem completely normal in the future. And it is going to be wild. With that, though, we close the headlines. Next up, the main episode.

Starting point is 00:07:34 If you are a regular listener, you will have heard about Superintelligent's Agent Readiness Audits at this point. But I wanted to tell you today about the full suite of agent readiness products that go beyond just the initial readiness report. Over the last six months, Superintelligent has built out an entire agent planning suite. We help you move from discovery to planning to implementation. After you've completed your agent readiness audits, we help you double-click on your most important use cases with what we call our use case planning reports. These reports are going to help you understand what sort of technical preparation you need to do to be ready for a use case, what challenges you might face in implementation, and whether

Starting point is 00:08:11 you should be thinking about building, buying, partnering, or some combination. After that, you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with what they need to build exactly the agent that you're looking for. If you want to learn more about Superintelligent's agent planning suite, we've built a custom GPT to answer your questions. Just go to bit.ly slash super super agent. That's bit.ly slash super super agent, all one word. And if you have any questions, the agent can even help you book an appointment with our team. What if AI wasn't just a buzzword, but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises.

Starting point is 00:08:56 Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose, from aligning culture and leadership to building trust, data readiness, and deploying AI agents. Whether you're a C-suite executive, strategist, or innovator, this podcast is your front-row seat to the future of enterprise AI. So go check it out at www.kpmg.org.us slash AI podcasts or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts.

Starting point is 00:09:28 Today's episode is brought to you by Robots and Pencils. When competitive advantage lasts mere moments, speed to value wins the AI race. While big consultancies bury progress under layers of process, Robots and Pencils builds impact at AI speed. They partner with clients to enhance human potential through AI, modernizing apps, strengthening data pipelines, and accelerating cloud transformation. With AWS-certified teams across the U.S., Canada, Europe, and Latin America, clients get local expertise and global scale. And with a laser focus on real outcomes,

Starting point is 00:09:58 their solutions help organizations work smarter and serve customers better. They're your nimble, high-service alternative to big integrators. Turn your AI vision into value fast. Stay ahead with a partner built for progress. Partner with Robots and Pencils at robotsandpencils.com slash AI Daily Brief. This episode is brought to you by Blitzy, the enterprise autonomous software development platform with infinite code context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.

Starting point is 00:10:35 The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Blitzy is providing a limited-time, 30-day free proof of concept for qualifying enterprises. The team will provide a 5x velocity increase on a real development project in your org. Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. That's BLITZY.com.

Starting point is 00:11:18 Welcome back to the AI Daily Brief. We love a good survey around here, especially one that has actually meaningful methodology and a big enough sample size to credibly back up the claims that it's making. We recently got another one from online learning platform Udacity, who are now owned by Accenture, that has some interesting stuff to say about the state of AI use at work. So what we're going to do today is look at that study, compare it with some really interesting data from Ramp, and then connect the dots to a recent Andreessen Horowitz piece to ultimately, hopefully, give you a sense of which AI office tools are actually most useful right now. So first up, let's talk about this study.

Starting point is 00:11:55 The way that they sum it up is with their big banner headline, 90% of workers use it, most still don't trust it. Now, like I said, from a methodology perspective, Udacity surveyed 2,000 professionals across a variety of different industries, job levels, and age groups. And as we've found with every study, even the ones that have been held up as examples of AI being overhyped or not actually useful enough, everyone is using these tools. I think at this point we can just say that everyone is using AI. 90% of workers that Udacity surveyed said that they have used some type of AI tool in their work. Unsurprisingly, for anyone who spent any time in

Starting point is 00:12:28 the enterprise AI space, about half of those workers, though, say that their employer doesn't pay for any tools, and broadly speaking they just don't feel supported by their workplace in their AI tool usage. 42% said that their companies don't have clear AI use policies, and around a third of workers are admitting to using unauthorized tools. A full 72% of managers say that they've paid out of pocket for AI tools they need for work. There's also some interesting generational stuff that I think I'll save for a different time. The way that they sum it up is that Gen Z feels most comfortable with how AI may impact their careers, but they're also most critical of others who use it, with the speculation being that perhaps they just have the most clear-eyed view of how

Starting point is 00:13:06 disruptive and important simultaneously AI is going to be. But the thing that I wanted to focus on for the sake of this larger discussion and getting to this A16Z survey of which tools are actually performant is this number right here. According to the Udacity study, three in four workers regularly abandon AI tools mid-task, most commonly because their outputs lack accuracy or quality. Going further on this trust question, 45% of respondents said that they don't trust the quality of a colleague's deliverable if they know it was created with the help of AI, more than a third think less positively of colleagues who regularly use AI at work, and 36% would rather colleagues avoid AI use in their deliverables altogether.

Starting point is 00:13:45 Udacity sums this up as trust being a major barrier. And what was really interesting to me is that this isn't the only place we've seen this idea of trust being a barrier show up recently. Ramp is a bank slash finance solution for startups and companies, which means that every month they process billions of dollars in business-related expenses. That creates a really interesting source of data. And each month, they actually rank the vendors that their customers are purchasing from to help people understand what are the emerging trends in what tools people are using, which companies seem to be penetrating the enterprise mainstream, and so on and so forth. The story of these monthly surveys has basically been AI, AI, AI, AI, for a very long time at this point.

Starting point is 00:14:25 From a pure play standpoint, the top SaaS vendors last month by new customer count included OpenAI and Anthropic in the number one and number three slots. But what gets more interesting is in the fastest growing vendor category. Here's how Ara Kharazian, an economist at Ramp Economics Lab, explained this on Twitter. He writes, execs at large companies have told us the inability to trust AI is holding back adoption. So it makes sense that two of the fastest growing vendors on Ramp are AI that's more trustworthy and reliable for enterprise use. So they're pointing down into that fastest growing category. And the two that they're looking at are the number two slot on overall new customer growth, which is called

Starting point is 00:15:04 Augment Code, and the number three on new overall spend growth, which is Braintrust. Ara calls Braintrust a platform to monitor performance, toxicity, and hallucination in AI models. Braintrust calls themselves the evals and observability platform for building reliable AI agents. So basically what you have here is infrastructure that surrounds AI and agents to make it more trustworthy, to hopefully improve its results. And that's the number three fastest growing vendor when it comes to new spend. Augment Code, he describes as an AI coding platform that can work with enterprise-scale large codebases, and it's clear that where they're trying to differentiate from some of the other

Starting point is 00:15:39 agentic coding platforms is around the things that are going to matter for an enterprise use case. So taken together, we've got a survey saying that trust is a big issue. We've got vendors that deal with trust showing up in the enterprise data, meaning that this is a problem that people are trying to solve, which brings us to this post from A16Z that's all about which AI office tools actually work. The thrust of the piece is basically A, there are a ton of tools. B, not all of them are great.

Starting point is 00:16:03 So C, here are the ones that actually work and for what. Now, by way of setup, the team at A16Z divides AI productivity tools into two different categories. One of the categories is the horizontal tools that are broad-based and general and are meant to handle things across apps and tasks. On the other end of the spectrum are vertical tools that are all about going deep on a very specific workflow, which could be email, could be slides, could be a spreadsheet. So what are some examples?

Starting point is 00:16:30 Well, on the horizontal side, you have general assistant tools like all of the generalist agents that we talk about on this show quite frequently. We've got Manus, Genspark, OpenAI's Operator. Interestingly, they also put this new crop of agentic browsers like Dia and Comet in that category. Versus where I think a lot of folks have felt like the higher potential was, at least in the short term, which is the vertically focused tools. You've got companies like Gamma that are building PowerPoints and presentations. Platforms like Paradigm, which are focused on

Starting point is 00:16:57 spreadsheets, a bazillion platforms that are focused on meetings, note-taking, and all that sort of thing. And then, of course, a bunch of AI-focused email platforms as well. But the question isn't so much, are there tools for all these types of work, but instead do they work? So the way that they tested them was for each of the different high-level use cases to create basically a rubric for success and rate them in a green light, yellow light, red light kind of way. Basically, how well they did at each of those different dimensions. So, for example, when it comes to PowerPoint, slide design, one of the things that I think people most want Office AI to be good at, the five vectors that they judged against were generation

Starting point is 00:17:36 time, how long it took to get the output; visual design, did it actually look good; content quality, was the information presented well and comprehensively; four, editability, once you had that original generation, was it easy to work with to make better; and five, prompt alignment, how close to what you asked for did it actually give you? Now, they rated a combination of the vertical and horizontal tools against each other. They didn't strictly segment this. So in the case of PowerPoint, for example, they compare Gamma, which is a more dedicated information presentation tool, to Genspark, Manus, OpenAI Operator, and Claude, which are all obviously more general purpose and horizontal. They found that different tools had different strengths and weaknesses.

Starting point is 00:18:15 Interestingly, all of the tools except the one that was purpose-built for this use case had high prompt alignment, while Gamma, that vertical tool, only had medium prompt alignment. However, overall, when it came to the PowerPoint or slide design use case, Gamma they ranked as the best. In three of the five categories, generation time, visual design, and editability, it was green lights, with content quality and prompt alignment being the two yellow lights. Of the general purpose tools, Genspark did best, with two areas, content quality and prompt alignment, in the green, and generation time, visual design, and editability all in the yellow.
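As a rough illustration of how these traffic-light ratings could be compared under different priorities, here is a minimal Python sketch. The green and yellow values for Gamma and Genspark are the ones reported above; the numeric mapping (green = 2, yellow = 1, red = 0), the weights, and the function names are assumptions made up for this example and are not part of the A16Z rubric.

```python
# Numeric values for each light are an assumption for illustration only.
SCORE = {"green": 2, "yellow": 1, "red": 0}

# Slide-design ratings as described in the episode.
RATINGS = {
    "Gamma": {
        "generation_time": "green",
        "visual_design": "green",
        "content_quality": "yellow",
        "editability": "green",
        "prompt_alignment": "yellow",
    },
    "Genspark": {
        "generation_time": "yellow",
        "visual_design": "yellow",
        "content_quality": "green",
        "editability": "yellow",
        "prompt_alignment": "green",
    },
}


def weighted_score(ratings, weights):
    """Sum each dimension's numeric rating times its weight (default weight 1.0)."""
    return sum(SCORE[color] * weights.get(dim, 1.0) for dim, color in ratings.items())


def rank(weights):
    """Return tool names ordered from highest to lowest weighted score."""
    return sorted(
        RATINGS,
        key=lambda tool: weighted_score(RATINGS[tool], weights),
        reverse=True,
    )


# Equal weights: Gamma's three greens put it first.
print(rank({}))

# Hypothetical reader who has their own template and mostly cares about content.
content_first = {"content_quality": 3.0, "prompt_alignment": 2.0, "visual_design": 0.0}
print(rank(content_first))
```

With equal weights Gamma comes out ahead, while the content-heavy weighting puts Genspark first, which is the same point the ratings make: the better tool depends on which dimensions matter to you.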

Starting point is 00:18:49 Now, part of what's valuable about this as a rating system is that I can see different people prioritizing different things in what they're looking for out of these tools, in a way that would make a simple thumbs up, thumbs down, who's better or worse type of rubric a lot less useful. For example, maybe I'm a great designer, or I have my own aesthetic, or I want to use Midjourney or something else, or I already have a slide template. I might care way less about visual design in that case, but much more about content quality. Maybe I don't want to spend a bunch of time editing to improve the comprehensiveness of the data. I just want it to nail that. In that case, even though technically Gamma has more areas in the green than in the yellow, maybe I'm going to prioritize

Starting point is 00:19:26 Genspark or Manus because they rank highly in content quality. Likewise, whether I'm trying to do this fast or it's just going on in the background is going to make a big difference in how much I care about generation time. Still, from a practical takeaway standpoint, it sounds like if you want to go build slides with AI right now, two tools to check out are Gamma, which is a purpose-built tool for information design. By the way, one of the things that's cool about Gamma is that it'll also create a website for you and a scrolling PDF all at the same time. Or if you want to try a more general purpose tool, it looks like Genspark might be your best bet. I should caveat here that I have not gone and replicated this myself. I'm just sharing what A16Z found, and a lot of

Starting point is 00:20:05 this is going to be very subjective, but it's presented in the context of you going and doing your own experiments as well. Next up, spreadsheets. The prompt that they had for this was trying to extract all the data from this PDF and calculate operating margin. So it was basically a prompt about working with the data in spreadsheets. For this use case, the five rating areas were processing time, data extraction, calculation accuracy, format design, and analysis quality.

Starting point is 00:20:30 One really positive thing on the spreadsheet use case is that all of the tools they tested had high calculation accuracy. They were all in the green. When it came to data extraction, three of the four general purpose tools, Manus, Genspark, and OpenAI Operator, were all high, while the vertical tool Shortcut AI was only medium. That said, that vertical tool Shortcut performed in the high category on calculation accuracy, format design, and analysis quality. So, again, from a takeaway standpoint, if you want to go experiment with this yourself, and as before, you want to try one general purpose tool and one vertical tool, it looks like A16Z would suggest trying Manus for the general purpose approach and Shortcut AI for the vertical. For use case number three, email, the tools that they're looking at are basically assistants that are embedded in email. In this category, they looked at three of the vertical tools, Fyxer, Serif, and Jace, as well as one more general purpose assistant that's embedded in their AI browser.

Starting point is 00:21:22 The prompt was email to schedule a dinner next Thursday, and the five categories of review were draft quality, customization, context awareness, chat UI availability, and calendar coordination. Fyxer was the only one that scored in the green or yellow across all of the different categories. Use case number four is research. And I think that outside of drafting content, this is one of the most ubiquitous personal use cases for AI, among basically every professional that I run into at least. The prompt was to summarize and compare the latest quarterly cloud revenue growth from Microsoft, Amazon, and Google,

Starting point is 00:21:53 in a table with sources, then analyze the drivers behind the results in a short report. So it's important to note here that the prompt includes not just data collection, but also organization and presentation. For this, they compared all general purpose tools, Manus, OpenAI Operator, and then two browsers, Comet and Dia. And the TLDR is that they pretty much all did this pretty well.

Starting point is 00:22:12 In fact, the only tool to score in the red in any of the categories, which, by the way, for research were process time, data accuracy, table quality, analysis depth, and source attribution, was the Dia browser, which scored low in both analysis depth and source attribution. Manus and Comet both had three out of the five categories in the green, while for OpenAI Operator it was reversed, three in the yellow and two in the green. One note too: if time matters to you, the native browsers were insanely fast. While it took about four minutes for Manus and about five minutes for Operator, the prompt took 20 seconds in Dia and

Starting point is 00:22:45 just eight seconds in Comet. And like I said, it seems that Comet did as well as Operator and close to as well as Manus in a tiny fraction of the time. The last use case they shared was meeting note-taking, where they focused on vertical tools, Granola, Mem, and Motion, and they used ChatGPT's record mode as a more lightweight alternative. First, it's very clear that the more lightweight alternative was not well suited to this, having four of the five categories in the red. The five categories, by the way, were note quality, customization, collaboration and integration, real-time support, and retrievability and search. All three of the vertical tools had three categories in the green and two in the yellow, although the distribution of those was a little different

Starting point is 00:23:23 in each case. Meaning, once again, if you were trying to figure out which you should be using, you might want to go dig into the specifics so you can prioritize what matters most to you. Overall, A16Z had three big observations. The first is simply that there is already a clear dividing line between these categories of tools. The vertical products and the horizontal products are emphasizing different things, and their strengths and weaknesses follow from that. The second observation is that the competition, particularly around the horizontal products, is very intense.

Starting point is 00:23:52 They write, general assistants and agentic browsers are in a race to become the core UI for work. Given the importance of both speed and accuracy, companies that are closer to the model development may have a better chance at delivering. Major research labs are still entering the race. Anthropic has recently launched a browser copilot for Claude, and we expect more attempts from OpenAI and other players. Meaning, without saying you shouldn't invest time in something like Manus or Genspark, they do have a very challenging road ahead, given the competition that's likely to come from the foundation layer. Lastly, as much as the vertical and horizontal are clear in how they're trying to differentiate, A16Z thinks that convergence

Starting point is 00:24:26 is coming. They write, the sharp lines between vertical and horizontal agents are starting to blur, as vertical products look to jump into new categories and horizontal platforms double down on popular use cases. I think taking a step back, my general advice would be a few parts. First of all, I think it's pretty clear from this that no one, in really any of these categories, can claim total dominance. Because of that, we're probably in a period where, as difficult as it is, the best strategy is to be omnivorous when it comes to which tools you're trying.

Starting point is 00:24:56 I think from an AI usage hygiene perspective, the more time you can take to have at least a small stable of tools that you're using, the better off you're going to be in keeping up with the latest developments. The second thing that comes from that, though, and this one is maybe a little bit relieving, is that it feels like the performance is close enough in a lot of these categories that it may make more sense to invest in figuring out how to really get the most out of the tool that you're already working in, rather than trying to jump around wildly. For example, if you are a Manus or a Genspark partisan, I'm not sure that anything in these tests at least suggests that you need to jump ship to go to the other one, and probably your time is better spent on figuring out how to work

Starting point is 00:25:33 with the particular foibles and strengths of the tool that you are already engaged with. Lastly, I think it's probably going to be important not to get too attached to any particular tool. This landscape is changing incredibly fast, and the thing that's most useful today might not be the thing that's most useful tomorrow. Ultimately, view each prompt as its own little adventure, and I'm sure you're going to get great results. For now, that's going to do it for today's AI Daily Brief. Until next time, peace.
