The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Office Tools That Actually Work
Episode Date: September 13, 2025
Not all AI office tools live up to the hype. Today we dig into new surveys, enterprise spending data, and an a16z analysis to uncover which AI tools actually perform in real-world workflows. From slides and spreadsheets to email, research, and meeting notes, we break down the tools worth your time right now.
Brought to you by:
KPMG – Discover how AI is transforming possibility into reality. Tune into the new KPMG 'You Can with AI' podcast and unlock insights that will inform smarter decisions inside your enterprise. Listen now and start shaping your future with every episode. https://www.kpmg.us/AIpodcasts
Blitzy.com - Go to https://blitzy.com/ to build enterprise software in days, not months
Robots & Pencils - Cloud-native AI solutions that power results https://robotsandpencils.com/
Vanta - Simplify compliance - https://vanta.com/nlw
The Agent Readiness Audit from Superintelligent - Go to https://besuper.ai/ to request your company's agent readiness score.
The AI Daily Brief helps you understand the most important news and discussions in AI. Subscribe to the podcast version of The AI Daily Brief wherever you listen: https://pod.link/1680633614
Interested in sponsoring the show? nlw@aidailybrief.ai
Transcript
Today on the AI Daily Brief, these are the AI Office tools that actually work.
Before that in the headlines, a deal with Microsoft appears to pave the way for an OpenAI
for-profit conversion.
The AI Daily Brief is a daily podcast and video about the most important news and discussions
in AI.
All right, friends, quick announcements before we dive in.
Thank you to today's sponsors, Superintelligent, KPMG, Robots & Pencils, and Blitzy.
And to get an ad-free version of the show, go to patreon.com
slash AI Daily Brief.
Welcome back to the AI Daily Brief Headlines Edition, all the daily AI news you need in around
five minutes.
We have a bunch of stories today that, based on their significance to the industry alone,
could be mains, starting with the news that Microsoft and OpenAI have reached a deal
that would extend their partnership and pave the way for OpenAI to convert into a for-profit.
In a joint statement, the pair of companies said that they had signed a non-binding MOU to define
the next phase of their partnership.
In a separate blog post, OpenAI chairman Bret Taylor
explained the contours of the deal.
Under the agreement, OpenAI's nonprofit would continue to exist and retain control over the
operations of the company.
Alongside a controlling interest, the nonprofit would also receive an equity stake in the for-profit
public benefit company worth more than $100 billion.
Taylor wrote that this would make it, quote, one of the most well-resourced philanthropic
organizations in the world.
He reaffirmed the point that OpenAI has been making from the beginning that basically this
is necessary to get the capital needed to actually do what they want to do.
Taylor writes,
This recapitalization would also enable us to raise the capital required to accomplish our mission
and ensure that as OpenAI's PBC grows, so will the nonprofit's resources, allowing us to bring it to
historic levels of community impact.
Now, no other terms of the deal were disclosed, so we don't know how much longer Microsoft will
have access to OpenAI's technology and IP.
The companies said that they are still, quote, actively working to finalize contractual terms
in a definitive agreement.
Sources said that Microsoft was expected to take around a 30% stake, which would be valued at
around $165 billion. The agreement appears to put an end to the negotiations between OpenAI and
Microsoft, which have been ongoing for the better part of a year. When OpenAI announced plans to
convert into a for-profit corporation, Microsoft was reportedly the big holdout. And while this
means that likely all investors are now on board, there is still the hurdle of gaining the
approval from the California and Delaware attorneys general. These folks haven't given any indication
that they're on board with the conversion. And recently, the AGs have expressed dismay at safety
concerns regarding underage users. Earlier this month, California AG Rob Bonta said,
My office, working alongside Delaware Attorney General Jennings, has been reviewing the proposed restructuring
of OpenAI. Together, we are particularly concerned with ensuring that the stated safety
mission of OpenAI as a nonprofit remains front and center. OpenAI purports to build AI to
benefit all of humanity. Humanity includes children, and before we can even get to benefiting,
we need to get to not harming. Now, remember, the stakes are incredibly high for OpenAI.
Around $19 billion in funding is eligible to be clawed back if they can't close the for-profit
conversion by the end of the year. Failure to convert also puts future fundraising in jeopardy,
would have implications for whether or not they could IPO, et cetera, et cetera, et cetera.
Basically, for those keeping track, OpenAI has absolutely cleared a major hurdle with this agreement,
but now has to go fight a very different type of legal battle as well.
Meanwhile, moving back over to the Microsoft side of things,
even if they are coming to terms with OpenAI, it is still very clear that they are trying to stand
on their own two feet a lot more going forward.
According to Bloomberg reporting, Microsoft AI CEO Mustafa Suleyman told staff at an all-hands
on Thursday that Microsoft was planning to make, quote, significant investments into training
clusters.
He said that it was critical that a company of Microsoft's size had the ability to be
self-sufficient in AI should they choose to be.
Last month, when Microsoft released their first in-house models, Suleyman noted that they had
been trained on a cluster containing 15,000 Nvidia H100s.
That's smaller than the 50,000, 100,000, or even 200,000 chip clusters that we've seen
from other companies. And while the OpenAI relationship does hang over the decision-making at Microsoft,
that camp continues to insist that they aren't binary paths. Suleyman claimed that Microsoft is
simultaneously deepening their relationship with OpenAI while also building their own models.
Of course, earlier in the week, The Information reported that Microsoft would begin using Anthropic's
models to drive certain parts of their Copilot suite as well.
One more little model story around enterprises: Anthropic has rolled out their memory feature for
Team and Enterprise plans, allowing users more control over context in a business setting.
The feature is structured around project-based memory, meaning that separate context silos
can easily be maintained for different work teams. Memories can also be imported and exported
so they can be transported between different AI tools. They're also adding an incognito mode.
When it's enabled, chats won't appear in the conversation history and won't be included
in memory. That said, incognito chats are not automatically deleted from Anthropic's servers
and will be stored for what they say are safety and legal reasons. I've said it before and I will
say it again, the big watchword for 2026 Enterprise AI is context, context engineering, context
orchestration, and this is just more evidence of that.
An interesting update on the chip side of things.
Chinese big tech companies are switching away from Nvidia in favor of their own chips.
The Information reports that Alibaba and Baidu have started using internally designed chips
for model training.
Sources said that Alibaba has been using their own chips to train smaller models since
earlier this year, while Baidu has started experimenting with training new versions of existing
models with their chips.
Now, neither company has fully abandoned NVIDIA, and presumably large-scale inference will still rely on foreign chips until domestic manufacturing can scale.
But this is obviously a highly strategic move for Chinese companies to try to get off of their dependence on foreign infrastructure.
Key figures in Beijing are encouraging this shift.
Wei Shaojun, a professor at Tsinghua University and an advisor to the government, recently said that Asian nations need to reduce their dependence on NVIDIA.
At a forum in Singapore this week, he said, it's unfortunate to see that we in Asia, including China, are emulating the U.S. when it comes to developing algorithms
and large models. He argued that continuing down that path could be, in his words, lethal for the
region. CNBC meanwhile reports that Beijing discreetly launched an $8.5 billion national AI fund at the
beginning of the year. Shanxi-Wang, a director at a CCP-affiliated think tank, the State
Information Center, told reporters that China is, quote, consolidating its energy to do something big.
And lastly today, speaking of big or at least interesting, Albania has become the first country
to appoint an AI bot to a role in the government. Known as Diella,
the voice agent has been part of the government services portal since January, helping citizens navigate bureaucratic tasks.
On Thursday, in announcing his new government, Prime Minister Edi Rama said,
Diella, the first cabinet member who's not physically present but has been virtually created by AI,
will help make Albania a country where public tenders are 100% free of corruption.
Officially the bot will focus on public procurement and will have control over deciding who will receive government contracts.
Albania has a long and persistent history of political corruption,
which has been one of the major economic issues for the nation.
Prime Minister Rama previously expressed interest in using AI as an anti-corruption tool that could eliminate bribes, threats, and conflicts of interest.
Now, decision-making power will be transitioned away from the government ministry and handed to AI in order to enhance transparency in government spending.
Now, what's interesting about this is that this kind of doesn't strike me as just for the headlines.
It seems like they are actually trying to use it as an unbiased, incorruptible third party to make decisions to actually solve a key problem.
Still, whether it actually takes place and whether it becomes more than novelty, we'll have to wait and see.
But I'll tell you, stuff that seems weird today is going to seem completely normal in the future.
And it is going to be wild.
With that, though, we close the headlines.
Next up, the main episode.
If you are a regular listener, you will have heard about Superintelligent's Agent Readiness Audits at this point.
But I wanted to tell you today about the full suite of Agent Readiness products that go beyond just the initial readiness report.
Over the last six months, Superintelligent has built out an entire agent planning suite.
We help you move from discovery to planning to implementation.
After you've completed your agent readiness audits, we help you double-click on your most
important use cases with what we call our use case planning reports.
These reports are going to help you understand what sort of technical preparation you need
to do to be ready for a use case, what challenges you might face in implementation, and whether
you should be thinking about building, buying, partnering, or some combination.
After that, you can even get a spec document in what we call our technical blueprint that gives either your developers or the developers of the partner you work with what they need to build exactly the agent that you're looking for.
If you want to learn more about Superintelligent's agent planning suite, we've built a custom GPT to answer your questions.
Just go to bit.ly slash super super agent. That's bit.ly slash super super agent, all one word.
And if you have any questions, the agent can even help you book an appointment with our team.
What if AI wasn't just a buzzword, but a business imperative?
On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most
forward-thinking enterprises.
Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers
real-world insights from leaders who are scaling AI with purpose.
From aligning culture and leadership to building trust, data readiness, and deploying AI
agents.
Whether you're a C-suite executive, strategist, or innovator, this podcast is your front-row seat
to the future of Enterprise AI.
So go check it out at www.kpmg.us slash AI podcasts or search You Can with AI on Spotify,
Apple Podcasts, or wherever you get your podcasts.
Today's episode is brought to you by Robots & Pencils.
When competitive advantage lasts mere moments, speed to value wins the AI race.
While big consultancies bury progress under layers of process, Robots & Pencils builds impact at AI speed.
They partner with clients to enhance human potential through AI, modernizing apps,
strengthening data pipelines, and accelerating cloud transformation.
With AWS-certified teams across the U.S., Canada, Europe, and Latin America,
clients get local expertise and global scale.
And with a laser focus on real outcomes,
their solutions help organizations work smarter and serve customers better.
They're your nimble, high-service alternative to big integrators.
Turn your AI vision into value fast.
Stay ahead with a partner built for progress.
Partner with Robots & Pencils at robotsandpencils.com slash AI Daily Brief.
This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with infinite code context.
Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code.
Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements.
The Blitzy platform provides a plan, then generates and pre-compiles code for each task.
Blitzy delivers 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint.
Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool,
pairing it with their coding copilot of choice to bring an AI-native SDLC into their org.
Blitzy is providing a limited time, 30-day free proof of concept for qualifying enterprises.
The team will provide a 5x velocity increase on a real development project in your org.
Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native.
That's BLITZY.com.
Welcome back to the AI Daily Brief.
We love a good survey around here, especially one that has actually meaningful methodology and a big enough sample size to credibly back up the claims that it's making.
We recently got another one from online learning platform, Udacity, who are now owned by Accenture,
that has some interesting stuff to say about the state of AI use at work.
So what we're going to do today is look at that study, compare it with some really interesting
data from Ramp, and then connect the dots to a recent Andreessen Horowitz piece to ultimately
hopefully give you a sense of which AI office tools are actually most useful right now.
So first up, let's talk about this study.
The way that they sum it up is with their big banner headline, 90% of workers use it,
most still don't trust it.
Now, like I said, from a methodology perspective, Udacity surveyed
2,000 professionals across a variety of different industries, job levels, and age groups.
And as we've found with every study, even the ones that have been held up as examples of AI
being overhyped or not actually useful enough, everyone is using these tools. I think at this point
we can just say that everyone is using AI. 90% of workers that Udacity surveyed said that they
have used some type of AI tool in their work. Unsurprisingly, for anyone who spent any time in
the enterprise AI space, about half of those workers, though, say that their employer doesn't
pay for any tools, and broadly speaking just don't feel supported by their workplace in their AI
tool usage. 42% said that their companies don't have clear AI use policies, and around a third of
workers are admitting to using unauthorized tools. A full 72% of managers say that they've paid
out of pocket for AI tools they need for work. There's also some interesting generational
stuff that I think I'll save for a different time. The way that they sum it up is that Gen Z feels
most comfortable with how AI may impact their careers, but they're also most critical of others who use
it, with the speculation being that perhaps they just have the most clear-eyed view of how
disruptive and important simultaneously AI is going to be. But the thing that I wanted to focus on
for the sake of this larger discussion and getting to this A16Z survey of which tools are
actually performant is this number right here. According to the Udacity study, three in four workers
regularly abandon AI tools mid-task, most commonly because their outputs lack accuracy or quality.
Going further on this trust question, 45% of respondents said that they don't trust the quality
of a colleague's deliverable if they know it was created with the help of AI, more than a third
think less positively of colleagues who regularly use AI at work, and 36% would rather colleagues
avoid AI use in their deliverables altogether.
Udacity sums this up as trust being a major barrier.
And what was really interesting to me is that this isn't the only place we've seen this
idea of trust being a barrier show up recently.
Ramp is a bank slash finance solution for startups and companies, which means that every month they process billions of dollars in business-related expenses.
That creates a really interesting source of data.
And each month, they actually rank the vendors that their customers are purchasing from to help people understand what are the emerging trends in what tools people are using,
which companies seem to be penetrating the enterprise mainstream, and so on and so forth.
The story of these monthly surveys has basically been AI, AI, AI, AI, for a very long time at this
point. From a pure play standpoint, the top SaaS vendors last month by new customer count
included OpenAI and Anthropic in the number one and number three slots. But what gets more
interesting is in the fastest growing vendor category. Here's how Ara Kharazian, an economist at
Ramp Economics Lab, explained this on Twitter. He writes,
Execs at large companies have told us the inability to trust AI is holding back adoption. So
it makes sense that two of the fastest growing vendors on Ramp are AI that's more trustworthy and
reliable for enterprise use. So they're pointing down into that fastest growing category. And the two
that they're looking at are the number two slot on overall new customer growth, which is called
Augment Code, and the number three on new overall spend growth, which is Braintrust.
Ara calls Braintrust a platform to monitor performance, toxicity, and hallucination in AI models.
Braintrust calls themselves the evals and observability platform for building reliable AI
agents. So basically what you have here is infrastructure that surrounds AI and agents
to make it more trustworthy, to hopefully improve its results.
And that's the number three fastest growing vendor when it comes to new spend.
Augment Code, he describes as an AI coding platform that can work with enterprise-scale large
codebases, and it's clear where they're trying to differentiate from some of the other
agentic coding platforms is around the things that are going to matter for an enterprise use
case.
So taken together, we've got a survey saying that trust is a big issue.
We've got vendors that deal with trust showing up in the enterprise data, meaning that this
is a problem that people are trying to solve.
Which brings us to this post from A16Z that's all about which AI office tools actually work.
The thrust of the piece is basically A, there are a ton of tools.
B, not all of them are great.
So C, here are the ones that actually work and for what.
Now, by way of setup, the team at A16Z divides AI productivity tools into two different categories.
One of the categories is the horizontal tools that are broad-based and general
and are meant to handle things across apps and tasks.
On the other end of the spectrum are vertical tools
that are all about going deep on a very specific workflow,
which could be email, could be slides, could be a spreadsheet.
So what are some examples?
Well, on the horizontal side, you have general assistant tools
like all of the generalist agents that we talk about on this show quite frequently.
We've got Manus, GenSpark, OpenAI's Operator.
Interestingly, they also put this new crop of agentic browsers
like Dia and Comet in that category.
Versus where I think a lot of folks have felt like the higher potential was,
at least in the short term, which is the vertical focus tools. You've got companies like Gamma
that are building PowerPoints and presentations. Platforms like Paradigm, which are focused on
spreadsheets, a bazillion platforms that are focused on meetings, note-taking, and all that sort of
thing. And then, of course, a bunch of AI-focused email platforms as well. But the question isn't
so much, are there tools for all these types of work, but instead do they work? So the way that
they tested them was for each of the different high-level use cases to create basically a rubric for
success and rate them in a green light, yellow light, red light kind of way.
Basically, how well they did at each of those different dimensions.
So, for example, when it comes to PowerPoint-style slide design, one of the things that I think people
most want Office AI to be good at, the five vectors that they judged against were: one, generation
time, how long it took to get the output; two, visual design, did it actually look good; three, content
quality, was the information presented well and comprehensively; four, editability, once you
had that original generation, was it easy to work with to make better; and five, prompt alignment,
how close to what you asked for did it actually give you. Now, they rated a combination of the
vertical and horizontal tools against each other. They didn't strictly segment this.
So in the case of PowerPoint, for example, they compare Gamma, which is a more dedicated information
presentation tool, to GenSpark, Manus, OpenAI's Operator, and Claude, which are all obviously
more general purpose and horizontal. They found that different tools had different strengths and weaknesses.
Interestingly, all of the tools except the one that was purpose-built for this use case had high
prompt alignment, while Gamma, that vertical tool only had medium prompt alignment.
However, overall, when it came to the PowerPoint or slide design use case, Gamma they ranked as the
best.
In three of the five categories, Generation Time, Visual Design and Editability, it was green lights,
with content quality and prompt alignment being the two yellow lights.
Of the general purpose tools, GenSpark did best, with two areas, content quality and prompt
alignment in the green, and generation time visual design and editability, all in the yellow.
Now, part of what's valuable about this as a rating system is that I can see different people
prioritizing different things in what they're looking for out of these tools, in a way that would
make a simple thumbs up, thumbs down, who's better or worse type of rubric a lot less useful.
For example, maybe I'm a great designer, or I have my own aesthetic, or I want to use Midjourney
or something else, or I already have a slide template. I might care way less about visual design
in that case, but much more about content quality. Maybe I don't want to spend a bunch of time editing
to improve the comprehensiveness of the data. I just wanted to nail that. In that case, even though
technically Gamma has more areas in the green than in the yellow, maybe I'm going to prioritize
GenSpark or Manus because they rank highly in content quality. Likewise, if I'm trying to do this
fast versus it's just going on in the background, that's going to have a big difference in how much
I care about generation time. Still, from a practical takeaway standpoint, it sounds like if you want to go
build slides with AI right now, two tools to check out are Gamma, which is a purpose-built
tool for information design. By the way, one of the things that's cool about Gamma is that it'll
also create a website for you and a scrolling PDF all at the same time. Or if you want to try
a more general purpose tool, it looks like GenSpark might be your best bet. I should caveat here
that I have not gone and replicated this myself. I'm just sharing what A16Z found, and a lot of
this is going to be very subjective, but it's presented in the context of you going and doing
your own experiments as well.
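To make that prioritization point concrete, here is a minimal sketch of how you might weight the traffic-light ratings yourself. The green, yellow, and red ratings for Gamma and GenSpark are the ones described above; the numeric mapping (green = 2, yellow = 1, red = 0), the weights, and the function names are all assumptions made up for illustration, not anything from the A16Z piece.

```python
# Hypothetical scoring sketch: the ratings come from the episode,
# everything numeric here is an assumption chosen for illustration.
SCORES = {"green": 2, "yellow": 1, "red": 0}

RATINGS = {
    "Gamma": {
        "generation_time": "green",
        "visual_design": "green",
        "content_quality": "yellow",
        "editability": "green",
        "prompt_alignment": "yellow",
    },
    "GenSpark": {
        "generation_time": "yellow",
        "visual_design": "yellow",
        "content_quality": "green",
        "editability": "yellow",
        "prompt_alignment": "green",
    },
}


def weighted_score(ratings, weights):
    """Sum each dimension's score times its weight (default weight 1.0)."""
    return sum(SCORES[color] * weights.get(dim, 1.0) for dim, color in ratings.items())


def rank_tools(weights):
    """Return tool names ordered from highest to lowest weighted score."""
    return sorted(
        RATINGS,
        key=lambda tool: weighted_score(RATINGS[tool], weights),
        reverse=True,
    )


# Equal weights: every dimension counts the same.
equal_weights = {}

# Someone who already has a slide template and cares most about content.
content_first = {"content_quality": 3.0, "prompt_alignment": 2.0, "visual_design": 0.5}

print(rank_tools(equal_weights))
print(rank_tools(content_first))
```

With equal weights Gamma comes out ahead, and with the content-heavy weights the order flips to GenSpark, which is the whole argument for a multi-dimension rubric over a single thumbs up or thumbs down.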
Next up, spreadsheets.
The prompt that they had for this was trying to extract all the data from this PDF and
calculate operating margin.
So it was basically a prompt about working with the data in spreadsheets.
For this use case, the five rating areas were processing time, data extraction, calculation
accuracy, format design, and analysis quality.
One really positive thing on the spreadsheet use case is that all of the tools they
tested had high calculation accuracy.
They were all in the green.
When it came to data extraction, three of the four general purpose tools, Manus, GenSpark, and OpenAI's Operator, were all high, while the vertical tool, Shortcut AI, was only medium.
That said, that vertical tool, Shortcut, performed in the high category on calculation accuracy, format design, and analysis quality.
So, again, from a takeaway standpoint, if you want to go experiment with this yourself, and as before, you want to try one general purpose tool and one vertical tool, it looks like A16Z would suggest trying Manus for the general purpose approach and Shortcut AI for the vertical.
For use case number three, email, the tools that they're looking at are basically assistants that are embedded in email.
In this category, they looked at three of the vertical tools, Fyxer, Serif, and Jace, as well as one more general purpose assistant that's embedded in an AI browser.
The prompt was email to schedule a dinner next Thursday, and the five categories of review were draft quality, customization, context awareness, chat UI availability, and calendar coordination.
Fyxer was the only one that scored in the green or yellow across all of the different categories.
Use case number four is research.
And I think that outside of drafting content,
this is one of the most ubiquitous personal use cases for AI,
among basically every professional that I run into at least.
The prompt was to summarize and compare the latest quarterly cloud revenue growth
from Microsoft, Amazon, and Google,
in a table with sources,
then analyze the drivers behind the results in a short report.
So it's important to note here that the prompt
includes not just data collection, but also organization and presentation.
For this, they compared all general purpose tools,
Manus, OpenAI's Operator,
and then two browsers, Comet and Dia.
And the TLDR is that they pretty much all did this pretty well.
In fact, the only tool to score in the red in any of the categories,
which, by the way, for research were process time, data accuracy,
table quality, analysis depth, and source attribution,
was the Dia browser, which scored low in both analysis depth and source attribution.
Manus and Comet both had three out of the five categories in green,
while for OpenAI's Operator it was reversed, three in the yellow and two in the green.
One note, too: if time matters to you, the native browsers were insanely fast. While it took about
four minutes for Manus and about five minutes for Operator, the prompt took 20 seconds in Dia and
just eight seconds in Comet. And like I said, it seems that Comet did as well as Operator and close
to as well as Manus in a tiny fraction of the time. The last use case they shared was meeting
note-taking, where they focused on vertical tools, Granola, Mem, and Motion, and they used ChatGPT's
record mode as a more lightweight alternative. First, it's very clear that the more lightweight
alternative was not well suited to this, having four of the five categories in the red.
The five categories, by the way, were note quality, customization, collaboration and integration,
real-time support, and retrievability and search. All three of the vertical tools had three categories
in the green and two in the yellow, although the distribution of those were a little different
in each case. Meaning, once again, if you were trying to figure out which you should be using,
you might want to go dig into the specifics so you can prioritize what matters most to you.
Overall, A16Z had three big observations.
The first is simply that there is already a clear dividing line between these categories of tools.
The vertical products and the horizontal products are emphasizing different things,
and their strengths and weaknesses follow from that.
The second observation is that the competition, particularly around the horizontal products,
is very intense.
They write, general assistants and agentic browsers are in a race to become the core UI for work.
Given the importance of both speed and accuracy,
companies that are closer to the model development may have a better chance
at delivering. Major research labs are still entering the race. Anthropic has recently launched a browser
copilot for Claude, and we expect more attempts from OpenAI and other players. Meaning, without saying
you shouldn't invest time in something like Manus or GenSpark, they do have a very challenging
road ahead, given the competition that's likely to come from the foundation layer. Lastly, as much as the
vertical and horizontal are clear in how they're trying to differentiate, A16Z thinks that convergence
is coming. They write, the sharp lines between vertical and horizontal agents are starting to blur, as
vertical products look to jump into new categories and horizontal platforms double down on popular
use cases.
I think taking a step back, my general advice would be a few parts.
First of all, I think it's pretty clear from this that no one, in really any of these categories,
can claim total dominance.
Because of that, we're probably in a period where, as difficult as it is, the best strategy
is to be omnivorous when it comes to which tools you're trying.
I think from an AI usage hygiene perspective, the more time you can take to have at least a small
stable of tools that you're using, the better off you're going to be in keeping up with the latest
developments. The second thing that comes from that, though, and this one is maybe a little bit relieving,
is that it feels like the performance is close enough in a lot of these categories, that it may
make more sense to invest in figuring out how to really get the most out of the tool that you're
already working in, rather than trying to jump around wildly. For example, if you are a Manus or a
GenSpark partisan, I'm not sure that anything in these tests at least suggests that you need to
jump ship to go to the other, and probably your time is better spent on figuring out how to work
with the particular foibles and strengths of the tool that you are already engaged with.
Lastly, I think it's probably going to be important not to get too attached to any particular tool.
This landscape is changing incredibly fast, and the thing that's most useful today might not be
the thing that's most useful tomorrow. Ultimately, view each prompt as its own little adventure,
and I'm sure you're going to get great results. For now, that's going to do it for today's AI Daily Brief.
Until next time, peace.
