The AI Daily Brief: Artificial Intelligence News and Analysis - The AI Scientist That Does 6 Months of Work in a Day

Starting point is 00:00:00 Today on the AI Daily Brief, an AI scientist that can do six months of work in a single day. Before that in the headlines, Gemini 3 hype hits fever pitch. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, friends, quick notes before we dive in. Firstly, thank you to today's sponsors, KPMG, Rovo, Robots and Pencils, and Blitzy. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief or subscribe on Apple Podcasts. If you are interested in sponsoring the show, send us a note at sponsors at aidailybrief.ai to learn more.

Starting point is 00:00:42 And lastly today, we are rounding the corner on the AI ROI benchmarking study. Thank you to all of you who have already contributed. I'm pretty sure this is now one of the biggest collections of AI ROI information. You have now about a week left to get your AI use cases in. Anyone that adds three or more will get the full detailed readout of the report. Again, you can find the information about that at roisurvey.ai, and there is one week left. With that, let's join the hype train, my friends. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. It is a big

Starting point is 00:01:16 risk as a podcaster to talk about speculation about an imminent release. The chances are just so good that by the time someone hears the show, the thing that people are all hyping up will actually be out and the show will instantly be dated. However, in this case, I don't care. First off, it's a huge part of the conversation going on right now all over the AI community. And second, I kind of feel like it's similar to when you're in a restaurant and you've been waiting for your meal for a while and you head to the washroom in hopes that by the time you come back, your food will be sitting there waiting for you, all glittering and ready. In other words, if it happens to be the case that Google pops in and drops Gemini 3 right

Starting point is 00:01:55 on top of my head, I'll take it. Now, generically, people have been getting excited for the Gemini 3 release for a while. However, it certainly feels basically confirmed at this point with a teasing tweet from Google CEO Sundar Pichai on Friday. Sundar retweeted Polymarket showing 69% odds of the model being released this week with two thinking face emojis. Other Googlers basically all over X are also teasing the release. You even have some of the OpenAI folks getting excited, with Adam.GPT posting, I'm excited for the rumored Gemini 3 model. Seems like it has the potential to be a real banger.

Starting point is 00:02:32 Now, as VraserX pointed out, if even an OpenAI employee is this chill about Google's rumored Gemini 3, you don't need a decoder ring to see what's going on. OpenAI must have an absolute monster model lined up for December. Business Insider certainly views the final few weeks of the year as a shootout between the Google and OpenAI teams. They wrote, if Gemini 3 is a smash hit, and right now, insiders tell Business Insider that the new model is extremely impressive, then it could give Google a shot at taking the top spot, a position it's been vying to reclaim since the generative AI boom began.

Starting point is 00:03:03 Many are betting on Google. Chubby wrote, edgy take, neither OpenAI nor Anthropic will have a good answer to Gemini 3 anytime soon. Gemini will remain the best and increasingly popular AI model for a considerable time. TestingCatalog went one better, adding, Google will likely be the first to reach Level 3 and actually make a publicly available product offering at a Level 3 scale very soon. Now, Level 3 refers to the five-level framework for AI

Starting point is 00:03:29 that came out of DeepMind and then was refined by OpenAI, with Level 3 being agents or systems that can take actions. The hype is, in fact, getting so hypey that some are making fun of it. Bojan Tunguz writes, Gemini 3 is so powerful it made Chuck Norris concede defeat. Andrej Karpathy said, I heard Gemini 3 answers questions before you ask them, and that it can talk to your cat.

Starting point is 00:03:49 Some think the entire AI narrative is riding on the model being transformative, with DMT Capital commenting, if Gemini 3.0 doesn't cure cancer or world hunger, it's going to be incredibly over. Now, Polymarket is currently pricing in a Tuesday release, so we probably won't have all that much longer to wait to find out. Now, moving over to the market side of the house, despite the fear on Wall Street, Berkshire Hathaway is buying into the AI bubble to the extent that that's what we have. On Friday, regulatory filings disclosed that Warren Buffett's investment firm had purchased around $4.9 billion worth of Google stock during Q3. The same filing showed that

Starting point is 00:04:24 Berkshire had further trimmed their positions in Bank of America and Apple. Berkshire now holds a 0.3% stake in Google, which is relatively modest by their standards. Even after the selling, they still hold a 7.7% stake in Bank of America and around 1.5% of Apple. Still, it's one of the largest new positions bought by Berkshire since they began piling up cash in 2023. Around a third of the firm's portfolio, some $382 billion, is still held in cash as of the end of last quarter. For many investors, Berkshire buying AI stocks will be a huge signal to re-examine their views on a potential AI bubble. Although Warren Buffett has announced his retirement at the end of the year, Berkshire is still an embodiment of Buffett's investing style. And when it comes

Starting point is 00:05:05 to tech, the style isn't necessarily that great. Buffett famously refused to buy into big tech as it led one of the longest bull markets in U.S. history during the 2010s. They finally bought Apple in 2016, but until now haven't owned any of the other Mag 7. Berkshire typically doesn't invest in high-growth companies. Instead, they're a value investor looking for companies that are mispriced based on current metrics. Still, Buffett admitted that he blew it by not investing in Google earlier. In 2018, he said, I had seen the product work, I knew the kind of margins they had. I didn't know enough about technology to know whether this really was the one that would stop the competitive race. Buffett's longtime partner, the late Charlie Munger, put it more bluntly.

Starting point is 00:05:43 In 2019, Munger said that he didn't feel badly for not seeing the rise of Amazon coming, but that he felt, quote, like a horse's ass for not identifying Google better. I think Warren feels the same way. Now, importantly, this isn't necessarily a massive bet on AI for Berkshire. Google is still only the 10th largest position for the firm, and they are notably not buying into the speculative semiconductor or data center management companies. But it is still a major position and suggests that Berkshire thinks Google will have a strong position as a U.S. tech leader in the medium to long term.

Starting point is 00:06:13 It's also, frankly, not the kind of position you would put on if you believe the music is about to stop on a massive bubble in that sector. The position came about sometime in Q3, so Berkshire is already up at least 30% on it in just a few months. Google's stock rallied another 4% in after-hours markets over the weekend following the Berkshire disclosure. Now, staying on the bubble theme, a week after sending the bubble talk into overdrive, Michael Burry has shut down his hedge fund. Burry famously bet against the housing market in 2008, so when he revealed a big short on Palantir and Nvidia, some believed betting against the AI bubble would be his next triumph. The media reported the Palantir short as a $9 billion bet. However, Burry corrected them last Thursday, noting that

Starting point is 00:06:54 they got the math wrong and that he had only bought around $9 million worth of bearish Palantir options. The relatively small size suggests Burry didn't have many investors left after repeatedly shorting stocks over the past decade. And indeed, in a letter to investors dated October 27th, Burry said that he would be liquidating the fund and returning capital. He acknowledged, my estimation of value in securities is not now and has not been for some time in sync with markets. Now, the letter leaked towards the end of last week, but based on the date, Burry had already made the decision to close the fund when he deliberately made headlines by disclosing his positions

Starting point is 00:07:27 early. Indeed, despite shutting down the fund, he is still pushing his short thesis on X, suggesting the AI capex boom will roll over next year and send the NASDAQ plummeting. The big question is whether he's still worth paying attention to. In a weekend op-ed, Bloomberg's Jonathan Levin asked what the obsession with Michael Burry says about ourselves. Writes Levin, we're obsessed with contrarian investors that make concentrated hero bets on macro outcomes, and our fascination has only grown as an artificial intelligence boom pushes valuations ever higher. In easily my most viewed tweet of all time, I put it a little bit more crisply.

Starting point is 00:07:59 An entire generation watched The Big Short, thought Michael Burry was cool, and spent the next decade calling everything a bubble. There was actually a really phenomenal post from an account called TMT Breakout on X that basically argues that Sam Altman and OpenAI's aggressive announcement of all of these deals popped the non-bubble and put AI into a more scrutinized and reasonable place. They write, bad news for the AI bulls and bears. The past few weeks has brought an end to that paradigm and led us to an unexpected turning point in the dynamics of the AI trade and narrative. On the three-year anniversary of ChatGPT's release, no less,

Starting point is 00:08:33 and we have Sam's $1.4 trillion, 30-gigawatt splurge to thank for it. Sam's splurge opened up AI Pandora's box, shifting the AI narrative in unexpected ways. Basically, they argue that the dealmaking was so ubiquitous and overwhelming that it actually made people take a big pause. They write, the ironic thing: if Sam's splurge would have been about half the size, things would have continued to grind along.

Starting point is 00:08:55 Investors would have enjoyed the 27 and 28 visibility, maybe even building the energy for a large vertical ascent in price action. Instead, we had the opposite effect, pouring too much gasoline on the fire and drowning out the energy for a big move up. The conclusion, we think the straight-line, giddy phase of the AI trade will give way to something healthier, a phase where fundamentals and idiosyncrasies matter even more. Tech will always be a narrative- and boom-and-bust-heavy investing sector, that's part of the fun, but in a landscape where sentiment is more balanced, stock picking will become more relevant.

Starting point is 00:09:23 That's a good thing. Sam's splurge popped the non-bubble, but the AI trade isn't broken. It's simply entering a more mature, scrutinized phase. Interesting stuff, but that is going to do it for today's headlines. Next up, the main episode. What if AI wasn't just a buzzword, but a business imperative? On You Can with AI, we take you inside the boardrooms and strategy sessions of the world's most forward-thinking enterprises. Hosted by me, Nathaniel Whittemore, and powered by KPMG, this seven-part series delivers real-world insights from leaders who are scaling AI with purpose,

Starting point is 00:09:59 from aligning culture and leadership to building trust, data readiness, and deploying AI agents. Whether you're a C-suite executive, strategist, or innovator, this podcast is your front row seat to the future of enterprise AI. So go check it out at www.kpmg.us slash AI podcasts or search You Can with AI on Spotify, Apple Podcasts, or wherever you get your podcasts. Meet Rovo, your AI-powered teammate. Rovo unleashes the potential of your team with AI-powered search, chat, and agents, or build your own agent with Studio. Rovo is powered by your organization's knowledge and lives on Atlassian's trusted and secure platform, so it's always working in the context of your work. Connect Rovo to your favorite SaaS app so no knowledge gets left behind.

Starting point is 00:10:46 Rovo runs on the Teamwork Graph, Atlassian's intelligence layer that unifies data across all of your apps and delivers personalized AI insights from day one. Rovo is already built into Jira, Confluence, and Jira Service Management Standard, Premium, and Enterprise subscriptions. Know the feeling when AI turns from tool to teammate. If you Rovo, you know. Discover Rovo, your new AI teammate powered by Atlassian. Get started at R-O-V, as in victory, O dot com. AI changes fast. You need a partner built for the long game. Robots and Pencils works side by side with organizations to turn AI ambition into real human impact. As an AWS certified partner,

Starting point is 00:11:24 they modernize infrastructure, design cloud native systems, and apply AI to create business value. And their partnerships don't end at launch. As AI changes, Robots and Pencils stays by your side so you keep pace. The difference is close partnership that builds value and compounds over time. Plus, with delivery centers across the U.S., Canada, Europe, and Latin America, clients get local expertise and global scale. For AI that delivers progress, not promises, visit robotsandpencils.com slash AI Daily Brief.

Starting point is 00:11:53 This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale codebases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform, bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers 80% plus of the development work autonomously, while providing a guide for

Starting point is 00:12:20 the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding copilot of choice to bring an AI-native SDLC into their org. Visit blitzy.com and press get a demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. Welcome back to the AI Daily Brief. As we sit here, antsy in the Gemini 3 waiting room, there is a big discussion going on right

Starting point is 00:12:53 now in the AI community about this new AI scientist called Cosmos. Now, if you spend any time in and around the AI community, you'll know that one of the big promises that all the big labs talk about all the time, and which Sam Altman from OpenAI has a particular penchant for, is the idea of AI advancing scientific research and doing so in some independent or mostly autonomous fashion. For as long as I've been paying attention to Altman's comments on AI, the scientific discovery use cases of AI have been the thing that has seemed to drive him more than any other. He wrote about this back in June in his post called The Gentle Singularity. Back then, pre-GPT-5, he wrote, we already hear from scientists that they are two or three

Starting point is 00:13:34 times more productive than they were before AI. From here on, the tools we have already built will help us find further scientific insights and aid us in creating better AI systems. He's reiterated these themes in basically every interview he's done. When asked recently about his AGI definition, he said when AI can bring completely new discoveries, and he added, but you see all these examples now on Twitter where scientists in these different fields are saying it made a small discovery or came up with a novel approach or it figured something out, basically acknowledging that you are seeing more and more the first glimpses of AI for scientific discovery. Perhaps this is why OpenAI's former chief product officer, Kevin Weil, began OpenAI for Science, with the goal to build an AI-powered platform

Starting point is 00:14:14 for accelerating scientific discovery. And so with all this as background, I noticed as many did over the weekend that Altman himself had tweeted about this new announcement for a thing called Cosmos from Edison Scientific. Sam added, this is exciting. I expect we are going to see a lot more things like this, and it will be one of the most important aspects of AI. Congrats to the FutureHouse team. So what is Cosmos? CEO Sam Rodriques explained it like this. On Twitter slash X, he posted,

Starting point is 00:14:40 today we're announcing Cosmos, our newest AI scientist, available to use now. Users estimate Cosmos does six months of work in a single day. One run can read 1,500 papers and write 42,000 lines of code. At least 79% of its findings are reproducible. Cosmos has made seven discoveries so far, which we're releasing today in areas ranging from neuroscience to materials science and clinical genetics in collaboration with our academic beta testers. Three of the discoveries reproduced unpublished findings, and four are net new, validated contributions to the scientific literature. AI accelerated science is here.

Starting point is 00:15:15 So let's talk about what these discoveries were. As Rodriques said, three of the discoveries saw Cosmos independently reproducing findings that were previously made by human scientists. In the first, they write, Cosmos reproduced a claim from a then-unpublished manuscript, using metabolomics data, identifying nucleotide metabolism as the dominant altered pathway in hypothermic mice brains. The second discovery related to perovskite solar cells, which are a new, lightweight, low-cost solar technology that has many benefits, but which is very sensitive to moisture. Cosmos confirmed that humidity during heat treatment is the key factor in how well they

Starting point is 00:15:47 work and found a "fatal filter" point: above a certain level of humidity, the cells fail. In the third discovery, Cosmos found the same mathematical patterns in how neurons connect across different species. Now, in the next four discoveries, Edison, which is the company behind Cosmos, claims that Cosmos made novel contributions to the scientific literature. The fourth discovery was that Cosmos found statistical evidence that higher levels of the enzyme SOD2 may help reduce heart tissue damage in humans, which supports earlier findings seen in mice. In the fifth discovery, Cosmos used large genetic data sets to propose a new molecular explanation for how a specific genetic variant may lower the risk of type 2 diabetes. In the sixth discovery,

Starting point is 00:16:25 Cosmos created a new method to map the order of molecular changes that lead to tau buildup in Alzheimer's disease. Finally, in the seventh discovery, Cosmos found that the neurons first affected in Alzheimer's show reduced expression of flippase genes as mice age, which may make those neurons more vulnerable. Now, not being a scientist in any of these fields, I don't really have any sense of how significant these discoveries are. And obviously, now that Edison has gone public with this, I presume that lots and lots of scientists who are in these fields will go actually dig in and validate it for themselves. And of course, I think it is going to be a very necessary skill as we head into the age of AI scientific discovery to be extremely skeptical of claims as a default position, even if you have no prior reason to doubt the source of those claims. We just generally need to keep our skepticism very high. Still, it seems extremely, extremely promising.

Starting point is 00:17:15 And so how does this work? Sam Rodriques acknowledged that these numbers are out of sync in a positive way with previous estimates of where agents were. In that same announcement post on X, he wrote, We are aware that the six-month figure is much greater than estimates by other AI labs like METR about the length of tasks that AI agents can currently perform. So how do they do this? They write, Our core innovation in Cosmos is the use of a structured, continuously updated world model.

Starting point is 00:17:43 Cosmos' world model allows it to process orders of magnitude more information than could fit into the context of even the longest context language models, allowing it to synthesize more information and pursue coherent goals over longer time horizons than any of our prior agents. Now, one note here is, as Simon Smith points out, when I looked at the Cosmos paper, it wasn't clear what world model meant. I got the sense that it's a knowledge graph to which agents add information as they collect it, which is cool and useful, but probably not what most people mean by world model. Given that we have recently been talking about world models, I think the distinction is important. Carlos Perez tries to simplify what they have going on under the hood. He wrote, we hear AI scientist and think it's just a chatbot that's good at summarizing Wikipedia.

Starting point is 00:18:24 I was skeptical too. Most of these systems are toys. They can do a cool analysis, but they lose focus after a few steps. They can't run a real long-term investigation. The real problem wasn't raw intelligence, it was coherence. Imagine trying to write a book with 100 different people who can't see what anyone else is writing. You get a mess of disconnected paragraphs. That's what previous AI agents were like.

Starting point is 00:18:45 Brilliant but hopelessly siloed. So the team behind Cosmos didn't just try to build a smarter brain, they built a shared consciousness. They call it a structured world model, which sounds complex, but the idea is genius. Think of it like a giant live updating whiteboard. Cosmos unleashes hundreds of AI agents in parallel. One reads scientific papers, another analyzes data. When an agent finds something, it puts it on the whiteboard. Crucially, every other agent can see the whole board. Now, for those of you who have been following along for a while, this sounds to me a bit like Dr. Strange, but for scientific research, where you spin up a lot of

Starting point is 00:19:18 instances of a thing doing similar work, so that it can, in aggregate, outperform. Niko McCarty writes, The general idea is that science follows a series of steps and that many of these steps can be automated. Those steps are, search the literature, read stuff, use your reading to come up with new hypotheses, try to draw connections between things, analyze data to draw conclusions, write up your results, repeat. He continues, Cosmos uses two separate agents, one for data analysis, another for literature searches,

Starting point is 00:19:43 to go out and do these tasks while sharing information with each other. The agents can then see what the other agents have learned, which is super useful. They exist within a single world model. A single run of Cosmos can execute up to 42,000 lines of code across 166 different data analysis agents and also read 1,500 scientific papers using 36 literature review agents. Each run takes up to 12 hours. So that's the gist. You spin this thing up, give it a huge prompt, and then let it cook. Now, it is important to note that even the team themselves don't think that things are perfect. First of all, they say, you have to know how to use it and it's much closer to a deep research tool. It's pricey at $200 a run.
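For readers who think in code, the shared-whiteboard idea described above can be sketched in a few lines of Python. This is only an illustration of the pattern (many parallel agents posting to, and reading from, one shared store), not Edison Scientific's implementation: the `Whiteboard` class, `run_agent`, `run_cycle`, and the agent counts used here are all hypothetical names and values chosen for the example.

```python
from concurrent.futures import ThreadPoolExecutor
from threading import Lock


class Whiteboard:
    """Hypothetical shared store: every agent can post to it and read all of it."""

    def __init__(self):
        self._findings = []
        self._lock = Lock()

    def post(self, agent, kind, text):
        # Append one finding; the lock keeps parallel writers from colliding.
        with self._lock:
            self._findings.append({"agent": agent, "kind": kind, "text": text})

    def snapshot(self):
        # Return a copy so readers never see a half-updated list.
        with self._lock:
            return list(self._findings)


def run_agent(name, kind, board):
    """Stand-in for a real agent: look at the whole board, then add one finding."""
    seen = len(board.snapshot())
    board.post(name, kind, f"{name} finished with {seen} prior findings in view")
    return seen


def run_cycle(board, n_literature, n_analysis):
    """Run one batch of literature and data analysis agents in parallel."""
    jobs = [(f"lit-{i}", "literature") for i in range(n_literature)]
    jobs += [(f"data-{i}", "analysis") for i in range(n_analysis)]
    with ThreadPoolExecutor(max_workers=8) as pool:
        futures = [pool.submit(run_agent, name, kind, board) for name, kind in jobs]
        return [f.result() for f in futures]  # re-raises any agent error


board = Whiteboard()
first_cycle = run_cycle(board, n_literature=3, n_analysis=5)
second_cycle = run_cycle(board, n_literature=3, n_analysis=5)
print(len(board.snapshot()), "findings on the board")
```

The point of the sketch is the second cycle: every agent in it starts with all of the first cycle's findings in view, which is the coherence property the whiteboard analogy is getting at.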

Starting point is 00:20:20 And they point out, while Cosmos certainly does produce outputs that are the equivalent of several months of human labor, it also often goes down rabbit holes or chases statistically significant yet scientifically irrelevant findings. They point out, we often run Cosmos multiple times on the same objective in order to sample the various research avenues it can take. Which brings up one of the interesting questions. In that same post from Niko, he wrote, I'm not wholly convinced that the idea of extremely long runs will be palatable to most biology researchers. My take is that researchers are looking for more of a real-time collaborator, where you're constantly prompting and getting immediate feedback, rather than just delegating huge open-ended tasks to agents.

Starting point is 00:20:57 Now, this harkens to me to the conversation that I had with swyx a couple weeks ago about the autonomy spectrum when it comes to coding agents. One of the things that we are figuring out from a user experience expectation standpoint across all these different domains of AI use is to what extent people want really fast real-time collaboration versus agents that go off and do things on their own. That balance is going to be a toggle and a spectrum, and it's not exactly clear for different use cases what the optimal combination is going to be. Andrew White, another co-founder of Edison Scientific, responded to that and said, love the pushback on autonomy versus interaction. It's something we struggle with internally. It's cost prohibitive right now,

Starting point is 00:21:33 but I would rather run 10 Cosmos jobs and then choose or edit the analysis I like, rather than agonizingly try to tell an agent exactly what to do. Now, what about the claims of the six-month estimate? In their blog post, they write, the most surprising part of our work on Cosmos was finding that a single Cosmos run can accomplish work equivalent to six months of a PhD or postdoctoral scientist. Moreover, the perceived work equivalency scales linearly with the depth of the Cosmos run, providing one of the first inference-time scaling laws for scientific research. They say that they were skeptical when they first got the results, but then share why they think it's valid. So how did the methodology for collecting this actually work?

Starting point is 00:22:08 Basically, this comes from estimates obtained from polling Cosmos' beta users. The beta users would give the team a research objective, the team would run Cosmos for them, give them the outputs, and then poll them on how much time they estimated it would have taken them to come to the same conclusion. They write, the average across seven scientists was 6.14 months for a 20-step Cosmos run. Of course, they point out, human estimates of time saved are intrinsically suspect. And Niko has an issue here as well. Niko writes, the paper tries to quantify the time it would take for a human scientist to complete the work that Cosmos performs, but I find it a bit hand-wavy.

Starting point is 00:22:41 They say it takes a typical researcher 15 minutes to read a paper and two hours to write a Jupyter notebook for data analysis, and since Cosmos can read 1,500 papers per run, it offers huge time savings. But he continues, human scientists don't need to read hundreds of pages to make a discovery. The best scientists have an innate ability to triangulate to innovation, find the right combo of papers and discussions that enable them to make conceptual advances. That seems difficult to replicate. The team behind Cosmos agrees, at least in part, writing, human estimates of time saved are intrinsically suspect. However, they point to two reasons they think that Cosmos' work packages do actually equate to months of scientists' time. The first is the three discoveries that had been previously made but unpublished by humans, and the second is independent time estimates, which is where that 15-minutes-per-paper figure comes from.
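To make the arithmetic behind that kind of estimate concrete, here is a rough back-of-the-envelope sketch in Python. It uses the two per-task figures quoted above (15 minutes per paper, two hours per notebook) and the 1,500-paper count from the announcement. Everything else is an assumption made for illustration and labeled as such in the code: one notebook per data analysis agent, a 40-hour work week, and 52/12 weeks per month. It is not the paper's own calculation, and it is separate from the 6.14-month figure, which came from polling beta users.

```python
# Figures quoted in the episode
PAPERS_PER_RUN = 1_500       # papers read in one run
MINUTES_PER_PAPER = 15       # estimated human reading time per paper
HOURS_PER_NOTEBOOK = 2       # estimated human time to write one analysis notebook

# ASSUMPTIONS made for this sketch only (not stated in the source)
NOTEBOOKS_PER_RUN = 166      # assumes one notebook per data analysis agent
HOURS_PER_WEEK = 40          # assumes a standard full-time week
WEEKS_PER_MONTH = 52 / 12    # average weeks in a calendar month


def human_equivalent_months(papers, notebooks):
    """Convert a run's reading and analysis workload into full-time human months."""
    reading_hours = papers * MINUTES_PER_PAPER / 60
    notebook_hours = notebooks * HOURS_PER_NOTEBOOK
    total_hours = reading_hours + notebook_hours
    return total_hours / (HOURS_PER_WEEK * WEEKS_PER_MONTH)


months = human_equivalent_months(PAPERS_PER_RUN, NOTEBOOKS_PER_RUN)
print(f"Roughly {months:.1f} full-time months under these assumptions")
```

Under these particular assumptions the total lands at several months of full-time work, which shows why the estimate is so sensitive to Niko's objection: it counts every paper and notebook as work a human would actually have had to do.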

Starting point is 00:23:25 Now, whether you think all of those numbers add up to exactly the right metric, again, I think it's fine to be skeptical. But there's clearly something powerful going on here, and progress being made. Computational biologist Zachary Flamholz wrote about his use of the tool. His conclusion, it is an understatement to say that I was impressed with what Cosmos did. From the well-structured discovery report, it was obvious that Cosmos understood my research question on par with my own understanding. This was new for me and AI tools. Previously, I used this research question to test other commercially available chatbots and none have sufficiently understood my question with the correct nuance and scientific context to advance my understanding of the question,

Starting point is 00:24:01 let alone do work on the problem. The last paragraph reads, I'm writing this post and starting this blog, because my experience with Cosmos is causing me to reimagine what my career will look like. Until now, commercial AI tools have been an efficiency multiplier for which I am very grateful. But Cosmos is different. The scientific enterprise will remember November 5, 2025. Stay tuned. Big words, of course, but very interesting stuff. You can find out more about Cosmos at edisonscientific.com, and certainly I think that this will be a theme that we keep coming back to. For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching as always. And until next time, peace.
