The AI Daily Brief: Artificial Intelligence News and Analysis - When Will AI Make Scientific Discoveries?

Starting point is 00:00:00 Today on the AI Daily Brief, when will AI start making novel scientific discoveries? Before that in the headlines, the U.S. government says actually DeepSeek stinks. The AI Daily Brief is a daily podcast and video about the most important news and discussions in AI. All right, quick announcements before we dive in. First of all, thank you to today's sponsors, Notion, Blitzy, Insightwise, and Robots and Pencils. To get an ad-free version of the show, go to patreon.com slash AI Daily Brief, or you can now subscribe to the audio-only version ad-free in Apple Podcasts. So if you just listen and you get it through Apple Podcasts, you can now subscribe

Starting point is 00:00:40 directly there. I'm working on having Spotify's ad-free subscription coming as well. So soon you will have multiple choices, Patreon, Apple, and Spotify. But with that, let's dive in. Welcome back to the AI Daily Brief Headlines edition, all the daily AI news you need in around five minutes. We kick off today with a new report from the National Institute of Standards and Technology that finds DeepSeek lacking in both performance and security. Announcing the report, Commerce Secretary Howard Lutnick wrote, Today, thanks to President Trump's AI Action Plan, the Commerce Department and NIST's Center for AI Standards and Innovation

Starting point is 00:01:14 have released a groundbreaking evaluation of American versus adversary AI. Result, American AI models dominate. Our systems outperform DeepSeek across nearly every benchmark. The report is clear. DeepSeek lags far behind, especially in cyber and software engineering. These weaknesses aren't just technical. They demonstrate why relying on foreign AI is dangerous and short-sighted. Allowing our adversaries to control AI poses serious risks

Starting point is 00:01:37 to our security. By setting the standards, driving innovation, and keeping America secure, the Department of Commerce is helping ensure continued U.S. leadership in AI. Now, in addition to lack of performance, NIST found that DeepSeek was more expensive than comparable U.S. models. They claimed that one of the U.S. models used as a reference cost 35% less on average to complete the 13 performance benchmarks that were tested. NIST also found that DeepSeek's models were 12 times more likely than U.S. frontier models to allow malicious instructions designed to derail them from the desired task.

Starting point is 00:02:07 NIST wrote, hijacked agents sent phishing emails, downloaded and ran malware, and exfiltrated user login credentials all in a simulated environment. DeepSeek's models were also far more susceptible to jailbreaking, responding to 94% of malicious requests after using a common jailbreaking technique compared to just 8% for U.S. reference models. NIST found that DeepSeek's models echoed four times as many inaccurate and misleading CCP narratives as U.S. models, and finally and perhaps most concerningly, they found, adoption of Chinese models has greatly increased since DeepSeek R1 was released. They anchored this claim on the statistic that downloads of DeepSeek models are up

Starting point is 00:02:40 1,000% since January of this year. Now, some people were a little skeptical of this, but mostly it was just a Rorschach test for wherever your politics happened to already be. How much it changes any sort of AI policy, I'm a little bit more skeptical of. Next up, a little in the AI hardware realm, Apple has scrapped plans to iterate on the Vision Pro and will shift focus to developing AI smart glasses. It seems that Meta's Ray-Bans were just too compelling, forcing Apple to scuttle their existing plans. Bloomberg's Apple

Starting point is 00:03:06 specialist Mark Gurman reports that Apple had plans to develop a cheaper, lightweight version of the Vision Pro, which was on track for release in 2027. Sources said that last week, however, Apple announced internally that staff would be pulled off that project to accelerate work on smart glasses instead. Plans are reportedly to work on two different versions of the glasses product. A cheaper version dubbed the N50 will compete with the original Meta Ray-Bans, and a higher spec version will include a display to go up against the newly released Meta Ray-Ban Display. The N50 is expected to be ready for unveiling next year ahead of a 2027 release,

Starting point is 00:03:37 while the display version isn't expected until 2028. The design for Apple's glasses seems to stick close so far to Meta's product line. Apple plans to use voice controls and integrated AI as the core interface. The glasses will also feature speakers for music and cameras for media recording, and Apple is also reportedly exploring integrating health tracking capabilities for the device as well. The reporting also seems to suggest that Apple has come to consider it a mistake to have tried to sell a $3,500 consumer device that's not comfortable enough to use for long periods of time. Writes Gurman, Apple executives have acknowledged the product's shortcomings in private,

Starting point is 00:04:08 viewing it as an over-engineered piece of technology. The game's not over, but it certainly puts more evidence in the column that smart glasses, for the moment, are at the very tippy top of the format war for AI devices. Speaking of devices, Amazon has released a new line of smart devices built specifically for their AI assistant Alexa Plus. The entire range of Echo smart speakers has been upgraded with new custom silicon featuring an AI accelerator. This will allow the new devices to provide local inference for Amazon's AI models, and the devices will also include a new custom sensor platform designed to make ambient AI feel more natural. The sensors include cameras, audio, ultrasound, Wi-Fi radar, and an accelerometer. Basically, everything imaginable

Starting point is 00:04:47 to make the AI model aware of its surroundings. The goal is to make interactions with the ambient AI feel more natural and responsive. Some of the devices enable new AI features across Amazon's smart home systems. The latest Ring cameras now have facial recognition technology that can keep track of friends and family and distinguish them from lurking strangers. Another feature called Search Party allows users to send out an alert across networked cameras in a neighborhood to search for a lost pet. This is the first big product refresh since Amazon hired product chief Panos Panay away from Microsoft back in 2023. Panay told Bloomberg, my belief is that our job is to make devices the next big business at Amazon. And AI is very clearly right at the core of the

Starting point is 00:05:24 strategy, which Panay articulated as great products made even better through ambient AI. Lastly today, a relevant one given all the dust-up around Sora 2 and the presumption that all of this is leading to ads in ChatGPT, Meta has crossed the Rubicon and will start to target ads based on users' AI chats. Meta announced a change to their recommendation system on Wednesday that will see AI interactions used to personalize content and advertising delivery across their apps. Meta gave the example of a user asking their chatbot for nearby hiking recommendations. The user might then be served hiking-related content and ads for hiking boots or other gear. Users won't be able to opt out, but sensitive topics will be automatically excluded.

Starting point is 00:06:03 These include politics, religion, sexual orientation, health, and racial origin. The change will go live in December, and the policy won't apply in the UK, Europe, and South Korea due to their stricter tech privacy rules, although Meta plans a compliant rollout at a later date. Christy Harris, the privacy policy manager at Meta, said, people's interactions simply are going to be another piece of the input that will inform the personalization of feeds and ads. We're still in the process of building the first offerings

Starting point is 00:06:25 that will make use of this data. Look, it feels pretty inevitable that ads are going to be a part of the AI landscape. I think to some extent the question will be, for people who are already paying for subscriptions, are they going to have to deal with ads as well? What sort of privacy controls will there be? All of these questions remain to be seen, but it is not surprising that we are getting to this point;

Starting point is 00:06:43 frankly, it's just surprising that it's taken this long. For now, that's going to do it for today's headlines. Next up, the main episode. Chatbots are great, but they can only take you so far. I've recently been testing Notion's new AI agents, and they are a very different type of experience. These are agents that actually complete entire workflows for you in your style, and best of all, they work in a channel that you already know and love

Starting point is 00:07:06 because they are purpose-built Notion super users. Notion's new AI agents completely expand the range of what Notion can do. It can now build documents from your entire company's knowledge base, organize scattered information into organized reports, basically do tasks that used to take days and get them complete in minutes. These agents don't just help with work, they finish it. Getting started with building on Notion is easier than ever. Notion agents are now your very own super user to help you onboard in minutes.

Starting point is 00:07:31 Your AI teammates are ready to work. Try Notion AI for free at the link in our show notes. This episode is brought to you by Blitzy, the Enterprise Autonomous Software Development Platform with Infinite Code Context. Blitzy uses thousands of specialized AI agents that think for hours to understand enterprise-scale code bases with millions of lines of code. Enterprise engineering leaders start every development sprint with the Blitzy platform bringing in their development requirements. The Blitzy platform provides a plan, then generates and pre-compiles code for each task. Blitzy delivers

Starting point is 00:08:01 80% plus of the development work autonomously while providing a guide for the final 20% of human development work required to complete the sprint. Public companies are achieving a 5x engineering velocity increase when incorporating Blitzy as their pre-IDE development tool, pairing it with their coding co-pilot of choice to bring an AI-native SDLC into their org. Blitzy is providing a limited time, 30-day free proof of concept for qualifying enterprises. The team will provide a 5x velocity increase on a real development project in your org. Visit blitzy.com and press book demo to learn how Blitzy transforms your SDLC from AI-assisted to AI-native. That's BLITZY.com.

Starting point is 00:08:38 As a consultant, responding to proposals can often feel like playing tennis against a wall. You're serving against yourself trying to guess what the client really wants. That all changes with the Insightwise proposals platform. Now you've got an AI coach that thinks just like your client. It returns to the brief time and time again, identifying opportunities, showcasing your track record, and making recommendations to improve your pitch. Suddenly you're on center court, but this time you've got a secret weapon. Insightwise does away with all the time-consuming manual work,

Starting point is 00:09:07 so you can focus on winning more business more often. Generate reports, pull insights from your own data, build competitive advantage, and go to sleep before 2 a.m. When it comes to proposals, you only get one shot. With Insightwise, make yours an ace. AI isn't a one-off project. It's a partnership that has to evolve as the technology does. Robots and Pencils works side-by-side with clients

Starting point is 00:09:29 to bring practical AI into every phase, automation, personalization, decision support, and optimization. They prove what works through applied experimentation and build systems that amplify human potential. As an AWS-certified partner with global delivery centers, Robots and Pencils combines reach with high-touch service. Where others hand off, they stay engaged, because partnership isn't a project plan. It's a commitment. As AI advances, so will their solutions. That's long-term value. Progress starts with the right partner. Start with Robots and Pencils at robotsandpencils.com slash AI Daily Brief.

Starting point is 00:10:04 Welcome back to the AI Daily Brief. Today we are discussing AI and scientific discovery. And of course, at least part of the context has to be the launch of Sora 2. We talked extensively about this in yesterday's episode, but if you had to take just one meme that best summed up the enfranchised critique, let's say, of OpenAI's announcement, not just of the Sora 2 model, but of the Sora app to go with it, it was really a critique of what OpenAI is choosing to spend its time on. One example of this was Rudder Tushar who said, Sam Altman two weeks ago, we need $7 trillion and 10 gigawatts to cure cancer. Sam Altman today,

Starting point is 00:10:40 we are launching AI slop videos marketed as personalized ads. Now, there was a lot of that going around, enough that it clearly got under Sam Altman's skin. In fact, he responded to that one saying, I get the vibe here, but we do mostly need the capital for building AI that can do science. And for sure, we are focused on AGI with almost all of our research effort. It is also nice to show people new cool tech products along the way, make them smile, and hopefully make some money given all that compute need. When we launched ChatGPT, there was a lot of who needs this and where is AGI. Reality is nuanced when it comes to optimal trajectories for a company. So let's hold aside Sam's response and whether you think it's legitimate or not.

Starting point is 00:11:15 The point is that a lot of people are saying, we were promised AGI and scientific discovery and we got another social media app. It harkens back to the famous Founders Fund manifesto where they wrote, we wanted flying cars, instead we got 140 characters, this sense that there is some deeper value to big, massive machines and inventions and discoveries, as opposed to just social media attention sucks. This question has actually been lurking around the open AI space for a lot of 2025. One of the big stories of this year, of course, was Mark Zuckerberg parading around the valley, trying to poach engineers to stock his superintelligence lab. And while a lot of that effort was very successful and a lot of incredibly talented people came over to Meta,

Starting point is 00:11:55 there were some who turned down even extremely generous compensation packages, and a lot of the scuttlebutt and buzz around AI circles was that those folks just weren't willing to risk that ultimately what they were going to have to spend all of their brain power and all of Meta's compute on was improving ad click-through rates. Now, one of the things that was interesting about this conversation yesterday as it happened alongside Sora was that we didn't just have to speculate

Starting point is 00:12:20 that some number of researchers and top minds weren't going to be interested in that sort of work. We actually had an example of researchers who had left OpenAI and Google and Meta to build something that had a much heavier scientific discovery type of focus. The company is called Periodic Labs. Its goal is to use and build AI that can actually accelerate discovery in fields beyond computer science, such as physics and chemistry. And its announcement was very explicitly positioned

Starting point is 00:12:48 as being about AI researchers getting sick of working on consumer AI and moving on to a higher purpose. Indeed, the New York Times article about Periodic Labs starts with one of these stories. They write, this summer Mark Zuckerberg invited Rishabh Agarwal to join the company's new AI lab, offering him millions of dollars in stock and salary. With the new lab, Zuckerberg said he wanted to build superintelligence, a technology that could eclipse the power of the human brain. Though no one knew how to create superintelligence, he urged Dr. Agarwal to make a leap of faith. In a world that is changing fast, Zuckerberg told him, the biggest risk you can take is not taking any risk. But although Dr. Agarwal was already a Meta employee, he turned down the offer to join another company.

Starting point is 00:13:26 That company is, of course, Periodic Labs. In discussing Periodic's goals, founder Liam Fedus said, the main objective of AI is not to automate white-collar work. The main objective is to accelerate science. Now, we talk in this show all the time about the difference between efficiency AI and opportunity AI, and this is, of course, in the context of enterprises deploying AI at work. The idea of efficiency AI is thinking about AI simply as a way to do what is currently done, but faster, cheaper, or maybe better, but still doing the same thing. There's nothing wrong with efficiency AI. People should leverage those gains. They're going to become table stakes. But the real opportunity and what will differentiate

Starting point is 00:14:04 companies, I have always said and believed, is those who think about it as a new opportunity technology, a technology, in other words, that opens up things that weren't possible before. The founders of Periodic Labs are taking a similar assessment and have built their company around the premise of answering how to actually make that real and figure out a gap in the market that opens up that possibility. So what does Periodic Labs do? Simply put, their goal is to accelerate science. In their announcement post, they wrote,

Starting point is 00:14:29 Our goal is to create an AI scientist. Science works by conjecturing how the world might be, running experiments and learning from the results. Intelligence is necessary but not sufficient. New knowledge is created when ideas are found to be consistent with reality. And so at Periodic, we are building AI scientists and the autonomous laboratories for them to operate. And really at the core of what they think is missing is that while, yes, current models have read

Starting point is 00:14:53 everything that's available, ultimately to make new discoveries you need practical application and experimentation. As they put it, as any scientist knows, though rereading a textbook may give new insights, they eventually need to try their idea to see if it holds. So basically what they want to do is connect the dots between human researchers, AI agent experiment designers, and autonomous and robotic labs where those experiments can be conducted. The way The Neuron framed it was this. Human evaluators initiate, AI agent designs experiments, the robotic lab executes them,

Starting point is 00:15:24 and nature itself provides a reward signal, did the experiment work, and from there, data improves the models. Basically taking the scientific method and AIifying it. The company is starting with physical sciences, because they say physics is a verifiable environment. They note that AI has progressed fastest in domains with data and verifiable results. And this is what they mean when they say that nature is the reinforcement learning environment. Now, part of the strategy is to collaborate with industry right from the get-go. For example, they are already working with a semiconductor manufacturer on issues around heat dissipation on their chips.

Starting point is 00:15:56 Now, as part of the coming out party this week, Periodic Labs announced that it had raised over $300 million in seed funding from a who's-who of investors including Andreessen Horowitz, Accel, Nvidia, Jeff Bezos, and many, many more. One of their investors, Bain Capital Ventures, wrote the rare investment announcement post that uses historical analogy well. They begin with an exploration of the difference in science before the telescope and after the telescope. They write, The history of science is full of similar examples in which technological progress

Starting point is 00:16:23 enables the invention of new scientific instruments, which in turn leads to new scientific discovery. Basically, that there is a relationship between the discoveries of science and the technology of science, that they are mutually reinforcing in a positive feedback cycle. The point is this. Galileo, they write, had the newly invented telescope. We have newly developed AI systems. What can we see now that we couldn't before? To the extent there was concern around the sloppification of AI with Sora and before it the Meta Vibes app, the Periodic announcement saw almost the inverse excitement, with just an incredible amount of enthusiasm, not just from the AI industry, but from many different parts of academia, science, research, and beyond.

Starting point is 00:17:01 Now, one of the other comparison points in that August article about turning down Zuckerberg's offers was all of the people who were leaving to work with former OpenAI CTO Mira Murati at her new startup Thinking Machines Lab. And while Thinking Machines isn't as aggressively self-styled as a place for physical scientific research as Periodic Labs, they are very self-consciously trying to take a different approach to spreading and expanding knowledge around AI as opposed to the other labs. And right as OpenAI was announcing Sora 2, we also got the first product release from them. The product is called Tinker, and it's an API for training and fine-tuning custom models. Essentially, it's an AI infrastructure as a service, and it's meant to reduce the barrier to

Starting point is 00:17:39 entry for model training substantially. Thinking Machines Lab provides the GPU cluster and the software stack, leaving customers to focus on training data and model design. Murati posted, Tinker brings frontier tools to researchers offering clean abstractions for writing experiments and training pipelines while handling distributed training complexity. It enables novel research, custom models, and solid baselines. In comments to Wired, she added, we believe Tinker will help empower researchers and developers to experiment with models and will make frontier capabilities much more accessible to all people. We're making what is otherwise a frontier capability accessible to all, and that is completely game-changing. There are a ton of smart people out there, and we need as many smart people

Starting point is 00:18:16 as possible to do frontier AI research. Wrote Thinking Machines scientist John Schulman, Tinker provides an abstraction layer that is the right one for post-training R&D. It's the infrastructure I've always wanted. So the goal here is very much a democratization of frontier AI research. They're looking to help speed up innovation by enabling researchers or startups to test their ideas in days instead of weeks or months. They're trying to level the playing field, making it possible for smaller labs, universities, or even individuals to do meaningful AI work without billion-dollar budgets. They're looking for a path to make AI models more useful while keeping costs down. And certainly, while there are powerful benefits from a broad research and democratization standpoint, there is also

Starting point is 00:18:54 a ton to be excited about here for enterprises. In the first wave, right after ChatGPT, a lot of enterprises tried to train their own models, mostly to discover the bitter lesson, that generalist models simply outperformed their limited amounts of data, even if it was data that was contextual to their firm. Bloomberg was one big example of this. However, there's still a lot of interest in custom-trained layers for models that take advantage of proprietary and non-public data, and Tinker opens up some great possibilities on that front. Basically, if it works as promised, they will be able to fine-tune models on their corpus of data without big infrastructure and an extensive AI engineering team. What's more, it opens up the possibility that individual teams

Starting point is 00:19:31 within an enterprise could actually think about this type of custom training rather than just having to wait for what the central AI group does. There's no reason, for example, that a marketing analytics team, especially if they had the right support, couldn't actually go explore that kind of customization, again, without having to get in line and wait for whatever other types of big AI infrastructure projects are going on across the company. Now, how much Thinking Machines is going to care about that enterprise use case remains to be seen, but it's certainly valuable. The first reactions are extremely positive. Teknium, the co-founder of distributed training collective Nous Research, wrote,

Starting point is 00:20:02 I had the privilege of being part of the beta for Tinker. It's a really nice project. Simple APIs for training models can make it a lot easier to properly leverage resources to get results. UC Berkeley PhD student Tyler Griggs writes, very hackable and lifts a lot of the LLM training burden, a great fit for researchers who want to focus on algs and data, not infra. Former OpenAI co-founder and vibe coding terminology coiner Andrej Karpathy writes, Tinker is cool. If you're a researcher or developer, Tinker dramatically

Starting point is 00:20:26 simplifies LLM post-training. You retain 90% of algorithmic creative control, while Tinker handles the hard parts that you usually want to touch much less often, meaning you can do these at well below 10% of typical complexity involved. Compared to the more common and existing paradigm of upload your data, we'll post-train your LLM, this is, in my opinion, a more clever place to slice up the complexity of post-training, both delegating the heavy lifting, but also keeping majority of the data and algorithmic creative control. GDP at Amazon actually drew the connection between Thinking Machines and Periodic Labs. They write, Thinking Machines vision, whole world as an AI-R-L-powered lab. In that sense, it's similar to that of Periodic Labs, but possibly much more expansive.

Starting point is 00:21:03 Periodic labs will be the equivalent of Bell Lab of today's world, and they will conduct reinforcement learning on the basis of feedback on real-world feedback, e.g. experiments in material science. We need that, and I am rooting for their success. Thinking Machines aims to treat the whole world as an AI-R-L-powered lab. Currently, AI training, and particularly RL, is out of reach of most engineers and organizations. It is considered achievable only for a handful of labs with big egos. Also, big labs cannot collect real-world feedback data beyond a point. The real world provides the best reinforcement learning rollout data.

Starting point is 00:21:33 For example, the kind of feedback Cursor gets when users either accept or reject suggestions by the Cursor Tab model. Customer support conversations and action trajectories that leave users delighted or disappointed. Factory floor decision-making data with outcomes. With such diverse data not accessible to the big labs as it is not on the internet, the models can be made much more intelligent. There is actually an entire show that I have had half formed on the back burner about the whole world as a reinforcement learning lab and how much the labs are orienting towards that being the future

Starting point is 00:22:01 of model development. But you kind of get a little bit of a taste of it here. And what's exciting is that even if it isn't making as many headlines, this AI-driven science is not just something for the future. It seems to be moving forward right now. Over the summer, there were multiple reports of frontier models doing novel mathematics, for example. OpenAI chief product officer Kevin Weil was so enthused by what he saw from

Starting point is 00:22:22 scientists working alongside GPT-5 that he's now incubating a division called OpenAI for Science. Earlier this week, an MIT student going by Asher posted, if I had a nickel for every MIT professor who told me GPT-5 made a novel research discovery in the past week, I'd have two nickels, which isn't a lot, but it's strange that it happened twice. He elaborated that one of the breakthroughs was in biology and the other in math. Now one can be skeptical of Twitter hype posting, but it does seem to be reflected in many people's experience. Sam Altman even reposted it saying, does feel like this is really starting to happen in tiny ways. Pryn summed up, probably the most surprising development in AI for me over the past six months

Starting point is 00:22:59 is that GPT-5 Pro and even GPT-5 Thinking can make very small novel scientific discoveries. As a reminder, these are models that think for under 40 minutes and are not nearly as advanced as OpenAI's unreleased multi-agentic models, which we know can work autonomously for hours. It's quite easy to see why OpenAI decided to launch the OpenAI for Science initiative now, presumably just a few weeks before the model that won gold on the IMO, IOI, and the ICPC becomes available to the public. Exciting times. Now look, I do think that OpenAI has a bit of a communication problem right now, where if Sam is

Starting point is 00:23:30 sincere when he basically implies that the revenue from social media style applications is a relevant part of their plan to get to AGI, they've got to connect those dots explicitly because otherwise people are just going to do it for them. If they articulate why ads matter, people can disagree, but at least they won't be speculating. But still, for those of you who are disappointed by the social media orientation of the AI labs, just know that there is a heck of a lot out there happening that is about insanely ambitious, world-changing scientific research. And it's not far off in the future.

Starting point is 00:24:01 It's happening right now. I'll continue to try to cover as much of that as I can as it comes up. For now, that's going to do it for today's AI Daily Brief. Appreciate you listening or watching. As always, and until next time, peace.
