Everyday AI Podcast – An AI and ChatGPT Podcast - EP 454: OpenAI’s Deep Research - How it works and what to use it for

Episode Date: February 4, 2025

Another "Deep Research"? OpenAI just released its version of Deep Research, not to be confused with Google's own Deep Research. But not all AI-powered Deep Research tools are the same, believe it or not. We show you how OpenAI's newest agentic tool works and how you can use it.

Newsletter: Sign up for our free daily newsletter
More on this Episode: Episode Page
Join the discussion: Ask Jordan questions on OpenAI
Upcoming Episodes: Check out the upcoming Everyday AI Livestream lineup
Website: YourEverydayAI.com
Email The Show: info@youreverydayai.com
Connect with Jordan on LinkedIn

Topics Covered in This Episode:
1. Overview of OpenAI’s Deep Research
2. Comparison to Google’s Deep Research
3. How Deep Research Works
4. Use Cases, Limitations, and Best Practices
5. Deep Research Larger Implications

Timestamps:
02:30 Daily AI news
07:10 Breaking down OpenAI's Deep Research
11:10 Specifying Earnings for Analysis
18:02 AI Research Model Comparison
20:17 OpenAI's Deep Research vs. Operator
25:19 OpenAI's Agentic Digital Detective
29:00 Challenging Benchmark: "Humanity's Last Exam"
31:51 AI's Expanding Context Window
34:01 Top AI Tools for Research
37:11 AI Content Filtering & Safety Measures
42:55 Researching Nike's Fiscal Numbers
44:42 Nike vs. Adidas Financial Comparison
47:44 Quality Over Quantity in Research
50:38 Renegotiate Contracts for AI Tools
54:23 Deep Research Use Cases

Send Everyday AI and Jordan a text message. (We can't reply back unless you leave contact info)

Start Here ▶️ Not sure where to start when it comes to AI? Start with our Start Here Series. You can listen to the first drop -- Episode 691 -- or get free access to our Inner Circle community and all episodes: StartHereSeries.com Also, here's a link to the entire series on a Spotify playlist.

Transcript
Starting point is 00:00:00 This is the Everyday AI Show, the everyday podcast where we simplify AI and bring its power to your fingertips. Listen daily for practical advice to boost your career, business, and everyday life. Wait, there's another deep research in town.
Starting point is 00:00:50 Yeah, you didn't misread that the other day when OpenAI announced their version of deep research, which is very different than maybe the deep research that you've been using from Google. So today we're going to explain to you what OpenAI's new deep research is, how it works, and what you should use it for. All right. I hope you're excited for today's show. I am too, because this is a pretty big agentic step from OpenAI. And it's already changing the way that I use the internet, just like Google's deep research
Starting point is 00:01:31 did when it came out. All right. I'm excited to talk. Hope you are too. And if you're new here, what's going on, y'all? My name's Jordan Wilson, and this is Everyday AI. This is your daily live stream podcast and free daily newsletter, helping us all not just understand what's going on with AI and all these AI developments,
Starting point is 00:01:49 but how we can all actually use them to grow our companies and our careers. Yeah, because if you thought you were using deep research and on the cutting edge, well, not if you're still using Google's version, although I think it's very good and very different. So we're going to get to that in a couple of minutes. So if you are new here, thank you for tuning in. Make sure if you're on the podcast, check out your show notes. We always leave some very helpful notes in there, as well as a link to our website.
Starting point is 00:02:13 That's going to be your best friend. We want you to be the smartest person in AI at your company or in your department. And our website is how you do that, youreverydayai.com. There, you can sign up for our free daily newsletter where we recap the podcast episode that we do each and every day, as well as keep you up to date with all of the other latest AI news and what it means for you. And we have like 430-some podcast episodes, all sorted by category, all for free on our website. All right, so make sure you go check that out.
Starting point is 00:02:43 And while you're there, make sure you check out our 2025 AI predictions and roadmap series. Yeah, just because it's February doesn't make that any less valuable. I'm going to keep talking about it at least for another week or so because you need to go check it out. A lot of these things that I, you know, we released like two or three weeks ago have already started to come true. So you need to go pay attention to that. All right. So I am very excited today to talk about Open AI's deep research.
Starting point is 00:03:10 Live stream audience, thank you for tuning in. Let me know if you want to see an OpenAI versus Google Deep Research show in the future. I'll put that together. But let's first start as we normally do by going over the AI news. So Meta may not use some of their most powerful AI models, according to a new report, if they're too dangerous. So Meta CEO Mark Zuckerberg has announced plans to eventually make AGI or artificial general intelligence openly available. However, the company has outlined scenarios where it might not release certain AI systems due
Starting point is 00:03:46 to potential risks. So according to new reports, Meta's new policy document, the Frontier AI Framework, identifies quote unquote high risk and critical risk AI systems. These are systems capable of aiding in cybersecurity, chemical and biological attacks. The distinction between them lies in the severity and manageability of the potential outcomes. High risk systems could facilitate attacks but are not as reliable as critical risk systems, which could lead to catastrophic results that cannot be mitigated. Yeah, that's not scary stuff.
Starting point is 00:04:18 So yeah, if a system is deemed high risk, Meta will reportedly limit internal access and delay release until those risks are mitigated. So at least that's some good news there. All right, some not so good news. Salesforce is cutting jobs while doubling down on AI. Speaking of trends that I talked about, I said that's going to be a big trend: you're just going to see a lot of companies not hiring in 2025 or only hiring for AI roles.
Starting point is 00:04:45 So Salesforce, no different, just announced that they are initiating job cuts that affect more than 1,000 roles, as reported by sources familiar with the situation, even as it continues to hire for new AI products. So the company is focusing on hiring salespeople for its AI agent products while maintaining a focus on profit margins due to pressure from activist investors. So displaced employees will have the opportunity to apply for other internal roles, according to the source. All right.
Starting point is 00:05:20 And then last but not least, OpenAI making some more big partnerships in Asia. So OpenAI is making significant moves in Asia, now partnering with South Korea's Kakao, and also, as we talked about in our newsletter yesterday, Japan's SoftBank to expand its AI services in the region. So the newest collaboration is with Kakao, hopefully that's how it's pronounced, K-A-K-A-O. So live stream audience, let me know if I'm getting that one wrong, if you know. But it will include developing a Korean-language assistant called Kanana and integrating OpenAI technology into KakaoTalk, one of the region's most popular messaging apps. So Kakao will also use ChatGPT Enterprise internally,
Starting point is 00:06:05 enhancing its operational capabilities. So OpenAI's expansion here is partly driven by competition from Chinese AI firm DeepSeek, which has gained traction in English-language generative AI. So these new partnerships will help OpenAI train its model on Asian language content, broadening its linguistic capabilities and market reach. So this comes just hours after OpenAI announced a huge new venture with SoftBank.
Starting point is 00:06:33 And they established SB OpenAI Japan, so a brand new company or a new joint venture to market an enterprise AI solution called Cristal Intelligence in Japan. That's Cristal with an I, not a Y. So according to sources there, SoftBank will invest $3 billion annually to integrate Cristal Intelligence and OpenAI's ChatGPT Enterprise across its group companies. All right. So yeah, a lot going on, as always, with AI news. All right. So let's get into it.
Starting point is 00:07:08 Love to see all the live stream people. So thank you for tuning in as always. Yeah, if you're a regular listener of the podcast, maybe you want to ask questions when we have guests on. Come join the live stream, 7:30 a.m. Central Standard Time. We do it on LinkedIn, Twitter, everywhere else. So thank you, everyone, for joining us, Michael and Big Bogey,
Starting point is 00:07:26 Douglas, Marie, Zolfia, everyone else. You know, thanks for joining. Jackie, Fred, my Chicago peeps. Good to see you all. So let's get straight into it and talk about OpenAI's deep research, what it is, how it works, and what it can be used for. All right. So here's first things first. Probably not going to have access to it, at least right now.
Starting point is 00:07:53 So right now, it is only available to ChatGPT Pro subscribers. So that is the $200 a month Pro plan. All right. But it will be rolling out, quote unquote, soon, according to Sam Altman, to ChatGPT Plus users. So if you have that $20 a month subscription, you will get 10 searches a month. All right. If you have the $200 a month Pro plan right now, you get 100 searches a month. And even free users will get a few searches a month. So I'm guessing free users are going to get maybe like two. And then we have the ChatGPT Plus users with 10 a month and then ChatGPT Pro. But right now, you only have access to it with ChatGPT Pro.
Starting point is 00:08:38 So, FYI, y'all know I do have the Pro account. So if you have questions you want me to run, go ahead and leave them now. And maybe I'll leave some of those results in the newsletter. So access is the biggest thing. So very few people are going to have access to this now, but I would guess by sometime in March, OpenAI might start rolling out access to everyone else. All right. So we're going to do this one a little different and look at it live right away.
Starting point is 00:09:11 All right. So live stream audience, do me a favor. I have a kind of different setup here. So hopefully everyone can see my screen here as I share it. So live stream audience, if you could, let me know if you can see my screen, that would be great. And we're going to do just two quick examples live, right? Nothing like doing generative AI demos on a live stream, right? Like what could go wrong? Well, everything. So hopefully y'all can see my screen here. But I'm going to go ahead and
Starting point is 00:09:42 kind of even walk or talk the podcast audience through what we have going on here. So if you are on the Pro plan or once it does roll out to Plus subscribers, you will see a new button here called Deep Research. All right. So it's a little different. It's not a mode that you would select in the drop-down menu, right? Which now it's like naming alphabet soup, right? Like, oh, there's o3-mini and o3-mini-high and o1 pro, right? If you have the Pro account.
Starting point is 00:10:15 All right. Thanks, Michael and Marie and, uh, Sandra for letting me know you can see the screen. All right. So now when you log into ChatGPT, once you do have access to this, you will see a new deep research button. All right, and it is very much different than the deep research from Google. I'm going to show you how. But we're going to do just two of these live right now. So you can kind of see how it works. And most of these take between five and 30 minutes. That's why I'm going to start two different deep research queries at the beginning of this show. And then,
Starting point is 00:10:49 at the very end, we're going to check in on them and see how they did. All right. So the first one, I'm saying, provide an analysis of Nike's latest quarterly performance compared to Adidas and Under Armour. Include key financial metrics, recent earnings commentary, and relevant market news. Cite all sources and highlight any discrepancies in analyst opinions. All right. So this is something, think, competitive.
Starting point is 00:11:16 I mean, there's so many different use cases. All right. So actually, if you share this show, I have 10 business use cases I already built, at the end. So if you share that, I'll share it with you. All right. So I'm going to go ahead now and click the deep research button. Okay, so you have to have that toggled on. And then I'm going to go ahead and click the send button. So what's going to happen first is deep research reads this. And then it's going to ask me some questions. So just like Google's deep research does, it wants to clarify. Because before it starts off on a five to 30 minute journey, you really want to make sure it has it right. So it's saying to provide a detailed and accurate analysis,
Starting point is 00:11:52 could you specify which quarter you're referring to? All right. So I'm just going to say Q4 of 2024. Actually, I'm not even sure if all Q4 earnings are out for these companies. So I'm actually going to say Q3 of 2024 just to make sure because, yeah, I think quarterly earnings, at least for a lot of the tech companies, have happened this week. So I want to make sure, I don't know if Nike and Under Armour and Adidas. So I'm just going to say quarter three of 2024. So I know that all that information should be there. And it should be publicly available. So for publicly traded companies, you know, you can always go and look at their, what is it, 10-Ks or whatever to see all their financials. All right. So that's all. It's asking me some
Starting point is 00:12:32 other questions. Do you have any preferred sources? So I'm just going to say preferred sources, I'm going to say reputable sources, right? I would normally go through and leave a little bit better feedback on this, right? And also, huge tip, I would first go through and give it access to more information that's relevant to the reason that I'm using deep research, right? So if you've taken our Prime Prompt Polish course going through the refined queue steps, I would generally go through that, but we're doing a live demo here. So I'm going to go ahead, just answer those two questions and go ahead and click the send button. All right. So I'm going to wait and just make sure.
Starting point is 00:13:13 So now deep research is responding and it says got it. It's going super slow today. So that's always great for live demos. It's essentially saying, got it. I will provide the findings once the research is complete. So what I've seen here is it kind of gets sometimes hung up on that before it gives you this new prompt where it says like starting research. And then you can kind of see a progress bar.
Starting point is 00:13:35 So if you don't see that right away, don't worry about it. It normally pops up. So even right now, it's just saying starting research and it's not actually researching just quite yet. All right. So now, like I said, I haven't tested how many concurrent deep research chats we can do. So I'm going to go ahead, pop open another one. And for this one, hopefully live stream audience, let me know if you can see the new tab. So I'm keeping this one very open-ended.
Starting point is 00:14:05 And I'm doing this for a reason. And then at the end, I'm going to walk you through these two deep research queries. So this one, I'm just saying, find the latest news and information about DeepSeek. All right. Yeah, something that the whole world is talking about. So some qualifying questions saying, are you looking for information on DeepSeek AI, the language model, or something else related to DeepSeek? Also, do you need technical details, business updates, or general news? So I'm going to say DeepSeek AI, the LLM
Starting point is 00:14:38 company, and I'm going to say, you know, V3 and R1, those are the things that I want to know about. And now it's asking for like the news. And I'm going to say just give me everything, right? All right. So now I'm going to go ahead, click enter on those. So we should have, let me just go over and check. Yep, there we go. So we have our Nike's market dynamics already working.
Starting point is 00:15:04 And I'm going to go ahead and explain here in a second kind of this activity and sources. And then I'm guessing our DeepSeek researcher here, right? That's so meta to say deep, deep, DeepSeek researcher within deep research mode in OpenAI, not to be confused with Google's deep research, right? All right. So it looks like hopefully the second one will kick off here. It's being a little slow this morning, but that's okay.
Starting point is 00:15:31 All right. So let's get back into the details to understand it. All right. So we've already kicked off the AI news for today. We already got two of our kind of deep research prompts running. Now, while we give it time, because like I said, it's going to take anywhere from five to 30 minutes. All right. So let's go over how to use it.
Starting point is 00:15:54 So I just showed that to you. So number one, you know, maybe you just joined in the middle of this live. Well, you have to have that $200 a month right now, ChatGPT Pro plan. And then make sure you always click the deep research button. Okay, it is not a drop-down. You have to click that button. So here's how OpenAI describes it. And this is very important because it is agentic and we're going to show you how it's actually agentic and what that kind of means.
Starting point is 00:16:20 But how OpenAI describes it is: an agent that uses reasoning to synthesize large amounts of online information and complete multi-step research tasks for you. Available to Pro users today, Plus and Team next. Good questions here coming in. So if you do have questions, please go ahead and get them in. So a couple questions already coming in. Let's see. Michael says, can you upload files right now? You cannot. Another good question here from Marie. Is there a token limit with deep research? Yes, I believe the context window is, I believe, 200,000, but I will say this. I don't know if that will be the same
Starting point is 00:17:57 for ChatGPT Plus users or if that 200K is more aligned to Pro users. So just keep that in mind.
Starting point is 00:18:34 All right. Also, let's first quickly talk about the difference, right? So not just the difference between Google deep research and OpenAI's deep research, which I can't even really get into, but I'll say this. Google deep research doesn't really have agentic or reasoning capabilities. All right. It essentially uses Google caches of web pages, right? And it goes through a ton.
Starting point is 00:18:59 Google usually can go through 20, 50, 500, more than a thousand cached pages. So as far as I know, it doesn't actually visit those pages, but Google has a very updated cached version of all of those pages. And essentially, Google Deep Research uses a transformer model and summarizes all of the information on all of those pages. Whereas OpenAI's deep research is very different. It is agentic.
Starting point is 00:19:28 So I will say that Google's deep research is omnidirectional, right? So after you do confirm, you have to do the same thing in Google deep research. And I did a whole episode on that. So you can go back and listen to that. But it's omnidirectional. It's not going to change directions based on the findings. OpenAI's deep research is a little different. It is agentic. Right. So if it finds something in the beginning of its research that changes the direction or the path, it is going to shift and pivot and go and look into those things a little bit more. And it is using a reasoning model. So Google uses a transformer non-reasoner model, whereas deep research uses a fine-tuned version of an unreleased
Starting point is 00:20:12 model in o3. So we do have o3-mini and o3-mini-high already released even for free users inside of ChatGPT. But according to OpenAI, this new deep research is using a fine-tuned version of the yet-to-be-released full o3 model. All right. So that's important to keep in mind. And that's one of the things that differentiates it from Google deep research, right? So multi-directional and agentic versus omnidirectional and more transformer model-based. Then we have to talk about the difference between this and OpenAI's other recent kind of agentic or, you know, task-based features, right? Because these are different because just in the last couple of weeks now, we've gotten three pretty big steps. You could say two of these are agentic. So we got ChatGPT Tasks.
Starting point is 00:21:05 So with this, you can schedule ChatGPT to do something for you. And one of the biggest things, I know this is confusing, one of the biggest things people are using ChatGPT Tasks for is to research, right? So at least right now, you can't schedule deep research and you can't schedule Operator, right? So essentially, you can schedule prompts to be run at any time, but you can do really anything within the GPT-4o set of capabilities in Tasks. Operator is a little different.
Starting point is 00:21:37 Operator operates technically outside of ChatGPT. And that also right now is only available to ChatGPT Pro, where Tasks is available to ChatGPT Plus as well. So Operator, it is literally a virtual agent that operates a virtual desktop, right? Very cool. You can very much control it. Whereas deep research, the newest release from OpenAI, you don't have that kind of granular control.
Starting point is 00:22:09 However, it does work, quote, unquote, inside ChatGPT. And it is going to go off on its own. So in theory, you could accomplish the same things in deep research and Operator, but there's more human in the loop if you wanted to do that in Operator, if that makes sense. Because deep research, it's working right now, right? I can do a lot of those things in Operator, but like, why would you want to? Right?
Starting point is 00:22:33 Because if you wanted to do a certain predetermined step of like agentic work, you might as well be doing it as the human, right? But Operator, I think it's already being slept on and it's amazing. All right. So hopefully that makes sense. So we do have three very different new offerings and features from ChatGPT. So Ken is asking, is it better than Google deep research or not?
Starting point is 00:23:01 Well, it's different. It's different. And it depends on what you want it to do, right? So one thing we'll see as we look at the results. ChatGPT, their version of deep research does not necessarily go to the same. It's not a quantity play. I think Google does a quantity play. And in doing so, they hope the quality
Starting point is 00:23:25 rises to the top. And I love Google Deep Research. Don't get me wrong. I call it one of the best tools of 2024. I mean, Google has come roaring back the last two months. Actually, should have a very special guest from Google this week for you all that I think you're going to want to pay attention to. So it's different, Ken. It's very much an agentic tool, an agentic deep researcher versus something that just goes to potentially hundreds of websites. And both of them work kind of the same way. Another important thing for people to keep in mind, because everyone's like, oh, OpenAI just copied Google. Oh, not really, right? The term deep research was actually pegged to reporting on OpenAI back to, I believe, June of last year,
Starting point is 00:24:16 before Google's deep research, right? So, you know, it was reported on more than eight months ago, kind of this deep research feature within OpenAI. So everyone's like, oh, OpenAI just copied Google. I mean, I don't know. You know, but it is very similar, but it's also completely different. But yeah, maybe if there's an appetite, y'all, if you want to see a dedicated Google deep research versus OpenAI deep research, let me know. All right.
Starting point is 00:24:43 So like I said, I call OpenAI deep research more of an agentic detective. And it makes iterations in between research runs like a human would. And OpenAI did distinctly say this is an important step toward their goal of achieving AGI or artificial general intelligence or essentially when one AI system is way smarter than any human on every single meaningful knowledge work task. All right. So I actually put this little tweet out there, a quote from when OpenAI had their announcement video, which was Sunday night.
Starting point is 00:25:19 So from Mark Chen, who leads research, he said, We think it's important for our models to start doing autonomous tasks for much longer in an unsupervised way. Our ultimate aspiration is a model that can uncover new knowledge for itself. All right. So it's deep research on the surface. But OpenAI is literally saying this is the first step toward AI that can go work unsupervised, so without human supervision, right, and go learn new or uncover new knowledge for itself, not for us, right?
Starting point is 00:26:04 I don't know why no one was talking about that. Like, I heard that. I was actually driving and listening to the live stream in my car. I heard that. I was like, wait, what? I was like, WTF. Did he just say that? And I went back and listened to it.
Starting point is 00:26:16 And I'm like, yeah. He just said, our ultimate aspiration is a model that can uncover new knowledge for itself. So you have to think right now, right? We are kind of training and using this. And OpenAI is presumably going to be using all of this training data to make an operator model that could just make itself better, right? We always talk about the steps needed to go from AGI to the big scary ASI, artificial super intelligence. And that's essentially when AI can make itself better. Well, OpenAI kind of just said the quiet part out loud.
Starting point is 00:26:51 It's like, hey, we're hoping our future models are just going to go work on their own and discover new knowledge for themselves, not for the human user. All right. So, yeah, had to point that one out. All right. So what am I talking about with a digital detective and this omnidirectional, and how the heck is this thing agentic, right? Because if you've used Google Deep Research, I mean, it's amazing, but I wouldn't think
Starting point is 00:27:17 anyone's calling it agentic, right? It's just looking through a cache of dozens or hundreds or more than a thousand web pages and summarizing the most important information to resolve your query. OpenAI's operator, much different. And we're going to look at this. So when you use it, there's an activity tab and a sources tab. So you can see as it searches for things. And if it finds something, and according to OpenAI,
Starting point is 00:27:44 it looks like their version of deep research might actually visit the page and not work on a cached version, which would make sense, although I'm sure that OpenAI is working on that to improve speed and to cut down inference and compute costs on this new model. But you can literally see it and go through and look at the step-by-step activity. So it's kind of like how right now, if you're using o3-mini or o1, you can kind of see a summary of the chain of thought, right? You don't get the raw chain of thought, which I know a lot of people are complaining about.
Starting point is 00:28:18 I don't think it's necessarily a big deal. But you can see a summarized version of what this new OpenAI deep research is doing. And you can see when it goes in another direction because it might be reading a story, it might find something that maybe goes against something the user said. Or if you're talking about something that's very recent,
Starting point is 00:28:42 it might find brand new news, right? It might find brand new news that I didn't know about when I asked my question, right? So that's the beauty of this, right? It doesn't work at the quantity, but I think from a quality perspective, it is right up there, maybe even higher than Google's deep research. All right. Let's go over all the fine print here.
Starting point is 00:29:01 So it is an autonomous research agent, and here's the functionality. So it can independently navigate the web to gather information from multiple sources. It uses internal simulated reasoning through the o3 model, essentially the built-in chain of thought process, to plan and execute multi-step research tasks. So it can integrate with external tools, right? That's the other thing right now. So including code execution like Python and processing of multimodal inputs, although that part is not yet released, but it will be released soon. It then can produce structured output.
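Editor's sketch: the contrast described above, a fixed one-pass summary versus a loop where each page read can change what gets read next, can be illustrated with a toy example. This is only a minimal sketch under loud assumptions: `TOY_WEB`, `one_pass_summary`, and `agentic_research` are hypothetical names invented here, the "web" is an in-memory dictionary, and nothing below reflects how OpenAI's or Google's products are actually implemented.

```python
from collections import deque

# Hypothetical in-memory "web": each page has text plus leads it points to.
TOY_WEB = {
    "nike-q3": {"text": "Nike reports quarterly revenue.", "leads": ["analyst-note"]},
    "analyst-note": {"text": "Analysts disagree on Nike's outlook.", "leads": ["rebuttal"]},
    "rebuttal": {"text": "A second analyst disputes the first note.", "leads": []},
    "unrelated": {"text": "A page that no other page links to.", "leads": []},
}


def one_pass_summary(pages, web):
    """Fixed plan: read a preset list of pages and never change direction."""
    return [web[p]["text"] for p in pages if p in web]


def agentic_research(start, web, max_steps=10):
    """Adaptive plan: every page that is read can add new leads to follow."""
    queue = deque([start])
    visited = []   # activity log: pages in the order they were read
    findings = []  # text collected along the way
    while queue and len(visited) < max_steps:
        page = queue.popleft()
        if page in visited or page not in web:
            continue
        visited.append(page)
        findings.append(web[page]["text"])
        # Pivot: what was just read decides what gets read next.
        queue.extend(web[page]["leads"])
    return visited, findings


visited, findings = agentic_research("nike-q3", TOY_WEB)
print(visited)
print(len(one_pass_summary(["nike-q3"], TOY_WEB)), len(findings))
```

In this toy, the fixed plan only ever sees the page it was told to read, while the loop follows the analyst note and then the rebuttal it discovers along the way. The `visited` list plays the role of the activity log mentioned in the episode.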
Starting point is 00:29:35 So yes, you can also direct deep research what you want. Do you want a table? Do you want a long, verbose blog post? Do you want something that's more bullet points, right? Do you want something in a fun tone or a serious tone, right? And also use the context window of that operator chat to steer it in any direction before or after. So that's the thing. Once it's done, you can continue to work in that window. That's another important thing to keep in mind. So let's talk about the actual model
Starting point is 00:30:07 that's running this thing. What's the engine? Well, it's using a fine-tuned version of the o3 reasoner model. This is an unreleased version. And also, the benchmarks on this thing are nutty, right? And you might be wondering, well, how? Well, number one, it's a model that we don't have access to anywhere else, the o3 model. This is, again, according to OpenAI, a fine-tuned version of the full unreleased o3 model, whereas right now, you know, even free users have a little bit of o3-mini, and then, you know, paid users have o3-mini-high. Yes, the naming is bad. The company knows the naming is bad. Don't worry. But the benchmarks are outstanding.
Starting point is 00:30:51 So there's a new kind of, I'll call it a trendy benchmark, I guess, that was just created, I believe a couple of weeks ago, called Humanity's Last Exam, right? So one thing is with all of these models, there's arguments now that they're being overfitted or overtrained to just perform on benchmarks, right? Because obviously these benchmarks make their way into the actual training data, right, eventually, because they get talked about on the internet. And then the large language models gobble them up. But so this new Humanity's Last Exam, it is a much more difficult kind of benchmark.
Starting point is 00:31:27 And it blew everyone else away by getting a 26% on that exam. And you'll see even the GPT-4o model, which I think is still, probably because of all of the tools it has access to, the most powerful model in the world. I'll say that because even the o models right now, you don't have access to all these other tools, right? You need tool use to have a model. So even the GPT-4o model on this Humanity's Last Exam got a 3.3%. Not that great, right?
Starting point is 00:32:00 Then you have Grok actually did a little better with a 3.8. Claude. Everyone loves Claude. I'm not a big Claude fan. It got a 4.3. OpenAI's o1 got a 9.1. Then you had the o3-mini-high, got a 13. And then this new deep research got a 26.6.
Starting point is 00:32:20 It is more than twice as good as the next best model, which is OpenAI. And it is almost three times as good as the next best competitor, not named OpenAI, which was DeepSeek's R1. And then Gemini's thinking got a 6.2. So exponentially better. However, I am curious if OpenAI just used deep research or if deep research had access to other tools. Like as an example, did deep research have access to operator during this? Or did it just use deep research?
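As a quick check on the "more than twice as good" comparison, here is a minimal sketch that recomputes the ratios from the Humanity's Last Exam scores quoted in the episode. The dictionary labels and the helper name `ratio_to` are illustrative assumptions, and DeepSeek's R1 is left out because no score for it is quoted here.

```python
# Humanity's Last Exam scores (percent) exactly as quoted in the episode.
# Labels are informal; DeepSeek R1 is omitted because no figure is quoted.
scores = {
    "GPT-4o": 3.3,
    "Grok": 3.8,
    "Claude": 4.3,
    "Gemini Thinking": 6.2,
    "OpenAI o1": 9.1,
    "o3-mini-high": 13.0,
    "Deep Research": 26.6,
}


def ratio_to(leader: str, other: str) -> float:
    """Return how many times higher the leader's score is than the other's."""
    return scores[leader] / scores[other]


# Rank models from highest to lowest score.
ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)

for name, score in ranked:
    print(f"{name:16s} {score:5.1f}%  ratio vs Deep Research: "
          f"{ratio_to('Deep Research', name):.2f}x")
```

On these quoted numbers, Deep Research scores roughly 2.05 times the next model on the list (o3-mini-high), which is a linear multiple rather than exponential growth.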
Starting point is 00:32:56 I'm not sure. However, that shows the power of an agentic tool that can browse the web, but also can use a reasoning model at the same time. It's fantastic, right? And we saw reports that OpenAI actually hired countless PhD students to specifically train the o3 model. So that could be another reason as well why the full o3 model did very well, but you combine it with the ability to go research in an agentic way.
Starting point is 00:33:35 It's powerful. All right. So right now, like I said, it does support a large context window up to 200,000 tokens. I don't know once this gets rolled out to ChatGPT Plus, if you will still have that same context window, or if it will be 100,000 or if it'll be 32,000. So that matters because if you want to give deep research a lot of context before it gets started,
Starting point is 00:33:59 you know, it can obviously go browse the web for its own context. But if you want to, you know, copy and paste a lot, or if you want to have some back and forth conversation before you get started, you will have to see once it's released on the Plus version what the context window is. And it does incorporate that multimodal handling to synthesize information. So let's talk about some professional uses, right? And again, at the end, if you share this, if you repost this on, you know, Twitter or LinkedIn, you know, send me a message, but I'll go through and look later in the week.
Starting point is 00:34:29 I will share with you a list of 10 very specific and, what I think are, fantastic use cases. But, I mean, talk about research and analysis and finance. That's going to be huge. All right. I actually think this is going to potentially crush the management consultant industry if they do not use it. All right, give me like five to ten minutes and I'll show you that at the end. But I also think this is going to accelerate tasks like market research, competitive analysis, and literature reviews. Also, consumer decision making, that's a use case that, at least for me personally, I don't care as much about.
Starting point is 00:35:04 But that's something that OpenAI talked a lot about, that a lot of their internal employees that have been using this, presumably for months, are using this to help them make better, essentially, e-commerce decisions, right? For me, that's not what's getting me out of the bed in the morning at 7:30 a.m. To me, it's redefining how we work as knowledge workers, but there's obviously a lot of, you know, consumer decision making or if part of, you know, what you do in your role is, hey, which vendor are we going to use this year? Are we going to use the same vendors? Well, it's great for that. And then enterprise productivity, right? Just any internal research, right? So, so much of what we do as knowledge workers is reading the internet. Think about it, right? Maybe you're
Starting point is 00:35:45 working on a new project. You're looking up competitors. You're helping do R&D for a new product line, whatever it is. So much of what we do as knowledge workers is we read the internet. Right. So the use cases are there. And that's why Google's deep research immediately was, and still is, I think, a top five AI tool. And I think this deep research from OpenAI immediately catapults itself into that top five, top three place right there. So like we said, right now for its constraints in current deployment, right now, you only have 100 queries a month on that $200 a month plan. So live stream audience, if you want to test it out, if you don't want to pay that $200 a month, I still probably have like 50 or so
Starting point is 00:36:35 queries, I'll go ahead and run it for you and put those results in our newsletter. All right. And it is right now designed to balance in-depth research capabilities with significant computational costs. So this is the first version. This is the worst it is going to be. I do believe it's not going to have that five to 30 minutes delay in the future. And also future plans include scaled-down versions for lower tiers and wider geographic access once regulatory concerns are addressed. So right now, there are certain countries in, I believe, the EU, Iceland, some others that just don't have access to this yet because of those kind of existing laws in those places.
Starting point is 00:37:15 Yeah, we got to talk about this. I'm not going to talk about all the sunshine and rainbows and not talk about, well, are there ethical concerns or regulatory considerations? Heck yeah. So OpenAI did share, and we'll link to that in the newsletter today, all the benchmarks. So they did say that there's reduced hallucination rates. But the agent still does require human oversight to verify critical information.
Starting point is 00:37:41 So just because you have an agentic reasoning model going out there and helping you with your work does not mean the human in the loop can kick their feet up and sip on coffee like I'm doing now. If anything, I think this even heightens the increased role and responsibility of the human in the loop. Right. I think the more hands off you can be with agentic AI systems, the more vigilant the human actually has to be because it is human nature as these AI systems get more and more capable. It is human nature for us as humans to spend less time verifying or giving it good input to get it going in the right direction. So also, here's the other thing that I don't think people are talking about. Biases. Guess what's on the internet? A bunch of garbage,
Starting point is 00:38:34 right? So unfortunately, you cannot have a definitive way to steer operator where you want it to go and to avoid, quote unquote, bad websites or low quality or websites that have biases, right? You can't do that. It is a reflection of the internet, which is a reflection of society. You can with some nice prompt engineering, kind of, you know, do some whitelisting or blacklisting of certain websites or certain news organizations. And OpenAI's deep research to its credit does do a good job of usually asking you before it gets started what type of sources you want, but you can't avoid that entirely. All right. Now, there are some safety measures to limit disallowed content and mitigate misuse, right? So there are some built-in kind of roadblocks and guardrails in there, which is a good thing. And like we talked about, it's initially restricted from certain regions that have stricter data and privacy regulations: the EU, UK, etc. All right. So let's go back and now visit.
Starting point is 00:39:40 Let's see if both of ours worked. All right. So this is why we did two. One of them timed out. All right. So this is brand new. All right. So unfortunately, our one about DeepSeek timed out.
Starting point is 00:39:53 So make sure to go check out the newsletter. We'll share the results for that one there. But let's look at the first one. So the first prompt that I gave deep research was: provide an analysis of Nike's latest quarterly performance compared to Adidas and Under Armour. Include key financial metrics, recent earnings commentary, and relevant market news. Cite all sources and highlight any discrepancies and analyst opinions. All right.
Starting point is 00:40:19 So now all I have to do, I do hope OpenAI improves this user interface a little bit. So essentially, you have to go click this research completed thing right here. And it's very hard to see because it's in this very small, light gray font. But if you really want to see what it did under the hood, you got to make sure to go find that and click that. So once I click that, now I'm going to have this kind of two-tier thing, activity and sources. All right. So I'm not going to go through this whole thing. I'll actually probably share it in a document. And if you want, you can go download it in our newsletter. But here's kind of this agentic process it went through. So I'm reading to you the summarized version of deep research's chain of thought.
Starting point is 00:41:03 So it says, I'm gathering info on Nike's fiscal year, ensuring Q3 2024 matches others. This involves comparing Adidas and Under Armour's quarters for precise analysis. I'm pulling together Nike's Q3 2024 earnings focusing on sales, income, EPS, and gross margin. So you'll see, it's already doing things that I didn't tell it to, right? I didn't talk about sales, income, EPS, gross margins. Why is it doing this on its own? Because it is using a reasoning model. And then it says, comparing these with Adidas and Under Armour's similar quarters for a
Starting point is 00:41:33 comprehensive analysis. And then the first thing it searched for, so it searched for Nike Q3 earnings comparisons. All right. So you can go through and see. So it said, I'm exploring Nike's raw financial data for quarter three and comparing it with other sportswear giants. The gross margin increase to 51.3% might not be Nike. So it's already starting to see some things that aren't regular in its research. So I can go and I can click this website. So we went to some website called Fibre2Fashion, right? So I don't know. I'm looking at this website. I don't know anything about it. So if this was important, what I would do, and maybe this isn't a good source. Maybe it is. I don't know. But if you run this,
Starting point is 00:42:23 I always encourage you to run this a second time after you go through and read and see where it went to. So I've done this a lot after I use it. I will either whitelist or blacklist certain websites because as an example, I saw it really went to the Financial Times a lot. And I'm assuming that's because OpenAI had a partnership with the Financial Times. But I saw that it wasn't really bringing in a diverse enough kind of information set. So, you know, after you run it the first time, I would look at what operator, or sorry, at what deep research did or didn't do, especially when it comes to the quality of sources that it went to.
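The whitelist or blacklist step is done with plain prompt wording, not a product setting. A minimal sketch of how that second-run prompt could be assembled is below. The helper name `build_research_prompt`, its parameters, the exact phrasing, and the placeholder domain `example-low-quality-site.com` are all hypothetical; nothing here calls an OpenAI API, and as noted in the episode, the model may still stray from these preferences.

```python
def build_research_prompt(question, prefer=(), avoid=()):
    """Compose a research prompt that states source preferences in plain language.

    Hypothetical helper: it only builds text to paste into a deep research chat.
    """
    lines = [question.strip()]
    if prefer:
        # Soft "whitelist": ask the agent to start from these sources.
        lines.append("Prioritize these sources: " + ", ".join(prefer) + ".")
    if avoid:
        # Soft "blacklist": ask the agent not to lean on these sources.
        lines.append("Do not rely on these sources: " + ", ".join(avoid) + ".")
    lines.append("Cite all sources and flag any you consider low quality.")
    return "\n".join(lines)


prompt = build_research_prompt(
    "Provide an analysis of Nike's latest quarterly performance "
    "compared to Adidas and Under Armour.",
    prefer=["investors.nike.com"],
    avoid=["example-low-quality-site.com"],  # placeholder domain
)
print(prompt)
```

After a first run, the `prefer` and `avoid` lists would be filled in from what shows up in the activity and sources tabs, then the prompt is run again in a new chat.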
Starting point is 00:43:04 All right. So after it went to this Fibre2Fashion, it says, seems like the article mostly highlights Adidas and Under Armour. So the downside is it doesn't take you to the exact page. It usually just takes you to the main domain, which I'm not a huge fan of. I wish there was, or no, actually, let me just double check that. I believe if you go to the sources tab,
Starting point is 00:43:27 it might take you to the actual, let's see here. Okay, so here. Okay, so at the bottom, I'm just double checking. Okay, so maybe it doesn't. I thought that you could find it. So in some instances, it will give you the actual page that it went to. So I'm opening another specific page.
Starting point is 00:43:51 So on investing.com, it gave me the actual page it went to, whereas on this Fibre2Fashion, it just gave me the main domain. So, you know, make sure to check the activity and sources. All right. So let's keep going because I want you all to quickly understand and see this kind of chain of thought. So after it went to Fibre2Fashion, then it went to investors.nike.com. So I would have liked to see it start there. But, you know, hey, that's why, you know, deep research is in charge and not me. But I could have told it, you know, hey, either start with this or I could have said, oh, and you know what? I was wrong, y'all. You can
Starting point is 00:44:33 start with a file. So I said that that was coming soon. I don't know if that was available right when they released it on Sunday, late Sunday night, but it looks like right now you can start with a file. So I would always, always, always recommend, before you kind of send deep research off on its own, start with the most high quality source information possible, right? If you've taken our free Prime Prompt Polish course, we go through the refined Q process. I would first go through that refined Q process before you send deep research off on its own. All right. So then it went to investors.nike.com. Then it says, I'm figuring out Nike's Q3 fiscal year numbers for net income and diluted EPS, then looking at Adidas revenue.
Starting point is 00:45:20 So then after that, it searched for Under Armour's. All right. And it's talking through, it's saying, interestingly enough, result zero aligns with Under Armour's official site for quarter three, 2024 results. So interestingly enough, result zero is usually a knowledge graph, right? So that's what I assume result zero means, not result one. All right. So then it says it went to retail.insight.network. I'm scrolling through here, right? I'm not going to take 30 minutes. And then it went to GlobeNewswire, then it went to Finance Hill. It looks like a report that they did on that. Then it went through and searched for Adidas last. All right. So that was all the activity. And then you can go to the sources. So it looks like it cited 16 different sources. And then at the
Starting point is 00:46:17 bottom, you can go to all sources. So it looks like sometimes deep research will look at websites. but not use them in its kind of final report that it puts together, right? So for podcast audience, you probably didn't see this, but it did complete a final report here, right? So I have a very good looking. So here it says, it says Nike Q3, 2024 versus Adidas and Under Armour, Financial Performance Comparison. And then in the middle of this little document, it puts together, which is great, it puts citations for all major facts, right?
Starting point is 00:46:52 because the first thing it says: Nike reported a solid but modest performance in quarter 3, 2024. Revenue was essentially flat year over year at 12.4 billion. And then to make sure, oh, is that made up? Well, right there, I can hover over. That is from investors.nike.com. I'm going to click on it. And then I'm going to search for that same number, right? So let's go for 12.4. And there we go. Third quarter revenues were slightly up on both a reported and currency-neutral basis at $12.4 billion. So within, let's go back there. Okay.
Starting point is 00:47:29 So hopefully you all saw that right there. There we go. There it is. All right. So we can double check there on that citation that deep research provided. So let's just take a look. So, you know, we have the Nike fiscal quarter three, 2024 highlights there.
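The manual check shown here (click the citation, then search the page for the same number) can be sketched in a few lines. The function name `figure_in_text` is a hypothetical helper, and the sample string is the sentence read aloud in the episode, not a live fetch of the Nike investor page. It only confirms that the number appears in the source text, not that the surrounding claim is accurate.

```python
import re


def figure_in_text(figure: str, text: str) -> bool:
    """Return True if `figure` appears in `text` as a standalone number.

    The lookarounds stop "2.4" from matching inside "12.4" and
    "12.4" from matching inside "12.45".
    """
    pattern = r"(?<![\d.])" + re.escape(figure) + r"(?!\d)"
    return re.search(pattern, text) is not None


# Sentence quoted in the episode from the cited source page.
source_excerpt = (
    "Third quarter revenues were slightly up on both a reported and "
    "currency-neutral basis at $12.4 billion."
)

print(figure_in_text("12.4", source_excerpt))  # figure cited in the report
print(figure_in_text("2.4", source_excerpt))   # partial match is rejected
```

A check like this catches invented figures but not misattributed ones, so reading the sentence around the match is still part of the verification.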
Starting point is 00:47:46 Then it goes into Adidas quarter three highlights. We keep scrolling down. There's Under Armour's, all cited and sourced from, it looks like, 16 different sources. Then we have comparative analysis and market context. So here it's looking at the different financial metrics across all three companies, whereas first it broke it down company by company. And then we have market share, market news and context, analyst opinion, which we did ask for
Starting point is 00:48:17 and discrepancies. So we have a pretty solid, right? So this isn't super long, which I actually like because sometimes I found that it's just entirely too long. So I'm actually checking here at the word count to see how long this report actually was. So, okay, so it was a 1,500-word report full of citations. So pretty fantastic, if you ask me. And yet entirely different than Google's deep research. All right. So that's a wrap. I didn't want this to be a multiple hour long. Sorry, it looks like Jose said the video feed might have been stuck. Michael,
Starting point is 00:49:06 to answer this question, no, it does not visit as many sources as Google. Yes, Google. I generally get it to visit anywhere between 200 to 300. I've gotten it to visit up to like 1,300 sites, but it's different, right? This is one of those instances where I don't necessarily think that quality, or sorry, I don't necessarily think quantity is as important as quality and reasoning, right? So I think the Google Deep Research, they each have their own fantastic use cases. I think Google Deep Research casts a wider net and doesn't really use that reasoning or that logic. Whereas OpenAI does it probably more
Starting point is 00:49:49 like a human would, right? I doubt that, you know, for most research tasks, you're going to go to 200 or 300, you know, websites. There might be some instances, but for the most part, I think your day-to-day researching, you know, that most knowledge workers would do, I don't think you're going to, you know, 200, 300. I think you're probably going to 16 to 17, right? Hopefully those are high-quality sources, but that's why I always encourage you, if you are using deep research, run it a second time. First, that's the thing. You set it and forget it.
Starting point is 00:50:22 You come back, you look at the sources, you look at what it did, and you just give it feedback, right? So you start a brand new deep research chat, you know, based on the feedback and results. I cannot emphasize enough how important that is. It's the same kind of quote unquote golden rules that you would do from basic prompt engineering, right? The concept of a five-shot chain of thought or, you know, a 32-shot chain of thought is always going to outperform a five shot. Your second and your third iteration of this is always going to be better. And you do have to do a lot of work providing good context before. I did this for the sake of brevity to go a little faster. The more and better
Starting point is 00:51:06 and high quality information you start with is going to help your first output. Your second output is going to be exponentially better because you can look at the activity and the sources and you can steer it in the right direction. I cannot emphasize enough. That right there is a cheat code, y'all. All right. So I hope this was helpful, y'all. And here, let's end it with one hot take. It is hot take Tuesday. Consultancies are going to completely be screwed if they do not use this, period, right? And you might be saying like, oh, Jordan, that's, that's a crazy take. No, it's not. I was not surprised at all to see the one video that OpenAI featured most prominently on its deep research page was from none other than Bain & Company, right? One of the biggest management consultancies in the world.
Starting point is 00:52:06 Yeah, this is, you know what, if you're a young consultant out there, this is the future. And I think this is really going to disrupt the entire consulting industry. I'm not going to come in with a hot enough take yet and say, oh, you know, the consulting industry is going to die. No, but if your company is using a management consultant right now, you need to renegotiate your contracts and you need to have a very detailed and great understanding of how they are using agentic AI research tools. I am not kidding.
Starting point is 00:52:39 Fortune, even Fortune 500 companies, if you have a long-term contract with a, you know, my friends at these companies aren't going to like this. You need to renegotiate the terms immediately because this does the job that these companies would do. It's not doing it 100%, but it is doing it in 10% of the time. And oftentimes, if they're using the tools correctly, it is doing a much better job. This is going to completely disrupt how companies, medium-sized companies grow, right? What I see happening is the kind of the middle tier of companies that use management consulting.
Starting point is 00:53:22 So not your global Fortune 100s, right? But I'd say that middle tier, they're probably going to stop using management consultant companies unless they are honest, more honest, right, and say, hey, we used to spend, you know, We used to bill you, I don't know, 200 hours of research. Now with these AI tools, that's 20 hours, right? We cut it down by 90%. Right. Professional services, I said this on my 2025 AI predictions and roadmap, professional
Starting point is 00:53:53 services are going to go through a price shock this year. And this is perfect proof. If you are a medium-sized company paying six, seven figures or more to a management consultant company, you need to immediately go back to them, renegotiate those terms, and ask them how they are using this software. And you can't, like, they're not going to say, oh, we don't use ChatGPT. Yeah, they all use ChatGPT. All the stories have come out, PwC, Deloitte, right? All these companies have invested, you know, tens of thousands or hundreds of thousands of seats to
Starting point is 00:54:27 use ChatGPT. So they need to be using this. And one of the biggest core skills of management consultants, you know, is researching and then making sense of all of that. And here's the thing, y'all, I don't care what anyone says. I won national writing awards, right? I don't really talk about that a lot on the show, right? I won ACP Story of the Year.
Starting point is 00:54:54 I was a Pulitzer Fellow. I used to be a pretty good writer. ChatGPT is a better writer than me, right? Not the stuff you read online. People are always like, oh, look at this. It's so bad. AI stinks. No, it doesn't.
Starting point is 00:55:06 You stink at AI, right? Just like how I could go draw a stick figure and be like, look, art sucks. No, it doesn't. I suck at art, right? AI is a better writer than me. Someone that used to win national awards, right? This deep research is a better management consultant than your management consultant company. So they need to be using it.
Starting point is 00:55:32 I can't stress that enough. Sorry, just went on a random hot take there, y'all. I hope this was helpful going over OpenAI's Deep Research, how it works and what you want it to be used for. If you're still listening, if you want to see a question, if you want me to run, I can't do them all. I think I have like 50 left.
Starting point is 00:55:51 You know, maybe just type in, you know, question to run or something like that. And I'll go through, I'll pick a couple. I'll put the results in the newsletter. I hope this was helpful. Like I said, if it was, click that repost button. All right. or if you're on Twitter, LinkedIn, I think that's the only way I can see. Just send me a message.
Starting point is 00:56:09 Give me a couple days and I'll send this to you. I built out 10 use cases for deep research that I think are really, really good, right? I did this same thing for the new tasks feature. And the feedback I got from the people that shared it, their minds were blown, those that actually put it into use. So you're going to want to go ahead, click that repost button there on LinkedIn or on Twitter, if this was helpful, if you want access to those use cases. Thank you for tuning in. Please go to YourEverydayAI.com.
Starting point is 00:56:37 Sign up for the free daily newsletter. We're going to be recapping today's show and a whole lot more, keeping you not just in the loop, but making you hopefully the smartest person in AI at your company or in your department. Also, keep a lookout in our email newsletter. Should have a pretty amazing guest coming up this week and a lot of great guests and partnerships and fun announcements lined up for February and March. So thank you for tuning in. Hope to see you back tomorrow and every day for more Everyday AI. Thanks y'all. Meet Firefly AI Assistant, now live in Adobe Firefly, the All-in-One Creative AI Studio. Just describe what you want to create in your own words and the assistant handles the rest, orchestrating multi-step workflows across Adobe Creative
Starting point is 00:57:24 Cloud apps, including Photoshop, Premiere Express, and more in one conversational interface. You direct the outcome while the assistant accelerates execution. Stay in control with the ability to step in and refine at any time. See it today at firefly.adobe.com. And that's a wrap for today's edition of Everyday AI. Thanks for joining us. If you enjoyed this episode, please subscribe and leave us a rating. It helps keep us going.
Starting point is 00:57:57 For a little more AI magic, visit YourEverydayAI.com and sign up to our daily newsletter so you don't get left behind. Go break some barriers and we'll see you next time.
