Lenny's Podcast: Product | Career | Growth - OpenAI researcher on why soft skills are the future of work | Karina Nguyen (Research at OpenAI, ex-Anthropic)

Starting point is 00:00:00 Not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge. When I first came to Anthropic, I was like, oh, God, I really love front-end engineering. And then the reason why I switched to research is because I realized, oh, my God, Claude is getting better at front end. Claude is getting better at coding. I think Claude can develop new apps. What skills do you think will be most valuable going forward for product teams in particular? Creative thinking. You kind of want to generate a bunch of ideas and filter through them and just build the best product experience.

Starting point is 00:00:33 I think it's actually really, really hard to teach the model how to be aesthetic or do really good visual design, or how to be extremely creative in the way it writes. What do you think people most misunderstand about how models are created? When you taught the model some of the self-knowledge of, you actually don't have a physical body to operate in the physical world, the model would get extremely confused. Today my guest is Karina Nguyen. Karina is an AI researcher at OpenAI, where she helped build Canvas, Tasks, the o1 chain-of-thought model, and more. Prior to OpenAI, she was at Anthropic, where she led work on post-training and evaluation

Starting point is 00:01:12 for the Claude 3 models, built a document upload feature with 100K context windows, and so much more. She was also an engineer at the New York Times and a designer at Dropbox and at Square. It's very rare to get a glimpse into how someone working on the bleeding edge of AI and LLMs operates and how they think about where things are heading. In our conversation, we talk about how teams at OpenAI operate and build product, what skills she thinks you should be building as AI gets smarter, how models are created, why synthetic data will allow models to keep getting smarter, and why she moved from engineering to research after realizing how good LLMs are going to be at coding.

Starting point is 00:01:49 If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube. It's the best way to avoid missing future episodes, and it helps the podcast tremendously. With that, I bring you Karina Nguyen. This episode is brought to you by Enterpret. Enterpret unifies all your customer interactions, from Gong calls to Zendesk tickets to Twitter threads to App Store reviews, and makes them available for analysis. It's trusted by leading product orgs like Canva, Notion, Loom, Linear, Monday.com, and Strava,

Starting point is 00:02:19 to bring the voice of the customer into the product development process, helping you build best-in-class products faster. What makes Enterpret special is its ability to build and update customer-specific AI models that provide the most granular and accurate insights into your business, connect customer insights to revenue and operational data in your CRM or data warehouse to map the business impact of each customer need and prioritize confidently, and empower your entire team to easily take action on use cases like win-loss analysis, critical bug detection, and identifying drivers of churn with Enterpret's AI assistant, Wisdom.

Starting point is 00:02:53 Looking to automate your feedback loops and prioritize your roadmap with confidence, like Notion, Canva, and Linear? Visit Enterpret. That's E-N-T-E-R-P-R-E-T. This episode is brought to you by Vanta. [inaudible] to be here. Big fan of the podcast and the newsletter. Vanta is a longtime sponsor of the show, but for some of our newer listeners, what does Vanta do and who is it for? Sure. So we started Vanta in 2018 focused on founders, helping them start to build out their security programs and get credit for all of that hard security work with compliance certifications like SOC 2 or ISO 27001. Today, we currently help over 9,000 companies, including some

Starting point is 00:03:49 startup household names like Atlassian, Ramp, and LangChain, start and scale their security programs and ultimately build trust by automating compliance, centralizing GRC, and accelerating security reviews. That is awesome. I know from experience that these things take a lot of time and a lot of resources, and nobody wants to spend time doing this. That is very much our experience, both before the company and to some extent during it. But the idea is with automation, with AI, with software,

Starting point is 00:04:17 we are helping customers build trust with prospects and customers in an efficient way. And, you know, our joke is, we started this compliance company so you don't have to. We appreciate you for doing that. And you have a special discount for listeners. They can get $1,000 off Vanta at vanta.com/lenny. That's V-A-N-T-A.com/lenny for $1,000 off Vanta. Thanks for that, Christina. Thank you.

Starting point is 00:04:45 Karina, thank you so much for being here. Welcome to the podcast. Thank you so much, Lenny, for inviting me. I'm very excited to have you here because not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge of AI and LLMs. You recently launched this feature, which is basically the first agent feature of OpenAI. I also just did this survey.

Starting point is 00:05:08 I don't know if you know about this. I did a survey of my readers and asked them what tools they use every day in their work and which they use most. And ChatGPT was number one, above Gmail, above Slack, above anything else. 90% of people said that they use ChatGPT regularly. It's absurd. And it wasn't around two years ago.

Starting point is 00:05:25 Yeah. Also, we're recording this the week that OpenAI announced Stargate, which is this half-a-trillion-dollar investment in AI infrastructure. So there's just a lot happening constantly in AI. And you have a really unique glimpse into how things are working, where things are going, how work gets done. So I have a lot of questions for you. I want to talk about how you operate and how you work at OpenAI, where you think things are going, what skills are going to matter more and less in the future. And also just where things are going broadly. So how does that sound?

Starting point is 00:05:54 Sounds great. Thank you so much. Yeah, I was extremely lucky to join Anthropic in the early days and learned a lot of things there, and I joined OpenAI around eight months ago. So yeah, I'm excited to dive more into it. Okay, I'm going to definitely ask you about the differences between those, but I want to start more technical and just dive in. I want to talk about model training. People always hear about models being trained, these big models, how much data it takes, how long it takes, how much money it takes, how we're running out of data, which I want to talk about. Let me just ask you this question. What do you think people most misunderstand about how models are

Starting point is 00:06:34 created? Model training is more an art than a science. And in a lot of ways, we as model trainers think a lot about data quality. It's one of the most important things in model training: how do you ensure the highest quality data for a certain interaction or model behavior that you want to create? But the way you debug models is actually very similar to the way you debug software. So one of the things that I learned in the early days at Anthropic, and we discovered this especially with Claude 3 training, was when you taught the model some of the self-knowledge of, hey, you actually don't have a physical body to operate in the physical world.

Starting point is 00:07:21 But then at the same time, we had data that kind of taught the model some of the function calls, which is like, this is how you set the alarm. And so the model would get extremely confused about whether it can set an alarm when it doesn't have a body in the physical world. So the model gets confused, and sometimes it over-refuses. So sometimes it says, I don't know,

Starting point is 00:07:46 sorry, I cannot help you. And so there is always a balanced trade-off between how you make the model more helpful for users while also not being harmful in other scenarios. So it's always about how you make the model more robust and able to operate across a variety of diverse scenarios. That is so funny. I never thought about that.

Starting point is 00:08:11 Most of the data that it's trained on kind of assumes it's a human describing the world and how they operate. It assumes there's a body and you can do things, and then the model is told it doesn't have a body. Yeah. Okay. I want to talk a little bit about data. While we're on this topic, I know you have strong opinions here.

Starting point is 00:08:27 There's kind of this meme that models are going to stop getting smarter because they're running out of data. They're trained in large part on the internet, and there's only one internet, and they've already been trained on it. What more can you show them about the world? And there's this trend of synthetic data, this term synthetic data. What is synthetic data? Why do you think this is important?

Starting point is 00:08:45 Do you think it's going to work? I think there are two questions here. We can unpack one at a time. When people say we're hitting the data wall, I think they're thinking more in terms of pre-trained large models that are trained on the entire internet to predict the next token. But what the model is actually learning during that process is how to compress; it's a compression algorithm here. The model learns to compress a lot of knowledge, and it learns how to model the world. So take the next-word prediction after something like, teach me how to drive, basically,

Starting point is 00:09:28 and you only have a few words that will match that, like a car. So the model actually learns about the world itself. It's modeling human behavior. Sometimes it's modeling. And when you talk to pre-trained models, which are very, very large, they're actually extremely diverse and extremely creative

Starting point is 00:09:48 because you can talk to almost any Reddit user through a pre-trained model. But I think what's happening right now, with the new paradigm of the o1 series, is that the scaling in post-training itself is not hitting the wall. And that's because basically we went from raw datasets for

Starting point is 00:10:15 pre-trained models to an infinite amount of tasks that you can teach the model in the post-training world via reinforcement learning. So any task, for example, how to search the web, how to use the computer, how to write well, all sorts of tasks that you're trying to teach the model, all the different skills. And that's why I'm saying there's no data wall or whatever, because there will be an infinite amount of tasks. And that's how the model becomes extremely superintelligent. And we are actually getting saturated on all benchmarks. So I think the bottleneck is actually in evaluations.

Starting point is 00:10:55 We don't have all the frontier evals. Like, I don't know, GPQA, which is Google-proof question answering, PhD level. And the intelligence on the benchmark is getting to, I don't know, more than 60, 70%, which is what a PhD gets. So it's literally hitting the wall in evals. I want to follow both those threads. So the first is on this idea of synthetic data. Is a simple way to understand it that the models are generating the data

Starting point is 00:11:27 that future models are trained on? You ask it to generate all these ways of doing stuff, all these tasks, as you described, and then the newer model is trained on this data that the previous model generated? Some tasks are synthetically curated. So this is an active research area: how can you construct new tasks for the model to learn? Sometimes, you know, when you develop products, you get a lot of data from the product, like user feedback, and you can use that data in this post-training

Starting point is 00:12:01 world. Sometimes you still want to use human data, because some of the tasks can be really, really hard to teach. Only experts know certain things about some chemicals, or biological knowledge. So you actually need to tap into expert knowledge a lot. So yeah, I think to me, synthetic data training is more for product.

Starting point is 00:12:34 It's like rapid model iteration for similar product outcomes. And we can dive more into it. But the way we made Canvas and Tasks and new product features for ChatGPT was mostly done by synthetic training. Let's actually get into that. That's really interesting. I want to talk about evals, but let's follow that thread. So talk about how this helped you create Canvas. So when I first came to OpenAI, I really had this idea of, okay, it would be really cool for ChatGPT to actually change the visual interface, but also change the way it is with people.

Starting point is 00:13:13 So going from being a chatbot to more of a collaborative agent and a collaborator is a step towards more agentic systems that become innovators ultimately. And so the entire team of applied engineers, designers, product, and research kind of got formed in the air, almost out of nothing. It was a collection of people who just got together and rapidly started iterating with each other. Actually, Canvas is, I would say, the first project at OpenAI where researchers and applied engineers

Starting point is 00:13:54 started working together from the very beginning of the product development cycle. And I think there are a lot of things that we learned along the way. But I definitely came in with the mindset that we need to do really rapid model iteration, such that it would be much easier for engineers to, you know, work with the latest model possible, but also learn from user feedback or early internal dogfooding how to improve the model very rapidly. And, you know, it's really

Starting point is 00:14:29 hard to figure out, when you deploy a product, how people will use it. And so the way you synthetically train the model is basically figuring out what are the most core behaviors that you want this product feature to have. And for Canvas,

Starting point is 00:14:51 for example, it came down to three main behaviors. The first was how do you trigger Canvas for prompts like, write me a long essay, when the user intention is mostly iterating over long documents, or write me

Starting point is 00:15:07 a piece of code, or when to not trigger Canvas for prompts like, can you tell me more about precedent, I don't know, some of the general questions. So you don't want to trigger Canvas, because the user intention is mostly getting an answer, not necessarily iterating over a long document. The second behavior is how do you teach the model to update the document when the user asks. So one of the behaviors that we taught the model is to actually have some agency and autonomy to literally go to the document

Starting point is 00:15:46 and select specific sections and either delete or edit them, so highlight and rewrite certain sections. So sometimes the user would just say, change the second paragraph to be something friendlier. And we would have to teach the model to

Starting point is 00:16:05 literally find the second paragraph in the document and change it to a friendly tone. So basically you teach both how to trigger the edit itself, but also how to get a higher quality edit for that document. In the case of coding, for example, there's also the question of how good the model is at completely rewriting the document versus making very specific targeted edits. So that's another layer of decision boundary within edit itself. Do you select the entire document and rewrite it completely, or do you want to have very particular, custom behavior? And, you know, when we first launched the model, we would bias the model towards

Starting point is 00:16:49 more rewrites, because we thought the quality of the rewrites was much higher. But over time you're kind of shifting based on user feedback and what you're learning from iterative deployment. Lastly, the third behavior that we taught the model synthetically is how to make comments on any document. So the way we did it is we would use the o1 model to simulate a user conversation. Let's say, write me a document about XYZ. But then we used o1 to produce the document. And then we kind of injected a user prompt to be like,

Starting point is 00:17:31 oh, make some comments, critique my piece of writing, or critique this piece of writing that you just made. And then we taught the model to make comments on the document, on a very specific document. So it's also about what kind of comments you want the model to make. Do they make sense or not? How do you teach the quality of that? And it all came down to measuring progress via very robust evals.
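The comment-generation recipe described here (simulate a request for a document, have a model draft it, inject a critique prompt, then collect the comments) can be sketched roughly as follows. This is a minimal illustration under assumptions, not the team's actual pipeline: `Message`, `build_comment_example`, and `stub_generator` are hypothetical names, and the stub stands in for a real model call so the sketch runs offline.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Message:
    role: str      # "user" or "assistant"
    content: str


# Hypothetical signature: takes the conversation so far, returns the next assistant turn.
Generator = Callable[[List[Message]], str]


def build_comment_example(topic: str, generate: Generator) -> List[Message]:
    """Build one synthetic training conversation for the 'make comments' behavior."""
    conversation = [Message("user", f"Write me a document about {topic}.")]
    # Step 1: the generator model drafts the document.
    conversation.append(Message("assistant", generate(conversation)))
    # Step 2: inject a user prompt asking for a critique of that draft.
    conversation.append(Message("user", "Critique this piece of writing that you just made."))
    # Step 3: the generator produces the comments the target model should learn to make.
    conversation.append(Message("assistant", generate(conversation)))
    return conversation


def stub_generator(conversation: List[Message]) -> str:
    """Stand-in for a real model call (an assumption, so the sketch runs without an API)."""
    last = conversation[-1].content
    return "COMMENTS: tighten the intro." if last.startswith("Critique") else "DRAFT: ..."


example = build_comment_example("tide pools", stub_generator)
```

In practice the quality question raised above (do the comments make sense?) would be handled by a separate eval over the final assistant turn, which this sketch does not attempt.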

Starting point is 00:18:01 But yeah, this is how we used synthetic data generation for post-training. Okay. This is so interesting. So you talk about this idea of teaching the model, and you mention how it's using synthetic data to teach the model different behaviors. Is a simple way to think about it, basically, that you do that by showing it what success looks like, using basically evals? Is that the simple way to think about it? Like, here's what you doing this successfully would look like, and that teaches it.

Starting point is 00:18:29 Okay, I see, this is what I should do. Great. Yeah. Amazing. Yeah, you got it. Okay, got it. I want to start unpacking what your day-to-day looks like as you're building these sorts of things. Is it like you sitting there talking to some version of ChatGPT, crafting these evals? Sometimes I do that. Sometimes I do sit with ChatGPT. Actually, I think I learned this from Anthropic. People spend so much time just prompting models and [inaudible] all the time, and you actually get a lot of new ideas

Starting point is 00:19:01 for how to make the model better. It's like, oh, this response is kind of weird. Why is it doing this? And you start debugging or something, or you start figuring out new methods for how to teach the model to respond in a different way, like have a better personality, let's say. So it's the same thing as how personality is made

Starting point is 00:19:25 in the models; it's very similar methods. But yes, I think my time at OpenAI has changed. I think when I first came, I was mostly doing research IC work. So I was writing code, training models, writing evals, working with PMs and designers to teach them how to even think about evaluations. I think it was a really cool experience. And I think this is an adoption of how we do this product management of AI features or AI models.

Starting point is 00:20:05 Yeah, but now it's mostly, you know, management and mentorship. I'm still doing IC research code after 4 p.m., though, but yeah, it's kind of changed. All right. Don't talk too much about being a manager, because everyone's firing their managers. Who needs managers anymore? That's what I hear now. Just kidding.

Starting point is 00:20:28 It's interesting that so much of your time was spent on teaching product teams how evals integrate and how important that is. And I've heard this a few times and I haven't personally experienced it yet. So I think it's an important thread to follow, just how writing these evaluations is going to become an increasingly important part of the job of product teams, especially when they're building AI features and working with LLMs. So can you just talk a bit more about what that looks like? Is it like sitting there with an Excel spreadsheet, basically showing, here's the input, here's the output, here's how good the result was? Talk about what that actually looks like very practically. It certainly depends on what you're developing,

Starting point is 00:21:05 but there are various types of evaluations. So sometimes I do ask product managers, or there's also a new role that we have, model designers, to go through some of the user feedback maybe, or think of various user conversations where, under these circumstances, it should trigger Canvas. And then you have this ground-truth label of, okay, with this conversation, it should trigger Canvas. Under this conversation,

Starting point is 00:21:37 it should not trigger Canvas. And you have this very deterministic kind of eval for what the behavior should be. When we were launching Tasks, for example, how do you make correct schedules is actually really hard for the model. But we built out some deterministic evaluations like, okay, if the user says 7 p.m., the model should say 7 p.m. So you can have deterministic evals where it's pass or fail.
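A deterministic trigger eval of the kind described here can be as small as a list of prompts with ground-truth labels and a pass/fail comparison. The sketch below is illustrative only: the labeled prompts, `trigger_eval`, and the `keyword_baseline` classifier are hypothetical, and a real eval would call the model under test rather than a keyword rule.

```python
from typing import Callable, List, Tuple

# Ground-truth labels: True means the conversation should trigger Canvas.
# These example prompts are made up for illustration.
LABELED_PROMPTS: List[Tuple[str, bool]] = [
    ("Write me a long essay about the history of jazz.", True),
    ("Write me a piece of code that parses a CSV file.", True),
    ("Can you tell me more about black holes?", False),
    ("What is the capital of France?", False),
]


def trigger_eval(should_trigger: Callable[[str], bool],
                 cases: List[Tuple[str, bool]]) -> float:
    """Return the fraction of cases where the decision matches the label."""
    passed = sum(1 for prompt, expected in cases if should_trigger(prompt) == expected)
    return passed / len(cases)


def keyword_baseline(prompt: str) -> bool:
    """Toy stand-in for the model's trigger decision (an assumption, not a real classifier)."""
    return prompt.lower().startswith("write me")


baseline_score = trigger_eval(keyword_baseline, LABELED_PROMPTS)
always_on_score = trigger_eval(lambda prompt: True, LABELED_PROMPTS)
```

Because every case is pass or fail against a fixed label, the score is reproducible from run to run, which is what makes this style of eval useful for tracking regressions.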

Starting point is 00:22:11 So yeah, the way it works is, sometimes I ask product managers to just go create a Google Sheet, have different tabs, and write what's the current behavior, what's the ideal behavior, and why, or some notes. And sometimes we use it for evals, sometimes we use it for training.

Starting point is 00:22:32 Because if you give the spreadsheet to an o1 model, it can probably figure out how to teach itself a good behavior. And I think a second type of evals that is kind of more prevalent is human evaluations. You can have specific trainers or internal people, and when you have a conversation or a prompt and then various completions from models, you kind of choose the win rate: which model is the best, which model produced the highest quality comment or edit. And then you can have continuous win rates.
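The win-rate idea can be made concrete with a small calculation over pairwise human judgments. This is a sketch under assumptions: each judgment is recorded as "new", "old", or "tie", and ties are excluded from the denominator. The conversation does not say how ties are handled, so that choice is illustrative.

```python
from collections import Counter
from typing import Iterable


def win_rate(judgments: Iterable[str]) -> float:
    """Fraction of decisive pairwise comparisons won by the new model.

    Each judgment is "new", "old", or "tie" (a hypothetical encoding).
    Ties are ignored, which is one common convention among several.
    """
    counts = Counter(judgments)
    decided = counts["new"] + counts["old"]
    if decided == 0:
        raise ValueError("no decisive comparisons to compute a win rate from")
    return counts["new"] / decided


# Four raters compared completions from the new and the previous model.
rate = win_rate(["new", "new", "old", "tie"])
```

A win rate above 0.5 against the previous model is the signal that a new checkpoint is an improvement on the behavior being rated.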

Starting point is 00:23:13 And as you develop new models, they should always win over the previous models. So it depends on what you want to measure. So interesting. Basically what I'm hearing, and something I'm learning about as I talk to people, is product development might move from this, here's a spec, PRD, let's build it together, and then cool, let's review it, are we happy with this? From that to, hey, AI, build this thing for me, and here's what correct looks like.

Starting point is 00:23:42 And I'm spending all my time on what does correct look like, on evals, essentially. You definitely want to measure progress for the model. And this is where evals come in, because you can have a prompted model as a baseline already. And the most robust evals are the ones where prompted baselines get the lowest score or something. Because then you know, if you've trained a good model, it should just hill-climb on that eval all the time while also not regressing on other intelligence evals. So I think, that's what I'm saying, it's more of an art than a science.

Starting point is 00:24:20 It's like, okay, if you optimize the model for this behavior, you kind of don't want to brain-damage it in other areas of intelligence, and this is happening all the time in every lab and every research team. I would say prompting is also a way to prototype new product ideas. In the early days at Anthropic, when I was working on the file uploads feature, I remember I was just, you know, prompting the model. I mean, when we were launching 100K context, I was just prototyping this locally, and when I did the demo, people really, really loved it and they just wanted an API for file uploads or something. And that's when it clicked for me. I also wrote a blog post

Starting point is 00:25:10 about prompting as a new way of product development or prototyping for designers and for product managers. For example, one of the features that I wanted to do is have personalized starter prompts. So whenever you come to Claude, it should recommend you starter prompts based on what your interests are. And so you can literally do it with prompting. And another feature was generating titles for the conversations. It's one of these very small micro-experiences, but one I'm really proud of. The way we did that was we took the five latest conversations from the user and asked the model, what's the style of the user?

Starting point is 00:26:00 And then, for the next new conversation, the generated title will be of the same style. It's really little micro-experiences like this. That's so cool. Did you do that at Anthropic or at OpenAI? At Anthropic. Okay, cool. I love the file upload feature that Claude has, by the way,

Starting point is 00:26:18 ChatGPT doesn't have that yet, is that right? I think it has. I think the way it's implemented is very different, though. Okay, maybe it's the PDF feature, because I use it all the time with Claude. Okay. That's cool. So OpenAI needs to get on that. Man, it's wild how many features you built that I use every day and that many people use every day.

Starting point is 00:26:35 This prototyping point you made is really important. It's something that comes up a ton on this podcast, how that is maybe the way that AI has most impacted the job of product builders recently: just prototyping. Instead of just showing, here's a PRD, here's a design, PMs more and more just say, here's the prototype of the idea that I have, it's working, you can play with it. Yeah. Yeah. Okay.

Starting point is 00:26:57 I want to spend a little more time on how you operate. So you talked about how you built and launched this Tasks feature. Is that the way you'd describe it, Tasks? Yeah. So talk about how that emerged. And let's better understand just how you collaborate with product teams and how OpenAI works in that way, whatever you can share there. I think Canvas and Tasks go into the bucket of projects

Starting point is 00:27:18 where it's more short or medium term. And actually the way Canvas and Tasks came about was, it started with one person prototyping and creating a spec. It's kind of like a PRD. It's creating a spec of the behavior of the model. I don't think Tasks is an extremely groundbreaking feature, necessarily. What makes it really cool

Starting point is 00:27:52 is that the models are so general. The model can now search, it can write sci-fi stories, it can search for stocks, it can summarize the news every day. Because the models are so general, it's about giving something familiar to people. You know, a notification is very familiar.

Starting point is 00:28:11 Having reminders is very familiar. So it's creating a form factor that's very, very familiar to people. Same as with Canvas, right? Google Docs is very familiar. But then you add a magical AI moment and it becomes very powerful. But the way it comes

Starting point is 00:28:27 usually, operationally, yeah, it starts as a prototype, literally a prompted prototype of how you would want the model to behave. For Tasks, for example, you kind of need to do a little bit of

Starting point is 00:28:41 system design thinking. Okay, well, if the user says, remind me to go to lunch at 8 a.m. tomorrow, what kind of information does the model need to extract from that prompt in order to create a reminder? And so this is how you design a spec for a new feature, like a tool. Canvas and Tasks are all tools.

Starting point is 00:29:07 So it's, how do you create the tool spec? And then it's mostly developing a JSON schema. It's like, okay, from this prompt, maybe the model should extract the time that the user requested. And then you're thinking about which format you want the time to be in. And then, how do you want the model to notify you? Basically the user should give an instruction to the model, and then this instruction would fire off every day or something at that particular time. So for example, if you say, every day I want to learn about the latest AI news, the model should rewrite it into, okay, search for the latest AI news.
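The tool-spec thinking described here (what to extract from the prompt, what format the time should take, what instruction fires later) might look something like the sketch below. Everything in it is an assumption for illustration: the `create_task` name, the field names, the 24-hour `HH:MM` time format, and the `once`/`daily` recurrence values are hypothetical and are not the schema the Tasks feature actually uses.

```python
import json
from datetime import time

# Hypothetical tool spec in JSON Schema style; all names are illustrative.
TASK_TOOL_SCHEMA = {
    "name": "create_task",
    "parameters": {
        "type": "object",
        "properties": {
            "instruction": {"type": "string", "description": "What to do when the task fires"},
            "time": {"type": "string", "description": "Local time in 24-hour HH:MM format"},
            "recurrence": {"type": "string", "enum": ["once", "daily"]},
        },
        "required": ["instruction", "time", "recurrence"],
    },
}


def parse_task_call(raw_json: str) -> dict:
    """Check a model-emitted tool call against the spec and normalize the time."""
    args = json.loads(raw_json)
    params = TASK_TOOL_SCHEMA["parameters"]
    missing = [key for key in params["required"] if key not in args]
    if missing:
        raise ValueError(f"missing required fields: {missing}")
    if args["recurrence"] not in params["properties"]["recurrence"]["enum"]:
        raise ValueError(f"unsupported recurrence: {args['recurrence']}")
    # time.fromisoformat raises ValueError on malformed strings such as "8am".
    fire_at = time.fromisoformat(args["time"])
    return {
        "instruction": args["instruction"],
        "fire_at": fire_at,
        "recurrence": args["recurrence"],
    }


# What the model might emit for "remind me to go to lunch at 8 a.m. tomorrow".
call = '{"instruction": "Remind the user to go to lunch", "time": "08:00", "recurrence": "once"}'
task = parse_task_call(call)
```

Fixing the time format in the spec is also what makes the deterministic evals possible: the extracted time can be compared exactly against the expected value.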

Starting point is 00:29:55 And this task will get fired at that particular time that the user requested. And then, you know, you design this tool spec. And then, actually, I don't know, I feel like sometimes it's through conversations. Either people ask me to join the team and they're like, oh my god, we need researchers, or we need some support, we need to train the models. Or sometimes, like with Canvas, I mostly just pitched the idea and it got staffed quite immediately during the break. So, I don't know, it depends on the project. And usually with staffing it's mostly a product manager, a model designer, an actual product designer, a couple of researchers, and applied engineers. It depends on the complexity of the project. And then, you know, for Tasks, it took two months or so to go from zero to one, basically.

Starting point is 00:30:56 Oh wow. For Canvas it was four or five months, I guess, to go from zero to one. But yeah, and then, you know, you teach product managers how to build evals, and maybe, you know, how do we not only ship a better feature, but how do we think more long term? What kind of cool features do you want Tasks to have? I think it would be nice for Tasks to be a little bit more personalized.

Starting point is 00:31:26 It would be nice to create tasks via voice on mobile, right? So this is how you get a research roadmap right here. It's thinking about how the feature will be developed in the future. And then from there, you start creating datasets, and with evals, you want to make sure that goes well. And then you need to have a trade-off between what methods you want to use. And the reason why I really love relying purely on synthetic data instead of collecting data from humans is because it's much more scalable. It's cheap. You literally sample from the model

Starting point is 00:32:10 and that will generalize to all sorts of diverse coverage. And when you launch the beta feature, you learn so much from the users that all your synthetic sets can be shifted toward the distribution of how the users behave in the product, and this is how you improve. And this is what happened with Canvas too, when we went from beta to GA.

Starting point is 00:32:35 This episode is brought to you by Loom. Loom lets you record your screen, your camera, and your voice to share video messages easily. Record a Loom and send it out with just a link to gather feedback, add context, or share an update. So now you can delete that novel-length email that you were writing. Instead, you can record your screen and share your message faster. Loom can help you have fewer meetings and make the meetings that you do have much more productive. Meetings start with everyone on the same page and end early. Problem solved, time saved. We know that everyone isn't a one-take wonder when it comes to recording videos, so Loom comes with easy editing and AI features to help you record once

Starting point is 00:33:16 and get back to the work that counts. Save time, align your team, stay connected, and get more done with Loom. Now part of Atlassian, the makers of Jira. Try Loom for free today at loom.com slash Lenny. That's L-O-O-M.com slash Lenny. Something that I want to help people understand, and I don't even 100% understand this, is what's the simplest way to understand the job of a researcher versus, say, a model designer and other folks involved? Like, what's the simplest way to understand what researchers do at OpenAI? So the projects that I described are mostly product-oriented; the research is mostly product research. Another component of my team is actually more like longer-term exploratory

Starting point is 00:33:58 And it's more about developing new methods and understanding those methods under a variety of circumstances. So to develop new methods, you kind of need to follow a very similar recipe of building evals. But it's more sophisticated evals. You kind of want to have out-of-distribution evals, or if you want to measure generalization, you kind of need to capture that. But it's basically more sciencey in a way. You know, if we talk about synthetic data, one of the hardest things about synthetic data is how do you make it more diverse. Diversity in synthetic data is one of the most important questions

Starting point is 00:34:40 right now, and exploring ways to inject diversity as a general method that will work for all is one of the research explorations. Other ones are more like developing new capabilities. I feel like it's all just about, you know, you work on this new method and you have signs of life that it's working. Either you think of how do you make it more general, or you think of how do you make it very useful, and this is how longer-term projects become more like medium and short-term projects. That makes sense. Essentially working on developing ways to make the model smarter, o4 or o5, o6.

Starting point is 00:35:21 Any ways to, like, o1 was a big breakthrough, right? The way it operates, where it's not just here's your answer, it actually takes time to think through the process of coming up with an answer. Okay, yeah, very helpful. Speaking of that, of thinking about the future and where things are going, I want to spend some time on just this insight that basically you are building the cutting edge of AI, like at the very bleeding edge of where AI is going and where it is. And so I'm very curious to hear just your take on how you think things are going to change in the world and how people work based on where you see things are going. And I know it's a broad question, but let's say in the next three years, how do you see the world changing? How do you see people's way of working changing? It's a very humbling experience to be in both labs, I guess. To me, when I first came to Anthropic, I was like, oh, no, I really love front-end engineering.

Starting point is 00:36:16 And then the reason why I switched to research is because I realized at that time, oh, my God, Claude is getting better at front end. Claude is getting better at coding. I think Claude can develop new apps or something. And so it can develop new features for the thing that I'm working on. So it was kind of like this meta-realization where it's like, oh, my God, the world is actually changing. And then, when we first launched 100K context at that time, obviously, you know,

Starting point is 00:36:47 I'm thinking about form factors. It's like, yeah, file uploads were very natural, very familiar to people, but you can imagine we could have just made, like, infinite chats in the Claude.ai app, right? As it's, like, a 100K context. But file uploads, it's like form follows function: the form factor of file uploads kind of enabled people to just literally upload anything, books or any reports, financials, and ask any task of the model. And then I remember enterprise customers, like financial customers, were really interested in that. And it's like, oh, wow, it's actually one of the very common tasks that people do in that setting.

Starting point is 00:37:39 It was kind of crazy to see how some of the redundant tasks are getting automated, basically, by these smart models. And we're entering the era where I actually don't know, for example, sometimes if o1 gives me the correct answer or not, because I'm not an expert in that field. And I don't even know how to verify the outputs of the models, because only experts can verify this. So, yes, basically there are trends that are going on. The first trend is the cost of reasoning and intelligence is drastically going down. I had a blog post about this.

Starting point is 00:38:24 Maybe I should update it on the latest benchmarks, because at that time, like, MMLU, everybody was doing one benchmark, and they quickly saturated the benchmarks. And now we need to do the same plot, but with another frontier eval.

Starting point is 00:38:40 But the cost of intelligence is going down because it becomes much cheaper. Small models are becoming smarter than large models, and that's because of the distillation research. This happened with Claude 3 Haiku. I was, like, working on post-training

Starting point is 00:39:00 Claude 3 Haiku, and I realized it was much smarter than Claude 2, which was way bigger, and things like that. But the power of small models becoming very intelligent and fast and cheap, we are moving towards that road.

Starting point is 00:39:17 That has multiple implications, but it means that people will have more access to AI, and that's really good. Builders and developers will have much better access to AI, but it also means all the work that has been bottlenecked by intelligence will be kind of unblocked. So I'm thinking about health care, right? Instead of going to a doctor, I can give ChatGPT a list of symptoms and ask, oh, do I have a cold, flu, or something else? I can literally get access to a doctor, almost. And there's, like,

Starting point is 00:40:03 been some research studies around that. Yeah, there's a New York Times story about that where they compared doctors to doctors using ChatGPT to just ChatGPT, and just ChatGPT was the best of them all. Like, doctors made it worse. Yeah. Yeah, that's crazy, right? Like, education, I think,

Starting point is 00:40:24 I would have dreamt of having a tool like ChatGPT when I was young; I would have learned so much. But people can now learn almost anything from these models. They can learn a new language. They can learn how to build new apps.

Starting point is 00:40:40 Like, I don't know, anything that you want. And it's humbling to launch Canvas and bring that thing to people, enabling them to do something that they never could have before. And I think there's something magical around this experience. So education will have massive implications. And, I guess, scientific research, right? I think the theme of any AI research is, like, to automate AI research. It's kind of scary, I'd say, which makes me think that people management will stay, you know?

Starting point is 00:41:14 It's one of the hardest things. Emotional intelligence for the model, and creativity in itself, is one of the hardest things. So writers, I don't think people should be worried as much. I think it will alleviate a lot of redundant tasks for people. This is awesome. Okay, I want to follow this thread for sure. And it's funny that what you described is, you were an engineer at Anthropic and you're like, okay, Claude is going to be very good at engineering. This isn't going to be potentially a career long term, so I'm going to move into research.

Starting point is 00:41:48 And AI is going to need me for a long time to build it, to make it smarter. I would say the Canvas team still has really cool front-end engineers, you know, people who really care about interaction design. I don't think models are there yet.

Starting point is 00:42:08 But we can get the models to, like, the top one percent of front-end something, for sure. So what I want to move on to next along these lines is just, and this is just speculation, but what skills do you think will be most valuable going forward for product teams in particular? So folks are listening and they're like, okay, this is scary. What should I be building now to help me stay ahead and not be in trouble down the road? What skills do you think are going to be more and more important to build? Yeah, I think creative thinking. You kind of want to come up,

Starting point is 00:42:48 like, generate a bunch of ideas and filter through them, and not just build the best product experience. Listening. You know, you want to build something that the most general model will not replace. And oftentimes you build something and you make it really, really good for a specific set of users, and actually the moat is now in your user feedback. The moat is more in whether you listen to them, whether you can rapidly iterate. The moat is in here. I don't think we are there yet. There are so many ideas. I think there's an abundance of ideas that you can work on. I wouldn't be worried.

Starting point is 00:43:35 In fact, I do think people in AI fields, I wish they were a little more creative at connecting dots across different fields or something like that, to develop really cool new generations and new paradigms of interaction with this AI. I don't think we've cracked this problem at all. A couple of years ago I was telling some people, you know, you kind of want to build for the future. So it doesn't necessarily matter whether the model is good or not good right now, but you can build product ideas such that by the time the models are really good, it will work really well. And I think it just happened naturally. Like, for example, at Anthropic, with Claude Artifacts.

Starting point is 00:44:27 And I feel like the early days of Canvas were back in, like, 2022, like before ChatGPT, like writing idea was like a knowledge chashpee. But I feel like the Claude 1.3 model itself was not there yet to make really good, high-quality edits, for example, for coding. And I see startups like Cursor, and it's doing super well, because they iterate so fast, they invent new ways of training models, they move really fast, they listen to users, they have massive distribution. It's like, yeah, it's kind of cool.

Starting point is 00:45:07 That's really helpful, actually. So what I'm hearing is that soft skills essentially are going to be more and more important and powerful. You just talked about management, leading people, being creative and coming up with innovative insights, listening. There's a post I wrote that I'll link to where I try to analyze how AI will impact product management, and we're actually very aligned.

Starting point is 00:45:29 And my sense was the same thing, that soft skills are going to become more and more important. And the things that are going to be replaced is the hard skills, which is interesting because usually people value the hard skills, like coding, design, writing really well. And it's interesting that AI is actually really good at that because it's taking a bunch of data, synthesizing it, and writing, creating a thing versus all these fuzzy things around

Starting point is 00:45:52 what influences and convinces people to do things, and aligning and listening, and, like you said, creativity. Anything along those lines come up as I say that? I think it's actually really, really hard to teach the model how to be aesthetic or do really good visual design, or how to be extremely creative in the way they write. I still think ChatGPT kind of sucks at writing.

Starting point is 00:46:17 And that's because it's bottlenecked by this creative reasoning. I think prioritization is one of the most important. For management, I actually feel like AI research progress is bottlenecked by management, research management, because you have a constrained set of compute and you need to allocate the compute to the research paths that you feel the most convinced about. You need to have really high conviction in the research paths to put the compute into, and it's more like a return-on-investment kind of situation.

Starting point is 00:46:56 And it's like, okay, yeah, I'm thinking a lot about, across all my projects, which projects are higher priority, and also, on the lower level, which experiments are really important to run right now and which are not, and cutting through the line. So I think prioritization, communication, management, people skills, empathy, understanding people, collaboration. I think Canvas wouldn't have been an amazing launch if it wasn't about people. And I think it's a wonderful group of people. I got a chance to work with people like Lee Byron, who's a co-creator of GraphQL, and some of the best Apple designers. And it's so cool to see. And how do you create this collaboration between people? It's just something that's still human, I think.

Starting point is 00:47:50 Let me just follow this thread a little bit, because I imagine people listening are like, okay, but once we have AGI or ASI, it'll do all this. It's like, there's a world where, why isn't all this done? I think it's easy to just assume all that. I'm curious, this idea of creativity and listening, why you think AI isn't good at it, other than it's just very hard to train it to do this well. Is there anything there, just like why this is especially difficult for AI and LLMs to get good at? I think currently it's difficult for many reasons.

Starting point is 00:48:26 I think it's still an active research area. And it's something that my team is working on: okay, how do we teach the model to be more creative in its writing? And actually, I'm thinking this new paradigm where the models think more should actually lead to better writing in itself. But when it comes down to idea generation

Starting point is 00:48:53 what is good visual design and what's not, I feel like it hasn't learned from examples from people to discriminate it very well.

Starting point is 00:49:03 I do think it's because, like, you know, there are not that many people who are, like, actually, like, really, like, it's not, like, accessible to, like,

Starting point is 00:49:13 models to learn from these people, I guess. So I definitely think that's why it sucks. Yeah, that makes sense. Basically, there's not enough of you yet: researchers teaching it to do these things, slash people that have incredible taste and creativity that can teach these things. You could argue this will come, but we don't need to keep going down that thread. Let me ask you a specific question. In this post I wrote, I made this argument that a lot of people disagreed with, that strategy is something that AI tooling will become increasingly great at and take over. There's the sense that that's the thing that people will continue to be much better at, and you can't offload to AI basically developing your strategy, telling you what to do to win.

Starting point is 00:49:56 My case is, isn't strategy, just take all the inputs, all the data you have available, understand the world around you, and come up with a plan to win. Feels like AI would be, like, an L-LM would be incredibly smart at this. What's your take? I think so, too. I think, like, again, like, you teach the model all sorts of, like, tools and, like, capabilities and, like, reasoning, right? And it's like, when it comes down to like, as for Canvas right now,

Starting point is 00:50:20 it's been very cool for the models to just aggregate all the feedback from users, like, summarize for me the top five most painful flows or user experiences. And then the model itself is very capable of thinking, of knowing how it's being made, of figuring out how to create datasets for itself to train on. And I don't think we are far away from that kind of self-improvement, models becoming self-improving. By then, the product development is basically kind of self-improving. It's kind of like its own organism or something.

Starting point is 00:51:02 Yeah, again, strategy, it's more like data analysis and coming up with, like, I think what models are really good at is connecting the dots. It's like, okay, if you have user feedback from this source, but you also have an internal dashboard with metrics, and then you have, you know, other kinds of feedback or inputs, then it can co-create a plan for you, recommendations even. And I think this is one of the most common use cases for ChatGPT too, coming up with this sort of thing.

Starting point is 00:51:46 That makes sense. like essentially a human can only comprehend so much information at once and look at so much data at once to synthesize takeaways. And as you said, these context windows are huge now. Here's all the information. What's the most important thing I should do? Yeah, same as like scientific research. It's because like you, like ideally the model would be able to like suggest like ideas, like iterate on the experiment or like given the empirical results of the previous experiments, like, how do you like come up with like new ideas or like the methods? Yeah.

Starting point is 00:52:19 Oh man. Okay. So just to close the loop on this part of the thread, the skills you're suggesting people focus on building and leaning into are soft skills like creativity, managing, influence, collaboration, looking for patterns. Is that generally where your mind is at? Yeah. I'm thinking a lot about how do you make organizations more effective.

Starting point is 00:52:43 And I think this is mostly management, I guess. It's like, how do you organize research teams, or generally teams, how do you compose teams such that they will succeed maximally, or be at the maximum performance of what's possible? You can literally create the next generation of computers. It's just a matter of conviction and the way you manage through that. It's like scaling organizations or scaling product research. Yeah, I think, like, you're basically building this thing, and not doing it efficiently is limiting the potential of the human species right now. Right. Mismanagement within the research teams at OpenAI and Anthropic and some of these other labs. Yeah, it's kind of crazy to think about it.

Starting point is 00:53:33 Holy moly. Okay, so speaking of Anthropic and OpenAI, you've worked at both. Very few people have worked at both companies and have seen how they operate. I'm curious just what you've noticed about the differences between these two, how they operate, how they think, how they approach stuff. What can you share along those lines? It's more similar than different. Obviously, there are some differences when it comes to nuances and culture. I really love Anthropic, and I have a lot of friends there, and I also love OpenAI, and I have a lot of friends there.

Starting point is 00:54:05 So it's not about, like, enemies. I feel like, in AI, it's all like, yeah, they're competitors, they're enemies. This is actually one big community of people doing the same thing. I would say what I learned from Anthropic is this real care and craft towards model behavior, model craft, model training.

Starting point is 00:54:32 And I've been thinking a lot about, okay, what makes Claude Claude and what makes ChatGPT ChatGPT. And it actually comes down to the operational processes that lead to the outputted model. And the reason why Claude has so much more personality and is more like a librarian, I don't know, I'm visualizing Claude being like a librarian, somewhat very nerdy or something,

Starting point is 00:55:04 is because I feel like it's a reflection of the creators who are making this model. A lot of details around the character and the personality, and whether the model should follow up on this question or not, or what's the correct ethical behavior for the model in this scenario, is a lot of craft and, like, curated datasets, and this is where I learned that part of the art, I guess, at Anthropic. I'd say that Anthropic is much smaller. Like, when I joined, it was, what, like 70 people? When I left, it was like 70 people. And obviously the culture changed so much. I really enjoyed being there in the early days, the startup vibes, and people knew each other

Starting point is 00:55:50 as a family, but the culture shifted. I would say I learned from Anthropic that they're much better at focusing and prioritization, like very, very hardcore prioritization, I guess, and they need to do it. But I think OpenAI is much more innovative, with much more risk taking in terms of product or research. Actually, you know, your full-time job can be just teaching the model how to be a creative writer, and there's some luxury in this research freedom that comes with scale, maybe, I don't know. But I feel like I have much more creative, like, product

Starting point is 00:56:36 freedom to do almost anything, I guess, within OpenAI, like evolve ChatGPT into the vision that you want. It's more, yeah, probably bottoms up, I guess. Yeah, that's how I was thinking about it. It feels like OpenAI is more bottoms up, distributed: people bubble up ideas, try stuff. And that leads to more products launching. I imagine more things just kind of being tried, versus more of a, let's just make sure everything we do is awesome and crafted, and thinking deeply

Starting point is 00:57:06 about every investment. That's really interesting. I've never heard it described this way. Karina, we've covered so much ground. This is going to help a lot of people with so many ways of thinking about where the future is going. Before we get to our very exciting lightning round, I'm curious if there's anything else that you think might be helpful

Starting point is 00:57:21 to share or get into. One of my regrets, I guess, from when I was in the early days at Anthropic. I think there was some luxury at that time, pre-ChatGPT, to actually come up with a bunch of ideas and prototype almost every day. And I think we did a lot of cool ideas. Claude in Slack was actually one of the first tool-use products. Claude could operate in your workplace. It's kind of cool when you, like, @Claude, summarize the thread. So maybe you have an entire conversation with someone, and then

Starting point is 00:58:02 you want a summary of what happened. You can say, like, @Claude, summarize this. Also, it was really fun to even iterate on the model itself. When you just talk to the model in Slack forever, it created some social element. It's kind of cool. It's kind of like Midjourney and its Discord. People learned so much about prompting and how to work with Claude. I feel like one of the features was like an early Tasks prototype, you know,

Starting point is 00:58:32 every Monday Claude would just summarize the entire channel. Or every Friday it would just summarize a bunch of channels and give, like, the news about the organization or something. So it's kind of a really cool form factor. I think form factor is a really important question in AI, especially since we haven't even figured out how to create an awesome product experience with the o-series models.

Starting point is 00:59:04 It's like the shift from a synchronous, real-time, give-an-answer paradigm into a more asynchronous paradigm of agents working in the background. But then now the question is, the agents should build trust with you, right? And trust builds over time, just like with humans. And, you know, you start

Starting point is 00:59:25 this collaboration, which is why this collaboration model between you and a model is so important, because you build trust and the model learns from your preferences so that it can become more personalized. And it will start predicting the next action that you want to take on the computer or something. And it's kind of more predictive, much more. We went from, like, personal computers to personal models, basically. Why is it not a thing? That seems like such an obvious feature, that every LLM should have a Slackbot version of itself.

Starting point is 00:59:59 Is that a thing I can have you install, or is that not a thing right now? I know that Claude in Slack was sunsetted in, like, 2023 or something. But I think after ChatGPT, it was mostly the focus on consumer use cases or enterprise use cases. I think the form factor of Claude in Slack was kind of constrained a little bit when you wanted to develop new features.

Starting point is 01:00:28 No, I want that. I know that ChatGPT had, like, Slackbot tools. I don't know, maybe it will come back. All right, I would pay for that. Any other memories from that time of early days? Because that's a really special place to have been, early days Anthropic. Any other memories or stories from that time

Starting point is 01:00:45 that might be interesting to share? I think the very first launch when I felt like something clicked, again, was the 100K context launch. It's when the models could take in an entire book and give you a summary of the book or something, or take in multiple files of financial reports and then give you an answer to a very specific question. I think there was something in there that was kind of like, oh my God, this is a really cool new capability, not a model capability, but more like the capabilities that came from the product

Starting point is 01:01:27 form factor itself, rather than the model capability as much. I think other prototypes that we were thinking about, like, yeah, there's one called, like, Claude workspaces, and it's kind of the same idea: Claude and I would have a shared workspace, and that shared workspace is like a document, and you can iterate in the document. And I feel like sometimes the ideas lag, like for two years. Just like in this case. It's interesting there are these milestones that kind of open up our view of what is happening and where things are going. ChatGPT I think was the first of just like, wow, this is much better than I would have thought.

Starting point is 01:02:12 You talked about 100K context windows, where you could upload a book and ask it questions and have it summarized. I actually use that all the time when I have interview guests and they wrote a book. I sometimes don't have time to read the whole book. So I use it to help me understand what the most interesting parts are. And then I actually dive into the book, just to be clear. And then, I don't know, maybe voice was another one, where you could talk to, say, ChatGPT. Are there any other moments there where you're like, wow, this is much better than I thought it was going to be? Yeah, I think the computer use agents, like the model operating the desktop.

Starting point is 01:02:48 And you can essentially think of, you know, a new kind of experience where the model can learn the way you browse. And from that preference, it can browse just like you. And it's kind of like a simulated persona. And it's actually very similar to the idea of, okay, maybe Sam Altman doesn't have a lot of time. I want to talk to, like, his simulation and ask. Or, for example, I really appreciate some of the technical management of, like, Jacob, but he doesn't have a lot of

Starting point is 01:03:28 time. So it's like, I really want to ask him this question, like, how would you respond? Simulated environments like this would be really cool. That's a great place to plug Lennybot. I have one of those. It's trained on all of my podcasts and newsletters, and it sits on many models. I don't know which one exactly they use, but it's exactly that. And it's not even me. It's all the guests that have been on the podcast, and the newsletters I wrote. And you could just ask it, how do I grow my product? How do I develop a strategy?

Starting point is 01:03:56 And it's actually shockingly good. Do you feel like it reflects who you are? The best part of it is you can talk to it. It's built. There's an ElevenLabs voice version that's trained on my voice from this podcast. And it's actually very good. And people like have told me they sit there for hours talking to it. And somebody told it, interview me like I am on Lenny's podcast, ask me questions about my career, and he did a half-hour podcast episode with Lennybot.

Starting point is 01:04:25 That's so fun. It's incredible. Future is wild. Yeah. I think like content transformation is like, you know, like I would imagine sometimes like, you know, when you generate a sci-fi story in Canvas, like you can like transform this into like an audiobook, like where I have like, where, you know, natural like content transformation, like one media to another media. I think like one of my earliest inspirations is like one of the last episodes of like Westworld where I want to flow, but where Dolores comes to her work at that time and she comes to like this like new workspace and

Starting point is 01:05:08 she starts like writing a story, and then as she writes a story, like a 3D, like virtual reality starts like creating on the fly. So I kind of want to create that. Kind of cool. Wow. Speaking of medium, I guess I was wondering if I should go in this direction or not, but

Starting point is 01:05:29 real quick, Kevin Weil slash Kevin Wheel, I don't know exactly how to pronounce his last name, the CPO of OpenAI. Is it Weil or Wheel? I think Wheel. Wheel. Okay, okay. Let's just say that. Wheel. He did a panel at the Lenny and Friends Summit last year, and he made this really

Starting point is 01:05:47 fascinating point that chat is a really interesting interface for these tools because they're just getting smarter and smarter, and chat continues to work as a paradigm to just interact with them. Similar to a human. You could talk to Albert Einstein. You could talk to someone not very smart, and it's all conversation still. And so it's a really flexible way to interact with increasingly good intelligence. At some point it'll not be so great

Starting point is 01:06:13 and you're talking about all these ways that you're adding additional ways to interact, but it's interesting, chat proved to be a really powerful layer on top of all the stuff. Yeah, that's really cool. I feel like chat also has the social element, which is like very human. It's like, you know, you sometimes want to

Starting point is 01:06:29 like get into group chat and like, yeah, having conversations. There is kind of like a group chat in itself as a messaging. Actually, this idea of like, how do you build like features like this? Like I see tasks as like this like general kind of like feature that will scale very nicely as the models would develop like new capabilities of ourselves. It's like, like the model will be able to

Starting point is 01:06:53 like do better like searches and like, you know, create new, like come up with like more creative like writing or like render, you know, React apps and like HTML like apps, and like you can have like every day a new puzzle for you, like every day like continue the story from the future days. It scales very nicely. You mentioned something as we were getting into this extra section that we ended up going down, this idea of the agents using a computer. I know this is actually something you're going to launch today, the day we're recording this, which will be out by the time this comes out. Called Operator. Can you talk about this very cool feature that people will have access to?

Starting point is 01:07:33 Yeah, so I unfortunately did not work on it, but I'm really, really excited about this launch. It's basically an agent that can complete the task in its own virtual computer, like in its own virtual environment. You can do literally any task, like order me a book on Amazon, and then ideally the model will either like follow up with you, like, which book do you want, or like know you so well that they will like start recommending, like, oh, here's the five books that I might recommend you to buy, and then like you hit like, yeah, help me, help me buy.

Starting point is 01:08:13 And then the model goes off into its own virtual little browser and like completes the task and buys the book on Amazon. And then if you give the model like credentials, credit card, obviously it comes with like a lot of trust and like safety. Then it will just complete the thing for you. It's a virtual assistant. It's interesting how this just sounds like, obviously, this should happen. Like, why is this not already a thing, which is also mind-blowing that we're just assuming this should exist, like, just some AI doing things for you on a computer that you just ask it to do. Like, it's absurd. It's actually really hard. And I think, like, you're still cracking

Starting point is 01:08:58 this way. I feel like, I don't know if you use like Tuple. It's like a pair programming product. No. But I don't know if you love pair programming. So if you use... Oh, yeah. Shopify uses this. I remember it came up on a podcast episode. Oh, nice. Yeah, so it's a very cool product where you can just call anyone at any time and then like share screen and the other person can like have access to this screen

Starting point is 01:09:20 or like start like literally operating your computer. And it's very like real time, like the latency is like very, it's like very high quality, and it's just like I kind of want the same. It's like I want to like pair program with like my model

Starting point is 01:09:38 and the model should even talk to me, like draw like very specific like section in my code in VS Code and tell me, like, or teach me, and you can have like different modes. It's like right here, it's like a product right here for you. I don't know, some people should build out

Starting point is 01:09:57 Sounds like a startup just got birthed from someone listening to this. You mentioned that it's very hard to do this, an agent controlling a computer as you and helping out. What makes it so hard, for however much you can explain briefly? Much of it is like, because right now the models are operating on like pixels instead of like language or whatnot, like pixels is actually really, really hard for the models because like perception or visual perception. I think there's still like a lot of like multimodal like research that's going on.

Starting point is 01:10:31 But I think like language scaled so much like easier compared to like multimodal because of that. Another thing that, I guess my team is, like, how do you derive human intent very correctly? It's like, sometimes, like, does the model know enough information to ask a follow-up question or, like, to complete the task? You kind of don't want, like, an agent to, like, go off for, like, 10 minutes and then compile, like, an answer that you didn't even want. That actually creates, like, a much worse user experience. And this comes as like teaching the model like people skills. It's like, you know, like, what do people like?

Starting point is 01:11:15 Like kind of like creating like the mental model of the user and like caring about the user in order to ask certain questions. Like actually that part is like hard for the models. That relates to what we talked about earlier. It's kind of the soft skill, people skills pieces. Not where these models are strong yet. Okay. I'm going to skip the lightning round. I want to ask just one question from the lightning round.

Starting point is 01:11:40 Something fun. Yes. Okay, so when AI replaces your job, Karina, I'm curious what you're getting. And it gives you a stipend, gives you a monthly stipend. Here's your salary for the month. What would you want to do? What do you want to spend your time? What will you be doing in this future world?

Starting point is 01:11:56 I've been thinking about this. I feel like I have a lot of job options. I would love to be a writer, I think. I think that would be super cool. You should write short stories, like, sci-fi stories, novels. I really like art history. So, you know, it's like conservationists in the museums

Starting point is 01:12:22 who just, like, try to preserve, like, art paintings, but just, like, painting through a lot of things. I think that would be really cool to do. Yeah. That sounds beautiful. I don't know. What I'm hearing is you need to nerf these models to not get very good at writing so that you can continue. Although at that point, you don't need to do it from, like, you don't need people to buy it.

Starting point is 01:12:47 You're just doing it for fun. So it doesn't even matter if they're incredibly good at writing or art conservation. Oh man. What an episode or a conversation. What a wild time we're living in. Karina, thank you so much for being here. Two final questions. Where can folks find you online if they want to reach out and follow up on

Starting point is 01:13:03 anything, and how can listeners be useful to you? You can find me. I'm on Twitter. It's premium. You can also reach me by email on my website. And my team is hiring. And so I'm looking for research engineers, research scientists, as well as like machine learning engineers. Like people who come from like product engineering

Starting point is 01:13:26 who want to like learn like model training. I'm actually hiring for like my team. My team is called like Frontier Product Research, and we train models, we develop new methods, but for product-oriented outcomes. What a place to work. Holy moly. What's the best way for people to apply for these very lucrative roles? I think you can shoot me a DM on Twitter, or, I'm yet to create a job description. Okay, this is the job description.

Starting point is 01:13:55 Or you can apply to like the post-training team. Okay, you're going to get a flood of DMs. I hope you're prepared. Karina, thank you so much for being here. This was incredible. Thank you so much, Lenny. Bye, everyone. It was fun. Thank you so much for listening.

Starting point is 01:14:10 If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app. Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast. You can find all past episodes or learn more about the show at lennyspodcast.com. See you in the next episode.
