Lenny's Podcast: Product | Career | Growth - OpenAI researcher on why soft skills are the future of work | Karina Nguyen (Research at OpenAI, ex-Anthropic)
Episode Date: February 9, 2025

Karina Nguyen leads research at OpenAI, where she’s been pivotal in developing groundbreaking products like Canvas, Tasks, and the o1 language model. Before OpenAI, Karina was at Anthropic, where she led post-training and evaluation work for Claude 3 models, created a document upload feature with 100,000 context windows, and contributed to numerous other innovations. With experience as an engineer at the New York Times and as a designer at Dropbox and Square, Karina has a rare firsthand perspective on the cutting edge of AI and large language models. In our conversation, we discuss:
• How OpenAI builds product
• What people misunderstand about AI model training
• Differences between how OpenAI and Anthropic operate
• The role of synthetic data in model development
• How to build trust between users and AI models
• Why she moved from engineering to research
• Much more

Brought to you by:
• Enterpret: Transform customer feedback into product growth
• Vanta: Automate compliance. Simplify security
• Loom: The easiest screen recorder you’ll ever use

Find the transcript at: https://www.lennysnewsletter.com/p/why-soft-skills-are-the-future-of-work-karina-nguyen

Where to find Karina Nguyen:
• X: https://x.com/karinanguyen_
• LinkedIn: https://www.linkedin.com/in/karinanguyen28
• Website: https://karinanguyen.com/

Where to find Lenny:
• Newsletter: https://www.lennysnewsletter.com
• X: https://twitter.com/lennysan
• LinkedIn: https://www.linkedin.com/in/lennyrachitsky/

In this episode, we cover:
(00:00) Introduction to Karina Nguyen
(04:42) Challenges in model training
(08:21) Synthetic data and its importance
(12:38) Creating Canvas
(18:33) Day-to-day operations at OpenAI
(20:28) Writing evaluations
(23:22) Prototyping and product development
(26:57) Building Canvas and Tasks
(33:34) Understanding the job of a researcher
(35:36) The future of AI and its impact on work and education
(42:15) Soft skills in the age of AI
(47:50) AI’s role in creativity and strategy development
(53:34) Comparing Anthropic and OpenAI
(57:11) Innovations and future visions
(01:07:13) The potential of AI agents
(01:11:36) Final thoughts and career advice

Referenced:
• What’s in your stack: The state of tech tools in 2025: https://www.lennysnewsletter.com/p/whats-in-your-stack-the-state-of
• Anthropic: https://www.anthropic.com/
• OpenAI: https://openai.com/
• What is synthetic data—and how can it help you competitively?: https://mitsloan.mit.edu/ideas-made-to-matter/what-synthetic-data-and-how-can-it-help-you-competitively
• GPQA: https://datatunnel.io/glossary/gpqa/
• Canvas: https://openai.com/index/introducing-canvas/
• Barret Zoph on LinkedIn: https://www.linkedin.com/in/barret-zoph-65990543/
• Mira Murati on LinkedIn: https://www.linkedin.com/in/mira-murati-4b39a066/
• JSON Schema: https://json-schema.org/
• Anthropic, 100K Context Windows: https://www.anthropic.com/news/100k-context-windows
• Claude 3 Haiku: https://www.anthropic.com/news/claude-3-haiku
• A.I. Chatbots Defeated Doctors at Diagnosing Illness: https://www.nytimes.com/2024/11/17/health/chatgpt-ai-doctors-diagnosis.html
• Cursor: https://www.cursor.com/
• How AI will impact product management: https://www.lennysnewsletter.com/p/how-ai-will-impact-product-management
• Lee Byron on LinkedIn: https://www.linkedin.com/in/lee-byron/
• GraphQL: https://graphql.org/
• Claude in Slack: https://www.anthropic.com/claude-in-slack
• Sam Altman on X: https://x.com/sama
• Jakub Pachocki on LinkedIn: https://www.linkedin.com/in/jakub-pachocki/
• Lennybot: https://www.lennybot.com/
• ElevenLabs: https://elevenlabs.io/
• Westworld on Prime Video: https://www.amazon.com/Westworld-Season-1/dp/B01N05UD06
• A conversation with OpenAI’s CPO Kevin Weil, Anthropic’s CPO Mike Krieger, and Sarah Guo: https://www.youtube.com/watch?v=IxkvVZua28k
• Tuple: https://tuple.app/
• How Shopify builds a high-intensity culture | Farhan Thawar (VP and Head of Eng): https://www.lennysnewsletter.com/p/how-shopify-builds-a-high-intensity-culture-farhan-thawar

Production and marketing by https://penname.co/. For inquiries about sponsoring the podcast, email podcast@lennyrachitsky.com.

Lenny may be an investor in the companies discussed.

This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.lennysnewsletter.com/subscribe
Transcript
Not only are you working at the cutting edge of AI and LLMs, you're actually building the cutting edge.
When I first came to Anthropic, I was like, oh, God, I really love front-end engineering.
And then the reason why I switched to research is because I realized, oh, my God, Claude is getting better at front end.
Claude is getting better at, like, coding.
I think Claude can, like, develop new apps.
What skills do you think will be most valuable going forward for product teams in particular?
Creative thinking. You kind of want to, like, generate a bunch of ideas
and filter through them and just build the best product experience.
I think it's actually really, really hard to teach the model how to be aesthetic or really good at visual
design, or how to be extremely creative in the way they write.
What do you think people most misunderstand about how models are created?
When you taught the model some of the self-knowledge of, you actually don't have a physical body
to operate in the physical world, the model would get, like, extremely confused.
Today my guest is Karina Nguyen. Karina is an AI researcher at OpenAI,
where she helped build Canvas, Tasks, the o1 chain-of-thought model, and more.
Prior to OpenAI, she was at Anthropic, where she led work on post-training and evaluation
for the Claude 3 models, built a document upload feature with 100K context windows, and so much more.
She was also an engineer at the New York Times and a designer at Dropbox and at Square.
It's very rare to get a glimpse into how someone working on the bleeding edge of AI and LLMs operates
and how they think about where things are heading.
In our conversation, we talk about how teams at OpenAI operate and build product,
what skills she thinks you should be building as AI gets smarter, how models are created,
why synthetic data will allow models to keep getting smarter,
and why she moved from engineering to research after realizing how good LLMs are going to be at coding.
If you enjoy this podcast, don't forget to subscribe and follow it in your favorite podcasting app or YouTube.
It's the best way to avoid missing future episodes, and it helps the podcast tremendously.
With that, I bring you Karina Nguyen.
This episode is brought to you by Enterpret.
Enterpret unifies all your customer interactions,
from Gong calls to Zendesk tickets, to Twitter threads, to App Store reviews,
and makes it available for analysis.
It's trusted by leading product orgs like Canva, Notion, Loom, Linear, Monday.com, and Strava,
to bring the voice of the customer into the product development process,
helping you build best-in-class products faster.
What makes Enterpret special is its ability to build and update customer-specific AI models
that provide the most granular and accurate insights into your business,
connect customer insights to revenue and operational data in your CRM or data warehouse
to map the business impact of each customer need and prioritize confidently,
and empower your entire team to easily take action on use cases like win-loss analysis,
critical bug detection, and identifying drivers of churn with Enterpret's AI assistant, Wisdom.
Looking to automate your feedback loops and prioritize your roadmap with confidence,
like Notion, Canva, and Linear? Visit Enterpret. That's E-N-T-E-R-P-R-E-T.
This episode is brought to you by Vanta. [...]
[...] to be here, big fan of the podcast and the newsletter.
Vanta is a longtime sponsor of the show, but for some of our
newer listeners, what does Vanta do and who is it for?
Sure. So we started Vanta in 2018 focused on founders, helping them start to build out their
security programs and get credit for all of that hard security work with compliance certifications
like SOC 2 or ISO 27001. Today, we help over 9,000 companies, including some
startup household names like Atlassian, Ramp, and LangChain, start and scale their security programs
and ultimately build trust by automating compliance,
centralizing GRC, and accelerating security reviews.
That is awesome.
I know from experience that these things take a lot of time and a lot of resources
and nobody wants to spend time doing this.
That is very much our experience, both before the company and, to some extent, during it.
But the idea is with automation, with AI, with software,
we are helping customers build trust with prospects and customers in an efficient way.
And, you know, our joke, we started this compliance company so you don't have to.
We appreciate you for doing that.
And you have a special discount for listeners.
They can get $1,000 off Vanta at vanta.com slash Lenny.
That's V-A-N-T-A dot com slash Lenny for $1,000 off Vanta.
Thanks for that, Christina.
Thank you.
Karina, thank you so much for being here.
Welcome to the podcast.
Thank you so much, Lenny, for inviting me.
I'm very excited to have you here because not only are you working at the cutting edge of AI and LLMs,
you're actually building the cutting edge of AI and LLMs.
You recently launched this feature,
which is basically the first agent feature of OpenAI.
I also just did this survey.
I don't know if you know about this.
I did a survey of my readers
and asked them, what tools do you use most in your work every day?
And ChatGPT was number one, above Gmail,
above Slack, above anything else.
90% of people said that they use ChatGPT regularly.
It's absurd.
And it wasn't around two years ago.
Yeah.
Also, we're recording this the week that OpenAI announced Stargate, which is this half-a-trillion-dollar investment in AI infrastructure.
So there's just like a lot happening constantly in AI.
And you have a really unique glimpse into how things are working, where things are going, how work gets done.
So I have a lot of questions for you.
I want to talk about how you operate and how you work at OpenAI, where you think things are going, what skills are going to matter more and less in the future.
And also just where things are going broadly.
So how does that sound?
Sounds great.
Thank you so much. Yeah, I was extremely lucky to join Anthropic in the early days and learned a lot of
things there, and I joined OpenAI around eight months ago. So yeah, I'm excited to
dive more into it.
Okay, I'm definitely going to ask you about the differences between those,
but I want to start more technical and just dive in. I want to talk about model training.
People always hear about models being trained, these big models, how much data it takes, how long it takes,
how much money it takes, how we're running out of data, which I want to talk about.
Let me just ask you this question. What do you think people most misunderstand about how models are created?
Model training is more an art than a science. In a lot of ways, we as model
trainers think a lot about data quality. One of the most important things in
model training is, how do you ensure the highest-quality data for a certain interaction or
model behavior that you want to create? But the way you debug models is actually very similar to
the way you debug software. So one of the things that I learned in the early days at Anthropic,
which we discovered especially with Claude 3 training, is that when you taught the model some
self-knowledge of, like, hey, you actually don't have a physical body to operate
in the physical world,
but then at the same time we had data that taught
the model some of the function calls, which is like,
this is how you set an alarm,
the model would get extremely confused
about whether it can set an alarm, since it doesn't have a body
in the physical world.
So the model gets confused, and sometimes it over-refuses.
So sometimes it says, I don't know,
sorry, I cannot help you.
And so there is always a
balance, a trade-off, between how you make the model more helpful for users
while also not being harmful in other scenarios.
So it's always about how you make the model more robust and
able to operate across a variety of diverse scenarios.
That is so funny.
I never thought about that.
Most of the data that it's trained on is kind of assuming it's a human
describing the world and how they operate, and
it assumes there's a body and you could do things.
And then the model is told, you don't have a body.
Yeah.
Okay.
I want to talk a little bit about data.
While we're on this topic, I know you have strong opinions here.
There's kind of this meme that models are going to stop getting smarter because they're
running out of data.
They're trained in a large part on the internet, and there's only one internet, and they've
already been trained on it.
What more can you show them about the world?
And there's this trend of synthetic data, this term synthetic data.
What is synthetic data?
Why do you think it's important?
Do you think it's going to work?
I think there are two questions here.
We can unpack one at a time. When people say we're hitting the data wall, I think they're thinking more in terms of pre-trained large models that are trained on the entire internet to predict the next token.
But what the model is actually learning during that process is how to compress.
The model learns to compress a lot of knowledge,
and it learns how to model the world.
So to predict the next words after, like,
teach me how to drive, basically,
you only have, like, a few words that will match that:
a car.
So the model actually learns about the world itself.
So it's modeling human behavior, sometimes.
And when you talk to, like, pre-trained models,
which are very, very large,
they're actually extremely diverse and extremely creative,
because you can talk to almost any Reddit user
through a pre-trained model.
But I think what's happening right now,
with the new paradigm of the o1 series,
is that scaling in post-training itself
is not hitting the wall.
And that's because basically we went from
raw datasets for pre-trained
models to an infinite amount of tasks that you can teach the model in the post-training
world via reinforcement learning. So any task, for example, how to search the web, how to use
the computer, how to write well, all sorts of tasks that you're trying to teach the model,
all the different skills. And that's why I'm saying there's no data wall or whatever,
because there will be an infinite amount of tasks.
And that's how the model becomes, like, superintelligent.
And we are actually getting saturated on all benchmarks.
So I think the bottleneck is actually in evaluations.
We don't have all the frontier evals. Like, I don't know, GPQA,
which is Google-proof question answering at, like, PhD level:
models on that benchmark are getting to, I don't know,
more than 60, 70%, which is what a PhD gets.
So it's literally hitting the wall on evals.
I want to follow both those threads.
So the first is on this idea of synthetic data.
Is a simple way to understand it that the models are generating the data
that future models are trained on?
You ask it to generate all these ways of doing stuff, all these tasks, as you described,
and then the newer model is trained on the data that the previous model generated?
Some tasks are synthetically curated.
So this is an active research area: how can you construct
new tasks for the model to learn?
Sometimes, you know, when you develop products, you get a lot of data from
the product, like user feedback, and you can use that data in this post-training
world.
Sometimes you still want to use human data, because actually some of the tasks can be
really, really hard to teach.
Like, only experts know certain knowledge
about some chemicals, or biological knowledge.
So you actually need to tap into expert knowledge a lot.
So yeah, I think to me, synthetic data training
is more for product.
It's rapid model iteration for similar product outcomes.
And we can dive more into it.
But the way we made Canvas and Tasks and new product features for ChatGPT was mostly done by synthetic training.
Let's actually get into that. That's really interesting.
I want to talk about evals, but let's follow that thread.
So talk about how this helped you create Canvas.
So when I first came to OpenAI, I really had this idea of, like, okay, it would be really cool for ChatGPT to actually change the visual interface, but also change the way
it works with people.
So going from being a chatbot to more of a collaborative agent and a collaborator,
which is a step towards more agentic systems that ultimately become innovators.
And so the entire team of applied engineers, designers, product, and research kind of got formed out of thin air, almost out of nothing.
It was a collection of people who just got together
and rapidly started iterating with each other.
Actually, Canvas is,
I would say, one of the first projects at OpenAI
where researchers and applied engineers
started working together from the very beginning
of the product development cycle.
And I think there are a lot of things that we learned along the way.
But I definitely came in with the mindset that
we need to do really
rapid model iteration, such that it would be much easier for engineers to, you know,
work with the latest model possible, but also learn from user feedback or early
internal dogfooding how to improve the model very rapidly. And, you know, it's really
hard to figure out, when you deploy a product, how people will
use it.
And so the way you synthetically train the model
is basically figuring out what are the most core behaviors
that you want this product feature to have.
And for Canvas, for example, it came down to three main behaviors.
The first was, how do you trigger Canvas for prompts
like, write me a long essay, when the user's intention is
mostly iterating over long documents, or write me a
piece of code, and when to not trigger Canvas for prompts like, can you tell me more about
a president, or, I don't know, some general question? There you don't want to trigger
Canvas, because the user's intention is mostly getting an answer, not necessarily iterating
over a long document. The second behavior is how do you teach the model to update the document
when the user asks.
So one of the behaviors we taught the model
is to actually have some agency and autonomy
to literally go into the document,
select specific sections,
and either delete or edit them,
so highlight and rewrite certain sections.
So sometimes the user would just say,
change the second paragraph to be something friendlier.
And we would have to teach the model
to literally find the second paragraph in the document and change it to a friendlier tone.
So basically you teach both how to trigger the edit itself and also how
to get a higher-quality edit for that document.
In the case of coding, for example, there's also the question of how good the model
is at completely rewriting the document versus making very specific targeted
edits. So that's another layer of decision boundary within edit itself: do you select
the entire document and rewrite it completely, or do you want very targeted, custom
behavior? And, you know, when we first launched the model, we would bias it towards
more rewrites, because we thought the quality of the rewrites was much higher. But over
time you kind of shift based on user feedback and what you're learning from iterative
deployment.
Lastly, the third behavior that we taught the model synthetically is how to make comments on any document.
So the way we did it is we would use the o1 model to simulate a user conversation.
Let's say, write me a document about XYZ.
Then we used o1 to produce the document.
And then we injected a user prompt like,
oh, make some comments, critique my piece of writing,
or critique this piece of writing that you just made.
And then we taught the model to make comments on that
very specific document.
So it's also about what kind of comments you want the model to make.
Do they make sense or not?
How do you teach the quality of that?
And it all came down to measuring progress via very robust evals.
But yeah, this is how we used synthetic data generation for the training.
Okay.
This is so interesting.
So you talk about this idea of teaching the model, and you mention how it's using synthetic data to teach the model different behaviors.
Is a simple way to think about it
basically that you do that by showing it what success looks like, using, basically, evals?
Is that the simple way to think about it?
Like, here's what you doing this successfully would look like, and that teaches it,
okay, I see, this is what I should do.
Great. Yeah. Amazing. Yeah, you got it.
Okay, got it. I want to start unpacking what your day-to-day looks like as you're building these sort of things.
Is it, like, you sitting there talking to some version of ChatGPT, crafting these evals?
Sometimes I do that. Sometimes I do sit with ChatGPT.
Actually, I think I learned this so much from Anthropic.
People spend so much time just prompting models and looking at quality [inaudible] all the time,
and you actually get a lot of new ideas
for how to make the model better.
It's like, oh, this response is kind of weird.
Why is it doing this?
And you start debugging or something,
or you start figuring out new methods
for how to teach the model to respond in a different way,
like have a better personality, let's say.
So it's the same thing as how personality is made
in the models.
It's very similar methods.
But yes, I think my time at OpenAI has changed.
I think when I first came, I was mostly doing research IC work.
So I was writing code, you know, training models, writing evals, working with PMs and designers to teach them how to even think about evaluations.
I think it was a really cool experience.
And I think this is an adaptation of how we do
product management of AI features or AI models.
Yeah, but now it's mostly, you know, management and mentorship.
I'm still doing IC research coding after, like, 4 p.m., though. But yeah, it's kind of
changed.
All right.
Don't talk too much about being a manager because everyone's firing their managers.
Who needs managers anymore?
That's what I hear now.
Just kidding.
It's interesting that so much of your time was spent on teaching product teams how evals integrate and how important that is.
And I've heard this a few times and I haven't personally experienced it yet.
So I think it's an important thread to follow: just how writing these evaluations is going to become an increasingly important part of the job of product teams, especially when they're building AI features and working with LLMs.
So can you just talk a bit more about what that looks like?
Is it, like, sitting there with an Excel spreadsheet, basically showing, here's the input, here's the output,
and here's how good the result was?
Talk about what that actually looks like very practically.
It certainly depends on what you're developing,
but there are various types of evaluations.
So sometimes I do ask product managers,
or there's also a new role that we have, model designers,
to go through some of the user feedback, maybe,
or think of various user conversations
where under some circumstances
it should trigger Canvas. And then you have this ground-truth label of, like,
okay, with this conversation, it should trigger Canvas. Under this conversation,
it should not trigger Canvas. And you have this very deterministic kind of eval
for that behavior. When we were launching Tasks, for example,
how do you make correct schedules is actually really hard for the model. But we built out
some deterministic evaluations
where, okay, if the user says 7 p.m.,
the model should say 7 p.m.
So you can have deterministic evals
where it's just pass or fail.
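As a rough illustration of the deterministic pass/fail evals described above, here is a minimal Python sketch. Everything in it is an assumption made for illustration: the prompts and labels are invented, and `keyword_baseline` is a hypothetical stand-in for the model's trigger decision, not OpenAI's actual eval data or tooling.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class TriggerCase:
    prompt: str
    should_trigger: bool  # ground-truth label written by a human


# Hypothetical labeled conversations; not real eval data.
CASES = [
    TriggerCase("Write me a long essay about coral reefs", True),
    TriggerCase("Write me a piece of code that parses a CSV file", True),
    TriggerCase("Can you tell me more about the history of Rome?", False),
]


def keyword_baseline(prompt: str) -> bool:
    """Hypothetical stand-in for the model under test: triggers on 'write me'."""
    return "write me" in prompt.lower()


def run_trigger_eval(decide: Callable[[str], bool], cases: list[TriggerCase]) -> float:
    """Return the fraction of cases where the decision matches the label."""
    if not cases:
        raise ValueError("no cases to evaluate")
    passed = sum(decide(case.prompt) == case.should_trigger for case in cases)
    return passed / len(cases)


if __name__ == "__main__":
    print(f"pass rate: {run_trigger_eval(keyword_baseline, CASES):.2f}")
```

Each case either passes or fails, so the score is just the share of labels the model matches. A real eval would swap `keyword_baseline` for a call to the model being trained.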
So yeah, the way it works is,
sometimes I ask
product managers to just
go create a Google Sheet, have different tabs,
and write down what's the current behavior,
what's the ideal behavior,
and why, or some notes.
Sometimes we use it for evals, sometimes we use it for training.
Because if you give the spreadsheet to the o1 model, it can probably figure out how to teach itself a good behavior.
And I think the second type of evals, which is kind of more prevalent, is human evaluations.
You can have specific trainers, or you can have internal people,
and when you have a conversation or a prompt
and then you have completions from various models,
you choose which model is the best,
which model produced the highest-quality comment or edit, and that gives you a win rate.
And then you can have continuous win rates.
As you develop new models,
they should always win over the previous models.
So it depends on what you want to measure.
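The win rate mentioned above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions: the judgments are invented, the labels "new", "old", and "tie" are hypothetical, and counting a tie as half a win is one common convention rather than something stated in the conversation.

```python
from collections import Counter

# Hypothetical human judgments: for each prompt, which model's completion
# the rater preferred. "tie" means the rater had no preference.
JUDGMENTS = ["new", "new", "old", "tie", "new", "old", "new", "new"]


def win_rate(judgments: list[str], candidate: str = "new") -> float:
    """Fraction of comparisons the candidate wins, counting a tie as half a win."""
    if not judgments:
        raise ValueError("no judgments to score")
    counts = Counter(judgments)
    return (counts[candidate] + 0.5 * counts["tie"]) / len(judgments)


if __name__ == "__main__":
    print(f"win rate of new model: {win_rate(JUDGMENTS):.4f}")
```

A win rate above 0.5 means raters preferred the new model more often than the old one, which is the bar each new model is expected to clear.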
So interesting.
Like basically what I'm hearing in the something,
I'm learning about as I talk to people is product development might move from this like,
here's a spec, PRD, let's build it together, and then cool, let's review it. Are we happy with this?
From that to, hey, AI, build this thing for me, and here's what correct looks like.
And I'm spending all my time on what does correct look like on e-vales, essentially.
You definitely want to measure progress for the model. And this is where evals come in, because
you can have a prompted model as a baseline already.
And the most robust evals are the ones where prompted baselines get the lowest scores.
Because then you know that if you've trained a good model, it should
just hill-climb on that eval all the time, while also not regressing on other
intelligence evals.
So that's what I'm saying, it's more of an art than a science.
It's like, okay, if you optimize the model for this behavior, you kind of don't want to brain-damage it in other areas of intelligence. This is happening all the time in every lab and every research team.
I would say prompting is also a way to prototype new product ideas.
Like, early days at Anthropic, when I was working on the file uploads feature,
I remember I was just, you know, prompting the model.
I mean, when we were launching 100K context,
I was just prototyping this in a local version. When I did the demo,
people really, really loved it, and they just wanted an API for file uploads or
something. And that's when it clicked for me. I also wrote a blog post
about it, like a new way of product development or prototyping for designers and for
product managers. For example, one of the features that I wanted to do was personalized
starter prompts. So whenever you come to Claude, it should recommend you
starter prompts based on what your interests are. And you can literally do that with prompting.
Another feature was generating titles for the conversations. It's
a very small micro-experience, but one I'm really proud of.
The way we did that was we took the five latest conversations from the user and
asked the model, what's the style of the user?
And then, for the next new conversation,
the generated title will be in the same style.
Users really like little micro-experiences like this.
That's so cool.
Did you do that at Anthropic or at OpenAI?
At Anthropic.
Okay, cool.
I love the file upload feature that Claude has, by the way.
ChatGPT doesn't have that yet, is that right?
I think it has.
I think the way it's implemented is very different, though.
Okay, maybe it's the PDF feature, because I use it all the time with Claude.
Okay.
That's cool.
So OpenAI needs to get on that.
Man, it's wild how many features you built that I use every day and that many people use every day.
This prototyping point you made is really important.
It's something that comes up a ton on this podcast also: maybe the way that AI has most impacted the job of product builders recently is just prototyping.
Instead of just showing, here's a PRD, here's a
design, PMs more and more just say, here's the prototype of the idea that I have. It's working,
you can play with it.
Yeah.
Yeah.
Okay.
I want to spend a little more time on how you operate.
So you talked about how you built and launched this Tasks feature.
Is that the way you describe it, Tasks?
Yeah.
So talk about how that emerged.
And let's better understand just how you collaborate with product teams and how OpenAI works
in that way, whatever you can share there.
I think Canvas and Tasks fall into the bucket of projects
that are more short- or medium-term.
And actually the way Canvas and Tasks came to be was that it started with one person
prototyping and creating a spec.
It's kind of like a PRD.
It's creating a spec of the behavior of the model.
I don't think Tasks is an extremely
groundbreaking feature, necessarily.
What makes it really cool
is that the models are so general.
Models can now search,
they can write sci-fi stories,
they can search for stocks,
they can summarize the news every day.
Because the models are so general,
you give people something familiar.
You know, notifications are very familiar.
Having reminders is very familiar.
So you create a form factor
that people are very,
very familiar with. Same with Canvas, right?
Google Docs is very familiar.
But then you add a magical AI moment
and it becomes very powerful.
But the way it comes
usually like operationally like
yeah, it starts as like a prototype
like literally prompted prototype
of like how you would want
like the model to behave.
For like tasks, for example,
like you kind of like need to design
a little bit like design
system design thinking is like
okay like well
if the user says like
remind me to go to lunch at 8 a.m. tomorrow.
Okay, what kind of information does a model need to extract from that prompt
in order to create a reminder?
And so this is how you, like, design, like, a spec for a new feature, like a tool.
Canvas and tasks are all tools.
So it's like, how do you, like, create the tool spec?
And then it's like mostly, like, developing JSON schema.
It's like, okay, like from this prompt, maybe the model should extract like the time that the user requested.
And then you're thinking about like, which format do you want the time to be in?
And then like, how do you want the model to like notify you?
It's like basically the user should give instruction to the model.
And then this instruction would like fire off like every day or something at that particular time.
So for example, if you say, like, every day I want to, like, learn about the latest AI news, the model should rewrite that into, like, okay, like, search for the latest AI news.
And this task will get fired at that particular time that the user requested.
And then, you know, like, you design this, like, tool spec.
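As a rough illustration of the kind of tool spec described above, here is a minimal Python sketch. The tool name `create_task`, the field names (`instruction`, `start_time`, `recurrence`), the allowed recurrence values, and the example date are all hypothetical and chosen only for illustration; this is not OpenAI's actual schema. The small checker covers only a tiny subset of JSON Schema (required fields, string type, `enum`, and an ISO 8601 `date-time` check).

```python
from datetime import datetime

# Hypothetical schema for a "create_task" tool call (illustrative only).
CREATE_TASK_SCHEMA = {
    "type": "object",
    "properties": {
        "instruction": {"type": "string"},
        "start_time": {"type": "string", "format": "date-time"},
        "recurrence": {"type": "string", "enum": ["none", "daily", "weekly"]},
    },
    "required": ["instruction", "start_time", "recurrence"],
}


def validate_task_call(arguments):
    """Return a list of problems with the arguments; an empty list means valid."""
    errors = []
    # Every required field must be present.
    for key in CREATE_TASK_SCHEMA["required"]:
        if key not in arguments:
            errors.append("missing field: " + key)
    # Every supplied field must be known, be a string, and satisfy enum/format.
    for key, value in arguments.items():
        spec = CREATE_TASK_SCHEMA["properties"].get(key)
        if spec is None:
            errors.append("unexpected field: " + key)
        elif not isinstance(value, str):
            errors.append(key + " must be a string")
        elif "enum" in spec and value not in spec["enum"]:
            errors.append(key + " must be one of " + ", ".join(spec["enum"]))
        elif spec.get("format") == "date-time":
            try:
                datetime.fromisoformat(value)
            except ValueError:
                errors.append(key + " must be an ISO 8601 timestamp")
    return errors


# What a model might extract from "every day I want to learn about the latest AI news".
# The date and time are made up for the example.
daily_news_task = {
    "instruction": "Search for the latest AI news and summarize it.",
    "start_time": "2025-02-10T08:00:00",
    "recurrence": "daily",
}

print(validate_task_call(daily_news_task))
```

The point of the sketch is the shape of the problem: the user's free-form request is turned into a rewritten instruction, a time in an agreed format, and a recurrence rule, and anything the model emits can be checked against the spec before a task is scheduled.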
And then, actually, I don't know, like, I feel like sometimes, like, it's like through conversations.
Like, either people ask me to like join the team, and they're like, oh my God, like, we need researchers, or like, we need some support, like we need to train the models. Or sometimes, like with Canvas, it's mostly like I just pitched the idea and it got staffed quite immediately during the break. Um, so I don't know, it's like depending on the project. And usually with staffing it's mostly like a product manager, um, a model designer, an actual product designer,
a couple of researchers, and like applied engineers.
Depends on the complexity of projects.
And then like, you know, for tasks, it took like,
two months or so to go from like zero to one basically.
Oh wow.
For canvas was like four or five months, I guess,
to go from zero to one.
But yeah, and then like, you know, you teach product managers how to like build evals
and maybe, you know, how do we not only like ship the better feature,
but how do we think like more long term?
Like what kind of like cool features do you want tasks to have?
Like I think it would be nice for tasks to be like a little bit more personalized.
It would be nice to have like to create tasks via voice on a mobile, right?
Like so you kind of need to like, this is how you get like research roadmap right here.
It's like thinking like how the feature will be developed in the future.
And then from there, it's like, you start creating data sets, like, with evals, you want to make sure that goes well.
And then, like, you need to have, like, a trade-off between, like, what methods you want to use.
And the reason why I really love, like, synthetic, like, relying purely on synthetic data instead of, like, collecting data from humans is because it's, like, much more scalable.
It's cheap, like, you literally sample from the model,
and you teach the core behaviors of the models,
and that will generalize to all sorts of diverse coverage.
And when you launch the beta feature,
you learn so much from the users that you can, like,
shift all your synthetic sets toward the distribution
of how the users behave in the product,
and this is how you improve.
And this is what happened with Canvas too,
when we went from beta to GA.
This episode is brought to you by
Loom. Loom lets you record your screen, your camera, and your voice to share video messages easily.
Record a Loom and send it out with just a link to gather feedback, add context, or share an update.
So now you can delete that novel-length email that you were writing. Instead, you can record your
screen and share your message faster. Loom can help you have fewer meetings and make the meetings
that you do have much more productive. Meetings start with everyone on the same page and
end early. Problem solved, time saved. We know that everyone isn't a one-take wonder when it
comes to recording videos, so Loom comes with easy editing and AI features to help you record once
and get back to the work that counts. Save time, align your team, stay connected, and get more done
with Loom. Now part of Atlassian, the makers of Jira. Try Loom for free today at loom.com
slash Lenny. That's L-O-O-M.com slash Lenny.
Something that I want to help people understand, and I don't even 100% understand this, is what's the
simplest way to understand the job of a researcher versus, say, a model designer and other folks
involved? Like, what's the simplest way to understand what researchers do at OpenAI?
So the projects that I described are mostly like product-oriented, like the research is mostly
product research. Another component of my team is actually more like longer-term exploratory projects.
And it's more about like developing new methods and understanding those methods in a variety of circumstances.
So like, basically, to develop new methods, you kind of like need to follow a very similar kind of like recipe of like building evals.
But it's like more sophisticated evals. Like you kind of want to have like out-of-distribution evals, or like if you want to like measure generalization, you kind of need to like capture that.
but it's basically more sciencey in a way where
you know if we talk about synthetic data
like one of the hardest things about synthetic data is like
how do you make it like more diverse.
Diversity in synthetic data is like one of the most important questions
right now, and it's like exploring like ways to inject
like diversity as a general method that will work for all
is like one of the research explorations.
Other ones are like more like developing new capabilities.
I feel like it's all just about like, you know, like you work on this like new method and you have like signs of life that it's working.
Either you think of like how do you make it more general or you think of like how do you make it very useful or like and this is how like longer term projects become more like medium and short term projects.
That makes sense.
Essentially working on developing ways to make the model smarter, o4 or o5, o6,
anyways. Like, o1 was a big breakthrough, right? The way it operates, where it's not just here's your answer, it actually thinks and
takes time to think through the process of coming up with an answer. Okay, yeah, very helpful. Speaking of that, of thinking about the future, where things are going,
I want to spend some time on just this insight that basically you are building the cutting edge of AI like at the very bleeding edge of where AI is going and where it is and so I'm very curious to hear just your
take on how you think things are going to change in the world and how people work based on where you see things are going.
And I know it's a broad question, but let's say like in the next three years, how do you see the world changing?
How do you see people's way of working changing?
It's a very humbling experience to be in both labs, I guess.
To me, when I first came to Anthropic, I was like, oh, no, I really love front-end engineering.
And then, like, the reason why I switched to, like, research is because I realized at that time,
like, oh, my God, like, Claude is getting better at, like, front end.
Like, Claude is getting better at, like, coding.
I think Claude can, like, develop new apps or something.
And so, like, it can, like, develop new features for the thing that I'm working on.
So it's like, it was kind of like this meta-realization where it's like, oh, my God, like,
the world is actually changing.
And then, like, when we first, like, launched 100K context at that time, obviously, you know,
I was thinking about, like, form factors. It's like, yeah, like, file
uploads were like very natural, very familiar to people, but you can imagine we could just like
make like infinite chats in the claude.ai app, right? Like, it's like a 100K
context. But because like, with file uploads, it's like form follows function. Like the form
factor of the file uploads kind of enabled people to just like literally upload anything, like books
or like any reports, financials, and like ask any task of the model.
And then I remember it was like, you know, enterprise customers like, like financial customers are like really interested in that.
And it's like, oh, wow, like actually they, it's actually one of the very common tasks that people do in that setting.
It was like kind of crazy to like see how some of the redundant tasks are getting like automated basically by these like smart models.
And we're entering the era where I actually
don't know, for example, sometimes if o1 gives me the correct answer or not, because I'm not
an expert in that field. And it's like, I don't even know how to verify the outputs of the models,
because, like, only experts can, like, verify this. So, yes, so basically
there are trends that are going on. The first trend is the cost of reasoning and intelligence
is drastically going down.
I had a blog post about this.
Maybe I should update it on, like,
the latest benchmarks,
because at that time, like,
MMLU, everybody was, like, doing, like,
one benchmark,
and they, like, quickly saturated the benchmarks.
And, like, now we need to, like, do the same plot,
but with another, like, frontier eval.
But the cost of intelligence is, like, going down
because it becomes, like, much cheaper.
Small models are becoming,
like, smarter than like large models,
and that's because of like
the distillation research.
This happened with like Claude 3 Haiku.
I was like working on like post-training
Claude 3 Haiku, and I realized
it was much smarter than like Claude 2,
which was like way
bigger, and stuff like that.
But like the power of like small models
becoming very intelligent
and fast and cheap,
we are moving down that road.
that has multiple implications, but that means that, like, people will have more access to AI,
and that's really good.
Like builders and developers will have much better access to AI, but also it means, like,
all the work that has been, like, bottlenecked by intelligence will be kind of, like, unblocked.
So anyone, like, I'm thinking about, like, health care, right?
Like, instead of, like, going to a doctor, I can, like, ask ChatGPT,
like give ChatGPT a list of symptoms and ask, like, oh, which, like, do I have, like, a cold, flu,
or like something else. Like I can literally get the access to like a doctor almost. And there's like
been some like research studies around that. Yeah, there's a New York Times story about that where they
compared doctors to doctors using ChatGPT to just ChatGPT, and just ChatGPT was the
best of them all.
Like doctors made it worse.
Yeah.
Yeah, that's crazy.
Like, right?
Like, education, I think,
I would have dreamt if, like,
I had a tool like
ChatGPT when I was, like, young and, like,
would have learned so much.
But it's like,
people can now learn almost anything from these models,
so they can learn a new language.
They can learn how to build new apps.
Like, I don't know, right, anything that you want.
And, like, I'm so...
Like, it's humbling to, like, have, like, launched Canvas and, like, brought that thing to the people,
enabling them to do something that they could never do before.
And I think this is, there's something, like, magical around this experience.
So education will have massive implications. Like, I guess, like, scientific research, right?
Like, I think it's, like, the dream of, like, any AI research is, like, to automate AI research.
It's kind of scary, I'd say, which makes me think that, like, people management will stay, you know?
It's like one of the hardest things to, it's like emotional intelligence for the model,
so like creativity in itself is like one of the hardest things. So writers, I don't think like
people should be worried as much. I think it's like, I think it'll alleviate a lot of like redundant
tasks for people. This is awesome. Okay, I want to follow this thread for sure. And it's funny that
what you described is like, you were an engineer at Anthropic and you're like, okay,
Claude is going to be very good at engineering.
This isn't going to be potentially a career long term,
so I'm going to move into research.
And AI is going to need me for a long time to build it,
to make it smarter.
I would say we still have, like,
I think the Canvas team still has,
like, really cool, like, front-end engineers that are really, like,
you know, people who, like, really care about, like,
interaction design, like, interacting with,
like, I don't think, like, models are there yet.
Like, I think, but we can get the models
to like this top 1% of like front-end something, for sure.
So what I want to move on to next along these lines is just, and this is just speculation,
but what skills do you think will be most valuable going forward for product teams in particular?
So folks are listening and they're like, okay, this is scary.
What should I be building now to help me stay ahead and not be in trouble down the road?
What skills do you think are going to be more and more important
to build? Yeah, I think like creative thinking, like you kind of want to like, come up,
like generate a bunch of ideas and like filter through them and not just like build the best
product experience. Listening, you know, you want to like build something that like the most
general model will not replace. And oftentimes you build something and you make it really,
really good for like a specific set of users, and actually the moat is now in like your user feedback.
The moat is like more in like whether you listen to them, like whether you can like rapidly iterate.
Like the moat is like in here.
I don't think like we are yet to like... there are so many ideas.
I think there's an abundance of like ideas that you can look at. Like, I wouldn't be worried.
I feel like in fact, I do think like people in AI fields are like, I wish they were like a little more creative and like connecting dots across like different like fields or something like that to like develop really cool new like generations and new paradigms of interactions with this AI.
Like I don't think we've cracked this problem at all.
A couple of years ago I was like telling some people I was like, you know, you kind of want to like build for the future.
So it's like, it doesn't necessarily matter whether the model is good and not good right now,
but you can build product ideas such that by the time the models will be really good.
It will work really well.
And I think it just like happened naturally.
Like for example, like at Anthropic, like the Claude Artifacts.
And I feel like early days of Canvas was like back in like 2022, like before
ChatGPT, like writing idea was like a knowledge chashpee. But I feel like the Claude 1.3 model itself
was like not there to like make like really, extremely good, like high-quality edits, for example,
like in coding. And I feel like I see like startups like Cursor, and it's like doing super well. Like
that's because they like iterate so fast, they like invent like new ways of like training models,
they move really fast, they listen to users, like,
massive distribution.
It's like, yeah, it's kind of cool.
That's really helpful, actually.
So what I'm hearing is that soft skills essentially
are going to be more and more important and powerful.
You just talked about management, leading people,
being creative and coming up with innovative insights, listening.
There's a post I wrote that I'll link to where I look,
I try to analyze how AI will impact product management.
and we're actually very aligned.
And my sense was the same thing,
that soft skills are going to become more and more important.
And the things that are going to be replaced is the hard skills,
which is interesting because usually people value the hard skills,
like coding, design, writing really well.
And it's interesting that AI is actually really good at that
because it's taking a bunch of data, synthesizing it,
and writing, creating a thing versus all these fuzzy things around
what influences and convinces people to do things
and aligning and listening,
like you said, creativity.
Anything along those lines come up as I say that?
I think it's actually really, really hard to teach the model how to be
aesthetic or like do like visual,
a really good visual design or like how to be extremely creative in the way they write.
I think like, I still think like ChatGPT kind of sucks at like writing.
And that's because it's like, it's like bottlenecked by this like creative reasoning.
I think like prioritization is like one of the most important.
Like I think like for
management, I feel like, actually, like, AI research progress is bottlenecked by like management,
like research management, because you have like a constrained set of compute and you need to like
allocate the compute to the research paths that you feel the most convinced about.
It's like you need to like really, you need to have like a really high conviction in the research
paths to put the compute in, and like it's more like a return on investment kind of situation.
And it's like, okay, yeah, like, I'm thinking a lot about, like, okay, like, how do, across all my projects, which projects are higher priorities, like, prioritization and also, like, on the lower level, like, which experiments are really important to run right now and which are not and, like, cut through the line.
So I think, like, prioritization, communication, like, management, people skills, like, empathy, like, understanding people, like, kind of, like, collaboration.
I think like canvas wouldn't be like an amazing launch if it wasn't like about like people.
And I think it's the wonderful global group of people.
And like I got a chance to like work with like people like Lee Byron, who's like a co-creator of GraphQL, and like some of the best like Apple designers.
And it's like so cool to like see.
And like how do you create this like collaboration between people?
It's just like something that's still human, I think.
Let me just follow this thread a little bit because I imagine people
listening are like, okay, but once we have AGI or ASI, it's like, it'll do all this.
It's like, there's a world where like, why isn't all this done?
I think it's easy to just assume all that.
I'm curious, this idea of creativity and listening, why you think AI isn't good at it,
other than it's just very hard to train it to do this well.
Is there anything there just like why this is especially difficult for AI and LLMs to get good at?
I think currently it's difficult for many reasons.
I think it's still like an active research area.
And it's something that like, I think my team is working on is like, okay, how do we
teach the model to be like more creative in like the writing?
And actually like, I'm thinking like this new paradigm, like, where the models think more
should actually lead to like better writing in itself.
But like when it comes down to like idea generation.
or like,
discriminating of, like,
what is the good,
like,
visual design and not?
I feel like it hasn't learned,
like,
examples from,
like,
people to discriminate it very well.
I do think it's because, like,
you know,
there are not that many people
who are, like,
actually, like,
really, like,
it's not, like,
accessible to, like,
models to learn from these people,
I guess.
So I definitely,
that's why it sucks.
Yeah,
that makes sense. Basically, there's not enough of you yet. Researchers teaching it to do these things slash people that have incredible taste and creativity that can teach these things. You could argue this will come, but I'm not, we don't need to keep going down that thread. Let me ask you a specific question in this post I wrote. I made this argument that a lot of people disagreed with that strategy is something that AI tooling will become increasingly great at and take over. There's the sense that that's the thing that people will
continue to be much better at and you can't offload to AI basically developing your strategy,
telling you what to do to win.
My case is, isn't strategy just take all the inputs, all the data you have available,
understand the world around you, and come up with a plan to win?
Feels like AI would be, like, an LLM would be incredibly smart at this.
What's your take?
I think so, too.
I think, like, again, like, you teach the model all sorts of, like, tools and, like,
capabilities and, like, reasoning, right?
And it's like, when it comes down to like, as for Canvas right now,
it's been very cool to like, for the models, just like,
aggregate all the feedback from users, like summarize me, like,
the top five, like, most painful flows on user experiences.
And then, like, the model itself is, like, very capable of, like,
thinking of, like, knowing how it's being made,
figure out, like, how to, like, create datasets for itself to, like, train on.
And I don't think, like, we are far away from that kind of, like, self-improvement, models becoming, like, self-improving.
By, like, then, like, the product development is basically kind of, like, self-improving, like, it's kind of, like, its own, like, organism or something.
Yeah, like, again, like, strategies, like, it's more, like, data analysis and, like, coming up with, like, like, I think what models are really good at is, like, like,
like connecting the dots, I think.
It's like, okay, if you have user feedback from this source,
but you also have an internal, like, dashboard with metrics,
and then you have, you know, like other kinds of like feedback or like inputs,
and then like it can co-create, like, a plan for you, like, recommendations even.
And I think this is like one of the most common use cases for ChatGPT too,
is like coming up with like this sort of thing.
That makes sense.
like essentially a human can only comprehend so much information at once and look at so much data at once
to synthesize takeaways. And as you said, these context windows are huge now. Here's all the
information. What's the most important thing I should do? Yeah, same as like scientific research.
It's because like you, like ideally the model would be able to like suggest like ideas, like
iterate on the experiment or like given the empirical results of the previous experiments, like,
how do you like come up with like new ideas or like the methods?
Yeah.
Oh man.
Okay.
So just to close the loop on this conversation, this part of the thread is the skills
you're suggesting people focus on building and leaning into soft skills like creativity,
managing influence, collaboration, looking for patterns.
Is that generally where your mind is at?
Yeah.
I'm thinking a lot about like how do you make organizations more effective.
And I think this is mostly like management, I guess.
It's like, how do you organize, like, research teams or, like, generally teams, like, compose teams such that they will maximally succeed or, like, be at the maximum, like, performance of what's possible. Like, you can, like, literally create, like, the next generation of computers.
It's just, like, a matter of conviction and, like, the way you manage through that.
It's like scaling organizations or like scaling product research.
Yeah, I think what like you're basically building this thing and not efficiently doing it is like limiting the potential of the human species right now.
Right.
Mismanagement within the research teams at OpenAI and Anthropic and some of these other models.
Yeah, it's kind of crazy to think about it.
Holy moly.
Okay, so speaking of Anthropic and OpenAI, you've worked at both.
Very few people have worked at both companies and have seen how they
operate. I'm curious just what you've noticed about the differences between these two, how they
operate, how they think, how they approach stuff. What can you share along those lines?
It's more similar than different. Obviously, there is a lot of, like, there are some, like,
differences that also come down to, like, nuances of, like, culture. I really love Anthropic, and I have a
lot of friends there, and I also love OpenAI, and I still have a lot of friends.
So it's like, it's not about, like, enemies. I feel like, in AI, it's always, like,
yeah, they're competitors, they're like enemies.
This is actually like one big community
and like of people like doing the same thing.
I would say what I learned from Anthropic
is this like real care and craft
towards like model behavior,
model craft, model training.
And I've been thinking a lot about like, okay, like what makes
Claude Claude and what makes ChatGPT ChatGPT.
And it's like, it actually comes down to like
operational processes that kind of lead to the outputs, to the model, like the outputted model.
And it's like, the reason why Claude has so much more personality and, like, is more like a
librarian,
I don't know, like, I don't know, I'm like visualizing Claude being like a librarian,
somewhat like very like nerdy or something,
is because I feel like it's a reflection of the creators who are like making this model and like
a lot of like details around like the character and the personality and like whether the model
should follow up on this question or like not like was the correct like ethical behavior for the
model to like in this scenario is like a lot of like craft and like to read it like the assets
and this is where I learned that part of like the art, I guess,
at Anthropic. I'd say that Anthropic is much smaller. Like when I joined, it was like, what, like,
70 people? When I left, it was like 70 people. And like, obviously the culture changed so much.
I really enjoyed being, like, early days, startup, like, vibes and, like, people knew each other
as a family, but, like, the culture shifted. I would say, like, another thing I learned from Anthropic
is that, like, they're much better at, like, focusing and, like, prioritization or, like, very, very
hard, like very hardcore prioritization, I guess, and they need to do it. But I think like OpenAI
is like much more, um, innovative and, uh, much more like risk-taking in terms of like product or like
research. Actually, you know, like, I don't know, like your full-time job can be just like
teaching the model how to be like a creative writer, and it's like there's some luxury in this like
research freedom that comes with scale, maybe, I don't know. Um, but
it gives you, it's like, I feel like I have much more creative, like, product
freedom to do almost anything, I guess, within like OpenAI, like, to evolve ChatGPT
into, like, the vision that you want. It's like more like, yeah, probably bottoms up, I guess.
Yeah, that's how I was thinking about it. It feels like OpenAI is more bottoms up,
distributed, people bubble up ideas, try stuff. And that leads to more
products launching. I imagine more things
just kind of being tried versus more
of a, let's just make sure everything we do is awesome
and craft and thinking deeply
about every investment.
That's really interesting. I've never heard it described
this way.
Karina, we've covered so much ground.
This is going to help a lot of people with so many
ways of thinking about where the future is going.
Before we get to our very exciting lightning round,
I'm curious if there's anything else that you think might be helpful
to share or get into.
One of my regrets, I guess, when
I was early days at Anthropic,
is that like I think there was like some luxury of the time, this pre-ChatGPT time, to actually like
come in with like a bunch of ideas and like prototype like almost every day. Um, and I think like we did
a lot of cool ideas. Like Claude in Slack was actually one of the first like, uh, tool-use-y like
products. It's like Claude could operate in like your workplace. It's like kind of cool when you
like @Claude, summarize the thread. So maybe you have an entire conversation with someone and then
you want like a summary of like what happened. Like you can say like, @Claude, summarize this.
Also it was really fun to like even like iterate on the model itself. It's like when you just like talk
to the model in like Slack forever, um, it created like some social element. It's kind of cool.
It's kind of like Midjourney and like, um, its Discord. Like people learned so much about like
prompting and like how do you work with like Claude.
I feel one of the features that was like early Tasks,
part of the prototype, is like,
you know,
every Monday Claude would just like summarize the entire channel.
Or like every Friday it would just like summarize like a bunch of channels and give like the news about the organization or something.
So it's kind of like a really cool like form factor.
I think, I'm thinking about like form factor as like a really important like question like in AI,
especially we haven't even figured out
how do we create
an awesome product experience
with o-series models.
It's like the paradigm shift from like synchronous,
real-time, give-an-answer
paradigm into like more asynchronous
paradigm of like agents working in the background.
But then now the question is like,
the agents should build trust with you, right?
And trust builds over time, which is like with humans.
And you know, you start
this collaboration, which is why like this collaborative model where like you and a model is like
so important, because you build trust and the model learns from your preferences so that it can
become like more personalized. And it will start predicting the next like action that you want to
take on the computer or something. And it's like kind of like more predictive, much more.
We went from like personal computers to like personal models basically here.
Why is it not a thing?
That seems like such an obvious feature that every LLM should have, a Slackbot
version of them.
Is that a thing I can have you install or is that not a thing right now?
I know that Claude in Slack was sunsetted in like 2023 or something.
But I think it was like, after ChatGPT, it was mostly like the focus on like consumer
use cases or like enterprise use cases.
I think it didn't want like, I think the form factor of like Claude in Slack
was kind of constrained
a little bit
when you wanted to develop new features.
No, I want that.
I know that ChatGPT had like Slackbot tools.
I don't know, like maybe it will come back.
All right, I would pay for that.
Any other memories from that time of early days?
Because that's a really special place to have been
at, early-days Anthropic.
Any other memories or stories from that time
that might be interesting to share?
I think the very first launch when it felt like
something clicked,
yes, again, was like the 100K context launch. It's like when the models could input the entire
book and give you a summary of the book or something, or the entire financial, or like have like
multi-file financial reports and then like give you an answer to the question, to a very specific
question. I think there was something in there that was kind of like, oh my God, this is like a really cool
new capability, not like model capability, but more like the capabilities that came from the product
form factor itself rather than like the model capability as much. I think like other
prototypes that we were thinking about, like yeah, like there's like one prototype of like Claude
workspaces, and it's like kind of the same idea, of like Claude and I would have a shared workspace,
and that shared workspace is like a document, and you can like, it's written in the document.
And I feel like sometimes the ideas like, there's probably a lag, and they lag for like two years.
Just like in this case.
It's interesting there are these milestones that kind of open up our view of what is happening and where things are going.
ChatGPT I think was the first of just like, wow, this is much better than I would have thought.
You talked about 100K context windows, where you could upload a book and ask it questions and have it summarized.
I actually use that all the time when I have interview guests and they wrote a book.
I sometimes don't have time to read the whole book.
So I use it to help me understand what the most interesting parts are.
And then I actually dive into the book, just to be clear.
And then, I don't know, maybe voice was another one, where you could talk to, say, ChatGPT.
Are there any other moments where you're like, wow, this is much better than I thought it was going to be?
Yeah, I think, like, the computer use agents, like the model operating the desktop,
and you can essentially think of, like, you know, a new kind of experience where the model
can learn the way you browse.
And from that preference, it can just browse just like you.
And it's kind of like a simulated persona.
And it's actually very similar to the idea of like, okay, maybe Sam Altman doesn't
have a lot of time.
I want to talk to, like, his simulation and ask, or like, for example, yeah,
I really appreciate some of the technical management of, like, Jacob, but he doesn't have a lot of
time. So it's like, I really want to ask him this question, like, how would you respond? Simulated
environments like this would be really cool.
That's a great place to plug Lennybot. I have one of
those. It's trained on all of my podcasts and newsletters, and it sits on many models. I don't know which one
exactly they use, but it's exactly that.
And it's not even me.
It's all the guests that have been on the podcast and the newsletters that I wrote.
And you could just ask it, how do I grow my product?
How do I develop a strategy?
And it's actually shockingly good.
Do you feel like it reflects who you are?
The best part of it is you can talk to it.
It's built.
There's an ElevenLabs voice version that's trained on my voice from this podcast.
And it's actually very good.
And people like have told me they sit there for hours talking to it.
And somebody told it, interview me like I am on Lenny's podcast, ask me questions about my career, and he did a half-hour podcast episode with Lennybot.
That's so fun.
It's incredible.
Future is wild.
Yeah.
I think, like, content transformation is, like, you know, I would imagine sometimes, when you generate a sci-fi story in Canvas, you can transform this into, like, an audiobook, where you have, you know,
natural content transformation from one medium to another. I think one of my
earliest inspirations is one of the last episodes of Westworld, I don't want to spoil it, but
where Dolores comes to her work at that time and she comes to, like, this new workspace and
she starts writing a story, and as she writes the story, like, a 3D virtual reality
starts getting created on the fly.
So I kind of want to
create that.
Kind of cool.
Wow. Speaking of
medium, I guess I
was wondering if I should go in this direction or not, but
real quick, Kevin Weil
slash Kevin Wheel, I don't know exactly how to pronounce
his last name, the CPO of
OpenAI.
Is it Weil or Wheel?
I think Wheel.
Wheel. Okay, okay. Let's just say that.
Wheel. He did a panel at the Lenny and Friends Summit last year, and he made this really
fascinating point that chat is a really interesting interface for these tools because
they're just getting smarter and smarter,
and chat continues to work as a paradigm to just interact with them.
Similar to a human.
You could talk to Albert Einstein.
You could talk to someone not very smart, and it's all conversation still.
And so it's a really flexible way to interact with increasingly good intelligence.
At some point it'll not be so great,
and you're talking about all these ways
that you're adding additional ways to interact,
but it's interesting that chat has proved to be
a really powerful layer on top of all this stuff.
Yeah, that's really cool. I feel like
chat also has the social element,
which is, like, very human.
It's like, you know, you sometimes want to
get into a group chat, and, like,
yeah, having conversations there is kind of like a group chat
in itself, as messaging.
Actually, this idea of, like,
how do you build features
like this? Like, I see Tasks
as this general kind of feature that will scale very nicely as the models
develop new capabilities themselves. It's like, the model will be able to
do better searches and, you know, come up with more creative
writing or render, you know, React apps and HTML apps, and you can
have, like, every day a new puzzle for you, or, like, every day continue the story from the previous
days. It scales very nicely.
You mentioned something as we were getting into this extra section that we ended up going down:
this idea of agents using a computer. I know this is actually something you're going
to launch today, the day we're recording this, which will be out by the time this comes out.
It's called Operator. Can you talk about this very cool feature that people will have access to?
Yeah, so I unfortunately did not work on it, but I'm really, really excited about this launch.
It's basically an agent that can complete a task in its own virtual computer,
like in its own virtual environment.
You can do literally any task, like order me a book on Amazon,
and then ideally the model will either follow up with you,
like, which book do you want, or know you so well that it will start recommending,
like, oh, here are five books that I might recommend you buy.
And then you hit, like, yeah, help me buy.
And then the model goes off into its own little virtual browser and completes the task and buys the book on Amazon.
And then if you give the model, like, credentials, a credit card, obviously that comes with, like, a lot of trust and safety.
Then it will just complete the thing for you.
It's a virtual assistant.
It's interesting how this just sounds like, obviously,
this should happen. Like, why is this not already a thing? Which is also mind-blowing, that we're just
assuming this should exist, like, just some AI doing things for you on a computer that you just
ask it to do. Like, it's absurd.
It's actually really hard. And I think, like, we're still cracking
this. I feel like, I don't know if you use, like, Tuple. It's like a pair programming
product.
No.
But I don't know if you love pair programming. So if you use...
Oh, yeah. Shopify
uses this. I remember it came up on a podcast episode.
Oh, nice. Yeah, so it's a very
cool product where you can just call
anyone at any time
and then like share screen and the other
person can, like, have access to this screen
or, like, start literally
operating your computer. And it's
very, like, real time, like,
the latency is, like, very...
it's like very high quality.
And it's just like,
I kind of want the same. It's like,
I want to, like, pair program with my model,
and the model should even talk to me,
like, draw on a very specific section
in my code in VS Code and tell me, like,
or teach me, and you can have different modes.
It's like, right here, it's like a product
right here for you.
I don't know,
some people should build that.
Sounds like a startup just got birthed
from someone listening to this.
You mentioned that it's very hard to do this,
an agent controlling a computer
as you and helping out.
What makes it so hard for whatever, however much you can explain briefly?
Much of it is, like, because right now the models are operating on pixels instead of language or whatnot, and pixels are actually really, really hard for the models because of, like, perception, or visual perception.
I think there's still a lot of multimodal research that's going on.
But I think language scaled so much easier compared to multimodal because of
that. Another thing, I guess for my team, is, like, how do you derive human intent
correctly? It's like, sometimes, does the model know enough information to ask a
follow-up question or to complete the task? You kind of don't want an agent to go
off for, like, 10 minutes and then come back with an answer that you didn't even want.
That actually creates a much worse user experience.
And this comes down to, like, teaching the model people skills.
It's like, you know, what do people like?
Kind of like creating a mental model of the user and caring about the user in order to ask certain questions.
Like, actually, that part is hard for the models.
That relates to what we talked about earlier.
It's kind of the soft skills, people skills piece.
Not where these models are strong yet.
Okay.
I'm going to skip the lightning round.
I want to ask just one question from the lightning round.
Something fun.
Yes.
Okay, so when AI replaces your job, Karina, I'm curious what you're getting.
And it gives you a stipend, gives you a monthly stipend.
Here's your salary for the month.
What would you want to do?
How do you want to spend your time?
What will you be doing in this future world?
I've been thinking about this.
I feel like I have a lot of job options.
I would love to be a writer, I think.
I think that would be super cool.
You should write short stories, like, sci-fi stories,
novels.
I really like art history.
So, you know, it's like conservationists in the museums
who just, like, try to preserve, like, art paintings,
but just, like, painting through a lot of things.
I think that would be really cool to do.
Yeah.
That sounds beautiful.
I don't know.
What I'm hearing is you need to nerf these models to not get very good at writing so that you can continue.
Although at that point, you don't need to do it from, like, you don't need people to buy it.
You're just doing it for fun.
So it doesn't even matter if they're incredibly good at writing or art conservation.
Oh man.
What an episode or a conversation.
What a wild time we're living in.
Karina, thank you so much for being here.
Two final questions.
Where can folks find you online if they want to reach out and follow up on
anything, and how can listeners be useful to you?
You can find me. I'm on Twitter.
It's premium.
You can also shoot me an email on my website.
And my team is hiring.
And so I'm looking for research engineers,
research scientists, as well as like machine learning engineers.
Like people who come from, like, product engineering
who want to learn model training.
I'm actually hiring for my team.
My team is called Frontier Product Research,
and we train models, we develop new methods, but for product-oriented outcomes.
What a place to work. Holy moly.
What's the best way for people to apply for these very lucrative roles?
I think you can shoot me a DM on Twitter. I'm yet to create a job description.
Okay, this is the job description.
Or you can apply to, like, the post-training team.
Okay, you're going to get a flood of DMs. I hope you're prepared.
Karina, thank you so much for being here.
This was incredible.
Thank you so much, Lenny.
Bye, everyone.
It was fun.
Thank you so much for listening.
If you found this valuable, you can subscribe to the show on Apple Podcasts, Spotify, or your favorite podcast app.
Also, please consider giving us a rating or leaving a review, as that really helps other listeners find the podcast.
You can find all past episodes or learn more about the show at lennyspodcast.com.
See you in the next episode.
