The Data Stack Show - 208: The Intersection of AI Safety and Innovation: Insights from Soheil Koushan on LLMs, Vision, and Responsible AI Development
Episode Date: September 25, 2024
Highlights from this week's conversation include:
Soheil's Background and Journey in AI (0:40)
Anthropic's Philosophy on Safety (1:21)
Key Moments in AI Discovery (2:52)
Computer Vision Applications (4:42)
Magic vs. Reality in AI (7:35)
Product Development at Anthropic (12:57)
Tension Between Research and Product (14:36)
Safety as a Capability (17:33)
Community Notes and Democracy in AI (20:41)
Expert Panels for Safety (21:38)
Post-Training Data Quality (23:32)
User Data and Privacy (25:32)
Test Time Compute Paradigm (30:54)
The Future of AI Interfaces (36:04)
Advancements in Computer Vision (38:46)
The Role of AGI in AI Development (41:52)
Final Thoughts and Takeaways (43:07)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the show. We are here with Soheil
Koushan from Anthropic. Soheil, we're so excited to chat with you. Thanks for giving us some time.
Yeah, of course. I'm really excited to be here.
All right, well, give us just a brief background. So I started working in AI in 2018.
Self-driving was my first gig.
I worked for about five years at a self-driving trucking company called Embark, trying to
make big rigs that drive themselves.
And then continued my work in AI by joining Anthropic.
I joined earlier this year, and I've been working on making Claude better for various
use cases,
especially in the knowledge work domain. So, Soheil, we talked about so many topics before
the show. It is hard to pick a favorite, but I'm really excited about talking use cases
and talking about common mistakes people make when interacting with LLMs. What are some topics
you're excited about? Yeah, I think I'd love to just like talk a bit about where I think things are going from here.
Awesome.
Let's dig in.
So, Soheil, what interested you about machine learning?
I mean, that was something that you wanted to explore.
You considered graduate school.
You ended up joining a self-driving startup.
But of all the different things you could have done in sort of the technical or data domain, machine learning drew your attention.
What kind of specific things were attractive?
Yeah, I might be oversimplifying it, but to me it felt like magic. Early vision models did things that, you know, I as an engineer, as a software engineer, as a technical person, had no idea were possible, and I had no way of explaining how they worked. Anytime something is cool and you don't know how it works, it's indistinguishable from magic. There's a quote that goes along those lines. And then came the realization that, wait, I could do this. Like, I could be a magician. I could build this, I could figure out how it works. So I think it was just a shock around what I would have previously thought was impossible.
Yeah. Do you remember maybe one of the specific moments when you saw a vision model do something and that was, you know, there are probably multiple, but one of the moments where you said, okay, this is totally different?
Yeah, I think for vision it was probably the early bounding box detectors. Like, I remember playing around with classical, heuristic ways of trying to understand what's in an image using OpenCV. And then seeing the first real deep-learning-based bounding box detectors that could also track objects over time. After having played around with algorithmic computer vision, seeing, oh, whoa, this is way better, it's able to work in a variety of conditions, different angles, was really cool.
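To make the contrast he's drawing concrete, here is a rough sketch of the classical, heuristic kind of detection he mentions, using OpenCV's built-in HOG plus SVM pedestrian detector. The image path is a placeholder; a deep-learning detector would replace the hand-crafted HOG step with a trained network and typically handle far more varied conditions and viewpoints.

```python
# A minimal sketch of "classical, heuristic" detection with OpenCV's
# built-in HOG + linear SVM pedestrian detector. The image path is a placeholder.
import cv2

img = cv2.imread("street_scene.jpg")

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Returns candidate bounding boxes (x, y, w, h) and confidence weights.
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)

for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detections.jpg", img)
```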
And I had a very similar moment in LLMs. I remember seeing, I think it was the GPT-2 or GPT-3 blog post, that had Alice in Wonderland, maybe the whole book or maybe a chapter, and it was recursively summarizing it to distill it from 30 pages to 10 pages, and then from 10 to five, and then eventually one paragraph. That was one of those holy crap moments for me, because it requires an actual deep amount of understanding of the content to be able to summarize, and to then be able to do it with new phrasing, new ways of rewriting the story, was a leap that I had never seen before up until that point. So those are two key moments that I remember.
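The recursive summarization he's recalling is straightforward to sketch: chunk the text, summarize each chunk, join the summaries, and repeat until one paragraph remains. The version below is purely illustrative and uses the Anthropic Python SDK; the model name and chunk size are assumptions, not the setup from the original GPT-2/3 blog post.

```python
# Illustrative sketch of recursive summarization (not the original GPT-2/3 setup).
# Assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment;
# the model name and chunk size are placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-20240620"  # placeholder model name

def summarize(text: str) -> str:
    response = client.messages.create(
        model=MODEL,
        max_tokens=500,
        messages=[{"role": "user",
                   "content": "Summarize the following passage in a few sentences, "
                              "using your own phrasing:\n\n" + text}],
    )
    return response.content[0].text

def recursive_summarize(text: str, chunk_chars: int = 8000) -> str:
    # Keep collapsing chunk summaries until the whole thing fits in one final pass.
    while len(text) > chunk_chars:
        chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
        text = "\n".join(summarize(chunk) for chunk in chunks)
    return summarize(text)

# print(recursive_summarize(open("alice_in_wonderland.txt").read()))
```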
So we're going to spend a bunch of time talking LLMs and AI, but I have to ask about the computer vision side. I mean, self-driving is still very present in the news and stuff, but for computer vision in general, I think LLMs have really taken over the press. What are some computer vision applications that you think people don't know about, or some really neat things that maybe wouldn't show up for an average person?
Yeah. So I hate to use self-driving because it's probably overdone, but also probably
the average person doesn't know that if you come to San Francisco, you can take a self-driving vehicle anywhere around the city completely autonomously,
and you can download the Waymo app and do it today.
There's so much work that's gone into that, like over 10 plus years of engineering.
And I think it's definitely still the coolest application of computer vision.
I do think that on a longer time horizon, VR will probably be a very interesting application of computer vision. Like, I saw Meta's latest release, Segment Anything 2, which is a model that essentially allows you to pick any arbitrary object and have it segmented, to understand the semantics of, okay, this is an object versus background, but also to track that over time in a way that is extremely robust, especially once the object goes out of the frame and comes right back. So there are so many cool applications in VR, and I think the technology is advancing pretty quickly. And maybe even stepping back from VR, people are working on humanoid robots. I think that's a whole topic worth discussing, and I don't actually have strong opinions on it, but a humanoid robot would require a level of computer vision understanding that goes beyond what cars are able to do today.
So that's, I think, another area where vision will become really important.
Yeah. And it's always fascinating to me, right, where a lot of times you see the advances, or the threshold of super usefulness, come later. So let's say everybody kind of moves on to humanoid robots, and then all of a sudden cars finally hit that, oh wow, we're here moment, but everybody else has kind of moved on. And to a point, I think that happens because if you can get it right for robots, which is an even harder goal, you can solve some of those downstream problems that you needed for that last five percent for cars or for trucks.
Yeah. And, you know, there are real applications of computer vision today, like in manufacturing and factories. There are robots that do a lot, and a lot of them have really advanced, cutting-edge computer vision going on. So beyond just the futuristic use cases, there's a lot of really cool use cases today.
Yeah. Okay, so Soheil, you saw early major advances in computer vision and then LLMs, and it was magical. But now that you're behind the curtain, or have been behind the curtain, does it still feel as magical, or do you feel like a magician?
That's a great question. Yeah. So one of the things that's kind of surprising is that when I was working on self-driving, I kept being a bit of the pessimist. Like, hey, I think this will take longer than people are saying, I think we're being a little bit optimistic, there are so many situations where it can fail, the level of reliability we need is so high, so it's further away than people think. And I don't feel that way about LLMs and transformers. I actually feel like the hype is warranted. And in both situations, I was behind the curtains, right?
My only takeaway is that I do think that this is real.
And I do think that this is like magic that we're building.
And I do think that it will progress really rapidly.
And, yeah, I'm super excited to be a part of it. And I do think Anthropic's founding makes a lot of sense when you are aware of just the rapid pace of progress in the space.
I wanted to actually dig into that. And I really enjoyed consuming Anthropic's literature, because I think, number one, the clear articulation of incredibly deep concepts is absolutely outstanding. But two, I think you address a lot of really serious concerns around AI and the future. And specifically, like any technology, what happens if it's used in ways that are really damaging? And so I'd just love to dig into that and hear from someone inside of Anthropic. And maybe one thing that we could start out with, I think this is something a lot of people talk about, especially if you think about your average person, right, they're not deeply aware of the inner workings of an LLM or transformers or other components of this. So what are the dangers? How do you think about the real dangers that make safety such a core component of the way that you're approaching the problem and the research?
Yeah, I think my mental framing of it is that this is like incredibly powerful technology.
And incredibly powerful technology can be used for, you know, good or for harm.
And this is true for all kinds of technological innovations that we've made, right?
Like social media can be used for good or for harm, right?
The internet has obviously been 99% good, but it can also be used for harm.
But I think, you know, the current pace of AI progress is showing us that the technology
is super, super powerful.
And I think, you know, I try to put myself into the mindset of the Anthropic founders,
right?
So they were part of OpenAI, working on research there.
I think Dario was head of research or VP of research at OpenAI.
And they're seeing the progress that's being made from GPT-1 to 2 to
three.
And they're like, okay, this is going to be huge.
Like this is one of the most powerful technologies that humanity has ever
created.
It's very possible that in a few years we'll have like super intelligence.
We need to think about this seriously. This is pretty serious stuff. We need to think about the implications of this, right? And so the other thing about AI and the current technology is that it's kind of inevitable.
Like even if, you know, OpenAI were to suddenly stop building it,
like other people will build it
and it will exist.
So it's almost a necessity
that someone is taking
like a good, hard, serious look
as to the implications
of what we're building
in a way that's maybe a bit more serious
than back in social media days
where, you know, it was like,
this looks fun.
Let's just build it
and like not really think
through the implications
of this technology.
So that's kind of
the Anthropic mission.
I think it's basically to
ensure that the world
safely makes the transition
through transformative AI.
Like transformative AI is happening.
It'll very likely be built
at one of these three labs.
But what's most important is that the transition that humanity is making goes well, that the world, you know, ends up being in a better place in the end. So that's kind of the mission, and I think everything that Anthropic does is connected to that mission. And so doing interpretability research, doing safety research, doing capabilities research, and building products are all in service of this bigger goal.
Yeah. How does the product piece play into that
specifically? Because it's an interesting approach, right? Usually product comes sequentially after research. If you think about academia, you have a bunch of research that's done, and then it's like, okay, well, we could build a company around this or a product around this. And those things are happening simultaneously at Anthropic at a very high, I guess, level or pace. I'd love to just know how that works and why.
Yeah. I mean, I think product is incredibly important.
And Anthropic is investing heavily into it. You know, we hired the co-founder and former CTO of Instagram, Mike Krieger, to lead our product work here. And I think it's important for a few reasons. Like, one, having your technology in the hands of millions of people is really helpful for understanding it, for figuring out the dynamics of how people use this thing when it's out there, in what ways does it work, in what ways does it not. Because again, if the goal is to make this useful for humanity, it should be interfacing with humanity, and we should figure out how humanity is going to be interfacing with it so we can learn and make it better and maybe more steerable.
We figure out what people care about and don't care about, and that actually feeds back into our research, right? So that part is super important. It's also super important as a business. Anthropic needs to have a thriving business. It needs to be a serious player from a financial perspective to be able to have a seat at the table, whether that's in the space of government, or in the space of having investors invested in Anthropic that would continue our work. And so I think those two
together make it so the product is very important for us. Is there a tension in the company
between the research side and the product side? And when I say tension, I don't necessarily mean in a challenging way,
although I'm sure that there are some challenges. But is there a healthy tension there in terms of
balancing both of those and just the way that the company operates? Because the outcomes and
the way that you would measure success historically tend to be very different.
Yeah, I actually think that it is very healthy here at Anthropic.
Like specifically research breakthroughs
create space for new incredible products
and relaying that all the time to the product folks
is super valuable.
And then the inverse is also true
where, hey, we have this product,
but it's really lacking in these specific ways.
These can then feed back into research to figure out, well, why can't Claude do this?
How can we make it better at this?
And so this constant back and forth between product and research is, I think, really key to building long-lasting and useful products.
Artifacts is, on the surface, just a UI enhancement,
like, you know, you could recreate artifacts in other places, too. But because of this,
like constant back and forth between research and product, we're able to like, come up with
paradigms, figure out things that work and don't work, and ship them and create like really
meaningful value for people in a way that, you know, I think you're not seeing as much of in the industry broadly. You're especially seeing that kind of innovation at startups. Like, I think startups in particular come up with really good ideas, but at the biggest companies, everyone's kind of working on the same thing. So yeah, I do think that that sort of interplay
is really important. And then another one is just like, well, what about safety and product, right? Or what about safety and research? Like,
how does that play into sort of like other tensions there? And I think one thing that's
really helpful there is the responsible scaling policy that we have, which basically sets like
the rules as to what kind of models are we willing to ship? And how do we test them for the things that we care about?
Like, does this model make it easier to create bioweapons or not?
And if that's the case, then we will not ship it,
regardless of whether we have like really cool product breakthroughs
that will go on top of it.
And it kind of becomes like the goalposts and sets the stage.
And as long as we all agree on the RSP, the need for one,
and then also to some degree the
details of it, then you can debate the RSP, hey, are we being too strict, are we being not strict enough, but the decision about whether or not to launch something is just about whether it fits with the RSP or not. It's not like, I want it shipped versus you want it shipped. It's an objective question of whether it fits within the RSP or not.
So that's like a really cool tool we have to be able to scale responsibly and like make
sure that everyone's aligned and on the same page about it.
Another note I have on this is I kind of view safety as a capability.
Like we talk often about this idea of race to the top.
So if we're able to build
models that are less jailbreakable, that are more steerable, and follow instructions better,
and don't cause harm for people, that then creates incentive for everyone else to match us in that
capability. And these are capabilities. People will be willing to pay for, say, a customer support bot that doesn't accidentally say rude things to the customer or accidentally make decisions that it shouldn't, and that's really good at instruction following. Those are capabilities. It's not jailbreakable. You can't convince it to give you a discount. Those are things that are actually valuable for people. And so safety and capabilities a lot of times are actually combined. Like, one thought experiment I have is, if you truly had an incredibly capable model, then you could just tell it, hey, here's the constitution, here are the ten things that humanity cares about, follow it, and then you're done. Because it's so capable in understanding and knowledge, and it can think through things really deeply, you can give it the exact list of instructions that you want it to follow, and then it can be perfectly aligned to those, right? So that's a bit of a thought experiment, but I
do think there's actually overlap between safety and capabilities.
Yeah, I love that. Okay, John, I know you have questions about data, but I have one more question on this sort of safety and Anthropic's, you know, convictions and view of things. So we talked about a model that harms someone. And I think one of the really fascinating questions about developing AI models is that if you look around the world, the definition of harm in different cultures can vary. So how do you think about developing something where safety is a capability when there is some level of subjectivity in certain areas around the definitions that would define safety as a capability?
Yeah, this is really hard. Like, different cultures have different definitions
of harm. And I think hopefully we get to a world where to some degree it is almost like
democratically decided what we're training these models to do and what we're asking them to behave
like. I think for now, the best we can do is sort of come up with a common set
that has the biggest overlap
with the most places in the world
and is like following all the rules and regulations
that every place has decided on.
So it's like the minimal set of overlaps.
But in a future where we have
like really easy to customize models,
you could give it a system prompt and say,
hey, actually in this country,
it's a bit more okay to talk about this
or in this country, it's not okay to talk about this in this way.
And, you know, I think hopefully we can give people, to the degree that is reasonable, the ability to steer the model to behave in a way that
makes sense for their locality.
Yep.
Yeah.
There are limits, of course, but yeah.
Sure, sure.
Yeah.
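In practice, that kind of per-locality steering can be sketched as a system prompt that carries the local policy. Below is a minimal, purely illustrative sketch using the Anthropic Python SDK; the model name and policy strings are assumptions for the example, not Anthropic's actual configuration.

```python
# Illustrative sketch only: steering behavior per locality via a system prompt.
# The model name and policy strings are placeholders, not Anthropic's real configuration.
import anthropic

client = anthropic.Anthropic()

LOCAL_POLICIES = {
    "DE": "Local rules apply: be conservative when discussing regulated financial advice.",
    "US": "Local rules apply: discuss political topics neutrally and cite sources.",
}

def ask(question: str, country_code: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model name
        max_tokens=500,
        system="You are a helpful assistant. "
               + LOCAL_POLICIES.get(country_code, "Apply a broadly acceptable default policy."),
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```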
I mean, I love that you just said this is really hard. Like, yeah, that's sort of fundamental. You know, I think philosophers have been debating the roots of that question for millennia. Take Twitter: Elon bought it, he's all about free speech, and then he realized, okay, well, there's a reason we have some level of fact-checking. And, you know, Community Notes is actually a very prominent feature now. And I think as soon as you think about it a little bit further, you realize that there's some level of democracy or community or connection or alignment that needs to happen between groups of people. It's never purely clear-cut.
Yeah. Yep. So on the data side, I've been excited
about digging into this. Obviously you have a ton of data that you use to train these models, and a ton of compute required, so it's a huge, large-scale problem. I want to talk about that. But actually, some other things you said prompted this question in my mind. When you're talking about, you know, we wouldn't want to ship a model where you could build a bioweapon: how do you get the right people in the room to know that would be possible? Because I don't know anything about bioweapons, and presumably you don't either. So let's start there with data. How do you even know what you have? Do you kind of have a panel of experts that span a bunch of different knowledge domains?
Yeah, it's exactly that. So, you know, we have teams of people who are focused on exactly these sorts of questions. We leverage external experts, we leverage government agencies, and do all kinds of rigorous testing to understand, you know, risks around bio, around nuclear, around cyber security. It is really a panel of experts that contribute to making these sorts of decisions.
Yeah, okay, that's awesome. So on the technical side, tell us a little bit about that. How does that look? Obviously it's tons of data that goes into this training. What are some scale problems, technology-type problems you guys have faced?
Yeah, I mean, the scale of data is massive, right? Trillions and trillions of tokens. In many ways, you're dealing with the entire internet's text.
Yeah, it's not just a sort of a data storage issue.
There's all kinds of other problems with internet data.
And there's multimodal data now, obviously, right?
Like there's a lot of that.
And that takes up significantly more space and is much harder to process, with the networking and all that.
So the data challenges are massive. On the opposite side of that, I do think there is a cognitive core that we need to get to when it comes to building LLMs. Right now, and this is something that Karpathy mentioned in a podcast maybe a week or two ago, a lot of the parameters of these big models are going into memorizing facts, and the core common sense and cognitive capabilities can be distilled into a smaller data set. And this is where I think bigger models can help train smaller models, help them to reason, to know the basic information. Models don't need to know every single thing that happened on Wikipedia, but they need to be able to, you know, create new data, and have the models run on their own and learn from their own mistakes, and that can help address the data bottleneck too.
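One common way the "bigger models help train smaller models" idea shows up is synthetic data generation: a strong model writes training pairs that a smaller model is later fine-tuned on. The sketch below is illustrative only and is not a description of Anthropic's pipeline; the model name, topics, and JSONL format are assumptions.

```python
# Illustrative sketch: use a large model to generate training pairs for a smaller one.
# Model name, topics, and output format are placeholders, not Anthropic's actual pipeline.
import json
import anthropic

client = anthropic.Anthropic()

topics = ["unit conversion", "reading a bar chart", "basic SQL joins"]

with open("synthetic_pairs.jsonl", "w") as out:
    for topic in topics:
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",  # placeholder model name
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": f"Write one short question about {topic}, then answer it. "
                           "Return JSON with keys 'question' and 'answer' only.",
            }],
        )
        # A sketch: assumes the model returned clean JSON; real pipelines validate this.
        pair = json.loads(response.content[0].text)
        # Each line becomes one fine-tuning example for a smaller model.
        out.write(json.dumps(pair) + "\n")
```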
So I'm curious about two things. On the use case side of things, how do you all use LLM technology internally? And then, as a follow-up, let's dig in a little bit on how you see people outside of your world using it, and maybe what are some of the mistakes they make?
Yeah, personally, I use Claude the most
for coding so i think most of my queries involve like, hey, like, you know, make this
better or how do I do this
in this language? But
I think a lot of it is also just like general
world knowledge. Like,
hey, like my drain
is clogged. Like what would be the right
thing to use? Like things that would previously go into Google
and then you'd have to like
open some blog posts with like 16
ads and at the bottom it's like okay
put baking soda and vinegar. You know, now it's just, baking soda and vinegar. It's very direct.
So yeah, recipes are a common everyday one, right?
Yeah, that's my one, where you have to scroll so far, and there's a recipe that's like 30 pages, and it's at the very bottom.
Yeah, totally. It starts by explaining their life story and, exactly.
And it's like, okay, I just want to make spaghetti, teach me how to do that.
Yeah.
So yeah, just like for all kinds of common queries.
Like, one fun example is I had a friend who was a teacher, or he is a teacher, and I reconnected with him after a long time.
And I told him I work at Anthropic.
He's like, oh yeah, I use it all the time for lesson planning.
Like, it takes care of so much of that.
It's like, hey, today I want to do a lesson about X, come up with some ideas and then make some homework assignments.
And he said it does a great job at that.
So there's all kinds of like,
you know,
things in the context of work that are super helpful.
I use it a lot to just do question and answer, too. Like, instead of reading some long thing, I'll just take it, throw it into Claude, and be like, hey, this is the specific thing I'm looking for, is it in here, can you answer it? And that's a big time saver. So, you know, I should probably talk to more average consumers to understand where they use LLMs, but I think most people aren't aware. Probably the average person in the world has never heard of Anthropic, and probably the average person in the States hasn't really used LLMs to their maximum potential. And so I think it'd be really interesting to figure out where the discrepancy is, where people are not aware of how LLMs can make their lives easier. Because I think it's easy to be in an echo chamber like San Francisco and assume that everyone's using it exactly the way that you are, but I think that's probably very far from reality.
Yeah. So on the prompting
side, I just want to ask: there's a lot out there about, you know, people who have done some pretty wild things with prompting and created personas and all that kind of stuff. I'm curious, from your perspective, what do you think are the most helpful things, just broadly, that you can do when you're trying to get the best answers out of an LLM when you're interacting with it?
Yeah, so we actually
have this tool called the metaprompter, where you tell it, hey, I'm trying to do this, can you help me write a prompt, and it'll recursively work on the prompt with you to make the prompt better and best suited for an LLM.
So that's an example of a tool that I think can help people do prompt engineering.
Actually, honestly, there aren't very specific tips that I have when it comes to prompting.
I think using that tool can help you see examples of, oh, this is what a good prompt looks like versus this is a bad prompt. But I think, in general, making what you're saying easy to follow and having examples is probably advice you would give to any person trying to explain something. I think it is especially true in the context of LLMs. Examples in particular really help models figure out what you're trying to do.
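A rough sketch of that kind of meta-prompting loop with the Anthropic Python SDK is below. This is not Anthropic's actual metaprompter tool, just an illustration of asking a model to draft and refine a prompt with examples; the model name and round count are placeholders.

```python
# Illustrative sketch of a meta-prompting loop (not Anthropic's actual metaprompter tool).
# Assumes the Anthropic Python SDK; the model name and round count are placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-20240620"  # placeholder model name

def draft_prompt(task_description: str, rounds: int = 2) -> str:
    prompt = ""
    for _ in range(rounds):
        response = client.messages.create(
            model=MODEL,
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": (
                    f"I want an LLM to do this task: {task_description}\n"
                    f"Here is my current draft prompt (may be empty):\n{prompt}\n"
                    "Rewrite it into a clearer prompt. Keep the instructions easy to follow "
                    "and include one or two worked examples. Return only the prompt."
                ),
            }],
        )
        prompt = response.content[0].text
    return prompt

# print(draft_prompt("Extract company names and amounts from invoice text"))
```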
Yeah, what I've seen, which I think relates to that, is it seems like, especially, technical people want to program that, right? They want to be like, okay, well, how do I prompt? And then I've seen some very complicated, oh-I-know-an-engineer-wrote-this type of prompts. And, you know, it's relatively hard to benchmark that versus a more simple prompt. But I've also seen some very simple prompts that seem to have pretty similar outputs. Is that your general experience, where there's some really complicated stuff and some simple stuff, and maybe the gap isn't very big between the two?
Yeah, yeah, I do think that as your instructions get
bigger and bigger, models today do struggle with internalizing all of it and may start forgetting little pieces of it. It's not perfect. So yeah, if you can distill it into the most key, simple parts, I think that would generally be helpful.
Yeah, I think one other maybe tip when it comes to prompting is to think of every token that the model has as input and output as compute units. By, for example, telling the model, hey, can you explain my question and describe your understanding of it before answering it, what you're doing is two things. One is you're just giving the model more ability to compute. Every single forward pass causes some amount of computation to happen, and you're giving it more of a chance to think. And I think that can be pretty helpful on a very complicated question. But also, you're giving it a chance to think out loud and put things down on paper, and every single time it puts down a token, for the next token it can look at what it wrote down previously. And so having the model be very explicit, think out loud, be descriptive, and reason, it costs you more money, right, because there are more tokens that have to get processed, and it costs more from a compute perspective, but that can then help make the model smarter and give you better answers. So, you know, I was putting together an eval, and one thing I added before my actual question was: describe this document, figure out what are the relevant parts to it, and then answer this question. And that sort of thing can help a lot.
And I guess we're entering this paradigm now of test time compute, where, you know, you can scale train time compute, and you're trying to put more into the model, but you can also scale test time compute, which is having the model explain itself and think out loud and do chain of thought. And it turns out that that can scale pretty nicely with capabilities, especially for certain types of things like problem solving and math and coding. So that's a lever that you can pull: you can use a bigger model where more compute went into training, or you can ask it to think out loud more and leverage test time compute to get a better answer.
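The "describe the document before answering" trick he mentions is easy to wire up. Here's a hedged sketch using the Anthropic Python SDK; the model name, document, and question are placeholders, and the prompt wording is just one way of spending extra test-time tokens on reasoning.

```python
# Illustrative sketch of spending test-time compute by asking the model to reason first.
# Model name, document path, and question are placeholders.
import anthropic

client = anthropic.Anthropic()

document = open("quarterly_report.txt").read()       # placeholder document
question = "Did gross margin improve year over year?"  # placeholder question

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=1000,
    messages=[{
        "role": "user",
        "content": (
            "First, describe this document and list the parts relevant to the question. "
            "Then answer the question.\n\n"
            f"<document>\n{document}\n</document>\n\n"
            f"Question: {question}"
        ),
    }],
)
print(response.content[0].text)
```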
Interesting.
Yeah, that's interesting.
That's actually a very helpful way
to map your interactions
to those different modes of compute.
That's super interesting.
Well, we're going to close in on probably a topic
that we could have started with and taken the entire show up with,
which is looking towards the future.
So one of the ways that I think would be fun to frame this,
so we've just been talking about very natural, you know,
day-to-day ways that, you know, we interact with, you know, with Claude, right?
So how do I unclog my drain?
You know, make this Python code better, explain my question, you know, all those sorts of things.
But when we think about, I love the way that Anthropic talks about the concept of frontier
and, you know, both in terms of research and product and models.
And one thing that's really interesting
about the way that most people interact with AI,
at least two interesting things to me.
One is that it is so consumer in nature,
in that, I guess to put it in a very primitive analogy,
opening Claude just feels so similar to opening Facebook Messenger. You know, there are just so many ergonomics that are really similar. That's one way, which is very consumer and is
ironically just not super different than a lot of interactions that have come previously.
The other really interesting thing is that in many ways, it's disappearing into existing products,
right? So increasingly, the products we use will have these new ways to use features or new features that feel extremely natural, but are like, whoa, that was really cool.
And it's like, okay, well, there's something happening under the hood there, but it's so natural within the product experience that the AI component is sort of blending into the product in a way that isn't discernible, which actually, to your point earlier, you know, it's like that felt kind of magical, right? And it's like,
well, maybe that's the point. Those don't feel super different to the consumer necessarily,
right? Or to the person interacting with it. It just feels like a more powerful version of things
that we were doing before. And that's probably an understatement when we think about the frontier, and especially the research and the future of AI. I think the way that we interact with it on a daily basis almost obfuscates that
a little bit.
Yeah, I think part of the explanation for why people don't fully understand the safety implications is maybe because we've, as an industry, done a pretty good job of doing RLHF and making sure that the models act in a reasonable, aligned way. Like, I think if we threw out there the base model that has no alignment work done on it, people would be like, whoa, this model just completely ripped into me and made me feel shitty, or whoa, it just taught me how to do something that's pretty illegal.
Like, we've done a good job of preventing those sorts of interactions. And so people are like,
Oh,
they're super safe.
Like they're super harmless.
And it's like,
great.
That's exactly what we were hoping to happen.
Yeah.
Yeah.
Yeah.
And this is just today,
like as they become more and more capable,
like it becomes an even bigger problem.
But yeah,
I think that means that we've done a good job of aligning them, making sure that they act in ways that people would expect and are harmless.
And I think, yeah, on the point about user interaction, whether it's a specific app or it's disappearing into the product, just user interaction broadly: you know, I tell my parents that I work in AI at Anthropic, and I think my mom was like, oh man, it's so scary.
Things are changing so fast.
I'm going to be so obsolete.
I wouldn't even know how to use the future thing.
And I'm like, actually, the future thing will be way easier to use than anything you've
ever used in the past.
You will be able to talk to your computer.
40 years ago, you had to be an expert to use a computer.
You had to understand the command line, and understand exactly the commands you'd need to use to execute something very specific. Today you can literally talk to your phone and be like, hey, how's the weather for the trip that I'm going on next week in New York, and it'll be like, here's the weather. It becomes more and more natural and more and more human-like, which is actually going to
increase accessibility and it's going to make all these things easier and easier to use. And I think
there is a little bit of jumping the gun, where people see where things are going, but if you kind of build it before it's ready, you end up with lackluster product experiences. Like, okay, an AI for creating slide decks. And you're like, this sounds cool, let me explain the slide deck that I want, and it does kind of a half-assed job and doesn't really create exactly what you want, and that creates a bad user experience, and then people are distrusting of it and don't use your product anymore. There's definitely a certain level of capability that needs to exist for that feature to actually feel magical, to actually feel useful, and to not be frustrating to use. But once those are there, interfaces will be very natural. They will be the most natural human interfaces that we've ever had. So yeah, I think a lot of it will be disappearing into the things that we use every day. Like, your laptop will be completely AI based or AI driven, and the way you interact with your phone will be like that too. And some of it will create full new modalities. Like, you know,
one really cool idea I have is, you know, I think in five years, maybe more or less, you can be like, okay, I'm trying to install this shelf and I don't fully get it. And you would just pull out your phone and be like, yeah, this is the shelf, these are the instructions, and then you'll have this video avatar that pops up and talks to you, has a virtual version of the shelf, and says, okay, you see this part of the shelf, drill this part. And then you'll look at your thing and be like, oh, okay, I see. And this will be generated on the fly. You can't get more intuitive than that, a literal person in your phone explaining something with what you're seeing right outside of the phone. That sort of thing will, I think, very likely exist. So yeah, it's going to be a crazy future.
Wow, that's pretty wild. Actually, I've been putting desks together for the kids.
And, you know, you get those things and you have this little Allen wrench.
And the sequence is, you know, important. If you get one thing wrong,
you start over practically.
Yeah. So let me know.
Actually, yeah, you'll be the first to know. I'll let you know, yeah.
So full circle, now I'm curious about,
you spent the time with computer vision,
now with LLMs,
and we talked about different applications for LLMs.
I mean, chat's the one everybody knows.
Are there some cool things going on with computer vision type technology and LLMs?
I mean, I've seen some things, but what are some things that you see in the future for that?
Yeah, so Claude is multimodal, so you can take, you know, a picture of something, whether that's some document you're looking at or something in the physical world, and ask questions about it. It's particularly good at explaining what it sees and going through it in a decent amount of detail. But the area that I'm
most excited about is actually, you know, kind of away from what I was working on before, which was the natural world, like computer vision on real-world images, and toward vision on digital content. So a PDF, right, or a screenshot of your computer, or a website. That as an input exists today, and I think it'll get better and better. And then the related capabilities, like, okay, the first demo of, I think, multimodal ChatGPT was: here's a sketch of a website, you take a picture and throw it in, and it tries to write the code for that. That will get better and better over time. And obviously there are multimodal output models like DALL-E, right, where you can ask it to generate an image. There's now video with Sora and a bunch of other companies doing that. Audio output too, with voice mode that's coming, and Google has their own, and there's a bunch of others like Moshi. So the three main modalities are text, audio, and vision, and they can be at the input or the output. And, you know, in the case of Claude, you have text and images as inputs, as well as text as output. But this list will continue to expand in the future. And GPT-4o is actually a three-modality-input and three-modality-output model. I do think that's the future. I think vision in particular is especially useful. Audio, just a personal product take, I think is very useful from a product perspective. I don't think audio is adding new capabilities into the model, but it is a much richer, more human way to interact with it. Whereas vision is truly a new capability. Like, you cannot describe, you know, that table and the hole and where to drill it as text. Well, you could, but it'd be way, way harder than, here's an image, do this. So I think vision actually does add new capabilities. And yeah, you're seeing a lot of that. My focus is on multimodal vision in the context of knowledge work: how do you make Claude really good at reading charts and graphs and being able to answer the common questions you might have about a report and stuff?
So that, I think, is super valuable.
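Asking Claude about a chart image looks roughly like the snippet below, using the image content blocks in the Anthropic Python SDK; the model name, file path, and question are placeholders for illustration.

```python
# Illustrative sketch: asking a multimodal model about a chart image.
# Model name, file path, and question are placeholders.
import base64
import anthropic

client = anthropic.Anthropic()

with open("revenue_chart.png", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text",
             "text": "Which quarter had the highest revenue, and roughly how much was it?"},
        ],
    }],
)
print(response.content[0].text)
```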
One thing I'll also just add on the prior self-driving work
to what I'm working on today
is that people talk about AGI.
I kind of think that AGI, depending on how you define it, is already here.
These are general purpose models that can perform generally intelligent behavior.
And it's more of a question of what data you feed in.
And when I was working on perception and vision, like it was a very
narrow model.
Like it could do bounding boxes on cars and people and pedestrians and lights and stuff.
But we were slowly starting to make it general.
We were slowly starting to add other types of things that you want to detect.
Whereas like Claude and Transformers and Autoregressive Transformers in particular are general purpose thinkers.
They're general purpose, like next token predictors.
And so many things can be framed as a next token prediction problem.
And so that's one of the things that I see that's different about what I'm working on now versus before.
Whereas now I'm working on something very general, which is why audio just kind of works. You just, you know, discretize it, tokenize it, and throw it in, and then, with some tricks and a bunch of things, you have the same engine that's creating text output creating audio output. And I think that's super cool.
In general, it's the same way that your brain is a general purpose cognitive machine. There have been people who have had different parts of their brain ablated, and suddenly they can't do a specific skill or a specific type of kinematic motion. And then other parts of their brain reconfigure and allow them to do that over time through retraining, especially if they're young. Right? So the tissue in here is a general purpose system. And I think we've unlocked that. We have found a digital analog to a general purpose cognitive engine, and now it's just a matter of scaling it, is the way that I feel.
Wow. Well, Brooks is messaging us that
we're at the buzzer, although I could continue to ask you questions for hours or perhaps days. But, Soheil, this has been so fun. I cannot believe we just talked for an hour. I feel like we just hit record, you know, five minutes ago.
Really appreciate the time.
It's been so wonderful for us.
And I know it will be for our audience as well.
Yeah, thanks for coming on the show.
I'm really glad to hear that.
Yeah, appreciate you guys.
This was really fun.
And I hope people get some value out of it. The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform.
RudderStack is purpose-built to help data teams turn customer data into competitive advantage.
Learn more at rudderstack.com.