The Data Stack Show - 208: The Intersection of AI Safety and Innovation: Insights from Soheil Koushan on LLMs, Vision, and Responsible AI Development
Episode Date: September 25, 2024
Highlights from this week's conversation include:
Soheil's Background and Journey in AI (0:40)
Anthropic's Philosophy on Safety (1:21)
Key Moments in AI Discovery (2:52)
Computer Vision Applications (4:42)
Magic vs. Reality in AI (7:35)
Product Development at Anthropic (12:57)
Tension Between Research and Product (14:36)
Safety as a Capability (17:33)
Community Notes and Democracy in AI (20:41)
Expert Panels for Safety (21:38)
Post-Training Data Quality (23:32)
User Data and Privacy (25:32)
Test Time Compute Paradigm (30:54)
The Future of AI Interfaces (36:04)
Advancements in Computer Vision (38:46)
The Role of AGI in AI Development (41:52)
Final Thoughts and Takeaways (43:07)
The Data Stack Show is a weekly podcast powered by RudderStack, the CDP for developers. Each week we'll talk to data engineers, analysts, and data scientists about their experience around building and maintaining data infrastructure, delivering data and data products, and driving better outcomes across their businesses with data.
RudderStack helps businesses make the most out of their customer data while ensuring data privacy and security. To learn more about RudderStack visit rudderstack.com.
Transcript
Hi, I'm Eric Dodds.
And I'm John Wessel.
Welcome to the Data Stack Show.
The Data Stack Show is a podcast where we talk about the technical, business, and human challenges involved in data work.
Join our casual conversations with innovators and data professionals to learn about new data technologies and how data teams are run at top companies. Welcome back to the show. We are here with Soheil
Koushan from Anthropic. Soheil, we're so excited to chat with you. Thanks for giving us some time.
Yeah, of course. I'm really excited to be here.
All right, well, give us just a brief background. So I started working in AI in 2018.
Self-driving was my first gig.
I worked for about five years at a self-driving trucking company called Embark, trying to
make big rigs that drive themselves.
And then continued my work in AI by joining Anthropic.
I joined earlier this year, and I've been working on making Claude better for various
use cases,
especially in the knowledge work domain. So, Soheil, we talked about so many topics before
the show. It is hard to pick a favorite, but I'm really excited about talking use cases
and talking about common mistakes people make when interacting with LLMs. What are some topics
you're excited about? Yeah, I think I'd love to just like talk a bit about where I think things are going from here.
Awesome.
Let's dig in.
So, Soheil, what interested you about machine learning?
I mean, that was something that you wanted to explore.
You considered graduate school.
You ended up joining a self-driving startup.
But of all the different things you could have done in sort of the technical or data domain, machine learning drew your attention.
What kind of specific things were attractive?
Yeah, I might be oversimplifying it, but to me it felt like magic. Early vision models did things that, you know, I as an engineer, as a software engineer, as a technical person, had no idea were possible, and I had no way of explaining how they worked. Anytime something is cool and you don't know how it works, it's indistinguishable from magic. There's a quote that goes along those lines. And then came the realization that, wait, I could do this. Like, I could be a magician. I could build this, I could figure out how it works. So I think it was just a shock around what I would have previously thought was impossible.
Yeah. Do you remember maybe one of the specific moments when you saw a vision model do something and that was, you know, there are probably multiple, but one of the moments where you said, okay, this is totally different?
Yeah, I think for vision it was probably the early bounding box detectors. Like, I remember playing around with classical, heuristic ways of trying to understand what's in an image using OpenCV. And then seeing the first real deep-learning-based bounding box detectors that could also track objects over time. After having played around with algorithmic computer vision, seeing, oh, whoa, this is way better, it's able to work in a variety of conditions, different angles, was really cool.
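To make the contrast he's drawing concrete, here is a rough sketch of the classical, heuristic kind of detection he mentions, using OpenCV's built-in HOG plus SVM pedestrian detector. The image path is a placeholder; a deep-learning detector would replace the hand-crafted HOG step with a trained network and typically handle far more varied conditions and viewpoints.

```python
# A minimal sketch of "classical, heuristic" detection with OpenCV's
# built-in HOG + linear SVM pedestrian detector. The image path is a placeholder.
import cv2

img = cv2.imread("street_scene.jpg")

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

# Returns candidate bounding boxes (x, y, w, h) and confidence weights.
boxes, weights = hog.detectMultiScale(img, winStride=(8, 8), scale=1.05)

for (x, y, w, h) in boxes:
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("detections.jpg", img)
```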
And I had a very similar moment in LLMs. I remember seeing, I think it was the GPT-2 or GPT-3 blog post, that had Alice in Wonderland, maybe the whole book or maybe a chapter, and it was recursively summarizing it to distill it from 30 pages to 10 pages, and then from 10 to five, and then eventually one paragraph. That was one of those holy crap moments for me, because it requires an actual deep amount of understanding of the content to be able to summarize, and to then be able to do it with new phrasing, new ways of rewriting the story, was a leap that I had never seen before up until that point. So those are two key moments that I remember.
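The recursive summarization he's recalling is straightforward to sketch: chunk the text, summarize each chunk, join the summaries, and repeat until one paragraph remains. The version below is purely illustrative and uses the Anthropic Python SDK; the model name and chunk size are assumptions, not the setup from the original GPT-2/3 blog post.

```python
# Illustrative sketch of recursive summarization (not the original GPT-2/3 setup).
# Assumes the Anthropic Python SDK and an ANTHROPIC_API_KEY in the environment;
# the model name and chunk size are placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-20240620"  # placeholder model name

def summarize(text: str) -> str:
    response = client.messages.create(
        model=MODEL,
        max_tokens=500,
        messages=[{"role": "user",
                   "content": "Summarize the following passage in a few sentences, "
                              "using your own phrasing:\n\n" + text}],
    )
    return response.content[0].text

def recursive_summarize(text: str, chunk_chars: int = 8000) -> str:
    # Keep collapsing chunk summaries until the whole thing fits in one final pass.
    while len(text) > chunk_chars:
        chunks = [text[i:i + chunk_chars] for i in range(0, len(text), chunk_chars)]
        text = "\n".join(summarize(chunk) for chunk in chunks)
    return summarize(text)

# print(recursive_summarize(open("alice_in_wonderland.txt").read()))
```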
So we're going to spend a bunch of time talking LLMs and AI, but I have to ask about the computer vision side. I mean, self-driving is still very present in the news and stuff, but for computer vision in general, I think LLMs have really taken over the press. What are some computer vision applications that you think people don't know about, or some really neat things that maybe wouldn't show up for an average person?
Yeah. So I hate to use self-driving because it's probably overdone, but also probably
the average person doesn't know that if you come to San Francisco, you can take a self-driving vehicle anywhere around the city completely autonomously,
and you can download the Waymo app and do it today.
There's so much work that's gone into that, like over 10 plus years of engineering.
And I think it's definitely still the coolest application of computer vision.
I do think that on a longer time horizon, VR will probably be a very interesting application of computer vision. Like, I saw Meta's latest release, Segment Anything 2, which is a model that essentially allows you to pick any arbitrary object and have it segmented, to understand the semantics of, okay, this is an object versus background, but also to track that over time in a way that is extremely robust, especially once the object goes out of the frame and comes right back. So there are so many cool applications in VR, and I think the technology is advancing pretty quickly. And maybe even stepping back from VR, people are working on humanoid robots. I think that's a whole topic worth discussing, and I don't actually have strong opinions on it, but a humanoid robot would require a level of computer vision understanding that goes beyond what cars are able to do today.
So that's, I think, another area where vision will become really important.
Yeah. And it's always fascinating to me, right, where a lot of times you see the advances, or the threshold of super usefulness, come later. So let's say everybody kind of moves on to humanoid robots, and then all of a sudden cars finally hit that, oh wow, we're here moment, but everybody else has kind of moved on. And to a point, I think that happens because if you can get it right for robots, which is an even harder goal, you can solve some of those downstream problems that you needed for that last five percent for cars or for trucks.
Yeah. And, you know, there are real applications of computer vision today, like in manufacturing and factories. There are robots that do a lot, and a lot of them have really advanced, cutting-edge computer vision going on. So beyond just the futuristic use cases, there's a lot of really cool use cases today.
Yeah. Okay, so Soheil, you saw early major advances in computer vision and then LLMs, and it was magical. But now that you're behind the curtain, or have been behind the curtain, does it still feel as magical, or do you feel like a magician?
That's a great question. Yeah. So one of the things that's kind of surprising is that when I was working on self-driving, I kept being a bit of the pessimist. Like, hey, I think this will take longer than people are saying, I think we're being a little bit optimistic, there are so many situations where it can fail, the level of reliability we need is so high, so it's further away than people think. And I don't feel that way about LLMs and transformers. I actually feel like the hype is warranted. And in both situations, I was behind the curtains, right?
My only takeaway is that I do think that this is real.
And I do think that this is like magic that we're building.
And I do think that it will progress really rapidly.
And, yeah, I'm super excited to be a part of it. And I do think Anthropic's founding makes a lot of sense when you are aware of just the rapid pace of progress in the space.
I wanted to actually dig into that. And I really enjoyed consuming Anthropic's literature, because I think, number one, the clear articulation of incredibly deep concepts is absolutely outstanding. But two, I think you address a lot of really serious concerns around AI and the future. And specifically, like any technology, what happens if it's used in ways that are really damaging? And so I'd just love to dig into that and hear from someone inside of Anthropic. And maybe one thing that we could start out with, I think this is something a lot of people talk about, especially if you think about your average person, right, they're not deeply aware of the inner workings of an LLM or transformers or other components of this. So what are the dangers? How do you think about the real dangers that make safety such a core component of the way that you're approaching the problem and the research?
Yeah, I think my mental framing of it is that this is like incredibly powerful technology.
And incredibly powerful technology can be used for, you know, good or for harm.
And this is true for all kinds of technological innovations that we've made, right?
Like social media can be used for good or for harm, right?
The internet has obviously been 99% good, but it can also be used for harm.
But I think, you know, the current pace of AI progress is showing us that the technology
is super, super powerful.
And I think, you know, I try to put myself into the mindset of the Anthropic founders,
right?
So they were part of OpenAI, working on research there.
I think Dario was head of research or VP of research at OpenAI.
And they're seeing the progress that's being made from GPT-1 to 2 to
three.
And they're like, okay, this is going to be huge.
Like this is one of the most powerful technologies that humanity has ever
created.
It's very possible that in a few years we'll have like super intelligence.
We need to think about this seriously. This is pretty serious stuff. We need to think about the implications of this, right? And so the other thing about AI and the current technology is that it's kind of inevitable.
Like even if, you know, OpenAI were to suddenly stop building it,
like other people will build it
and it will exist.
So it's almost a necessity
that someone is taking
like a good, hard, serious look
as to the implications
of what we're building
in a way that's maybe a bit more serious
than back in social media days
where, you know, it was like,
this looks fun.
Let's just build it
and like not really think
through the implications
of this technology.
So that's kind of
the Anthropic mission.
I think it's basically to
ensure that the world
safely makes the transition
through transformative AI.
Like transformative AI is happening.
It'll very likely be built
at one of these three labs.
But what's most important is that the transition that humanity is making goes well, that the world, you know, ends up being in a better place in the end. So that's kind of the mission, and I think everything that Anthropic does is connected to that mission. And so doing interpretability research, doing safety research, doing capabilities research, and building products are all in service of this bigger goal.
Yeah. How does the product piece play into that
specifically? Because it's an interesting approach, right? Usually product comes sequentially after research. If you think about academia, you have a bunch of research that's done, and then it's like, okay, well, we could build a company around this or a product around this. And those things are happening simultaneously at Anthropic at a very high, I guess, level or pace. I'd love to just know how that works and why.
Yeah. I mean, I think product is incredibly important.
And Anthropic is investing heavily into it. You know, we hired the co-founder and former CTO of Instagram, Mike Krieger, to lead our product work here. And I think it's important for a few reasons. Like, one, having your technology in the hands of millions of people is really helpful for understanding it, for figuring out the dynamics of how people use this thing when it's out there, in what ways does it work, in what ways does it not. Because again, if the goal is to make this useful for humanity, it should be interfacing with humanity, and we should figure out how humanity is going to be interfacing with it so we can learn and make it better and maybe more steerable.
We figure out what people care about and don't care about, and that actually feeds back into our research, right? So that part is super important. It's also super important as a business. Anthropic needs to have a thriving business. It needs to be a serious player from a financial perspective to be able to have a seat at the table, whether that's in the space of government, or in the space of having investors invested in Anthropic that would continue our work. And so I think those two
together make it so the product is very important for us. Is there a tension in the company
between the research side and the product side? And when I say tension, I don't necessarily mean in a challenging way,
although I'm sure that there are some challenges. But is there a healthy tension there in terms of
balancing both of those and just the way that the company operates? Because the outcomes and
the way that you would measure success historically tend to be very different.
Yeah, I actually think that it is very healthy here at Anthropic.
Like specifically research breakthroughs
create space for new incredible products
and relaying that all the time to the product folks
is super valuable.
And then the inverse is also true
where, hey, we have this product,
but it's really lacking in these specific ways.
These can then feed back into research to figure out, well, why can't Claude do this?
How can we make it better at this?
And so this constant back and forth between product and research is, I think, really key to building long-lasting and useful products.
Artifacts is, on the surface, just a UI enhancement,
like, you know, you could recreate artifacts in other places, too. But because of this,
like constant back and forth between research and product, we're able to like, come up with
paradigms, figure out things that work and don't work, and ship them and create like really
meaningful value for people in a way that, you know, I think you're not seeing as much of in the industry broadly. You're especially seeing that kind of innovation at startups. Like, I think startups in particular come up with really good ideas, but at the biggest companies, everyone's kind of working on the same thing. So yeah, I do think that that sort of interplay
is really important. And then another one is just like, well, what about safety and product, right? Or what about safety and research? Like,
how does that play into sort of like other tensions there? And I think one thing that's
really helpful there is the responsible scaling policy that we have, which basically sets like
the rules as to what kind of models are we willing to ship? And how do we test them for the things that we care about?
Like, does this model make it easier to create bioweapons or not?
And if that's the case, then we will not ship it,
regardless of whether we have like really cool product breakthroughs
that will go on top of it.
And it kind of becomes like the goalposts and sets the stage.
And as long as we all agree on the RSP, the need for one,
and then also to some degree the
details of it, then you can debate the RSP, hey, are we being too strict, are we being not strict enough, but the decision about whether or not to launch something is just about whether it fits with the RSP or not. It's not like, I want it shipped versus you want it shipped. It's an objective question of whether it fits within the RSP or not.
So that's like a really cool tool we have to be able to scale responsibly and like make
sure that everyone's aligned and on the same page about it.
Another note I have on this is I kind of view safety as a capability.
Like we talk often about this idea of race to the top.
So if we're able to build
models that are less jailbreakable, that are more steerable, and follow instructions better,
and don't cause harm for people, that then creates incentive for everyone else to match us in that
capability. And these are capabilities. People will be willing to pay for, say, a customer support bot that doesn't accidentally say rude things to the customer or accidentally make decisions that it shouldn't, and that's really good at instruction following. Those are capabilities. It's not jailbreakable. You can't convince it to give you a discount. Those are things that are actually valuable for people. And so safety and capabilities a lot of times are actually combined. Like, one thought experiment I have is, if you truly had an incredibly capable model, then you could just tell it, hey, here's the constitution, here are the ten things that humanity cares about, follow it, and then you're done. Because it's so capable in understanding and knowledge, and it can think through things really deeply, you can give it the exact list of instructions that you want it to follow, and then it can be perfectly aligned to those, right? So that's a bit of a thought experiment, but I
do think there's actually overlap between safety and capabilities.
Yeah, I love that. Okay, John, I know you have questions about data, but I have one more question on this sort of safety and Anthropic's, you know, convictions and view of things. So we talked about a model that harms someone. And I think one of the really fascinating questions about developing AI models is that if you look around the world, the definition of harm in different cultures can vary. So how do you think about developing something where safety is a capability when there is some level of subjectivity in certain areas around the definitions that would define safety as a capability?
Yeah, this is really hard. Like, different cultures have different definitions
of harm. And I think hopefully we get to a world where to some degree it is almost like
democratically decided what we're training these models to do and what we're asking them to behave
like. I think for now, the best we can do is sort of come up with a common set
that has the biggest overlap
with the most places in the world
and is like following all the rules and regulations
that every place has decided on.
So it's like the minimal set of overlaps.
But in a future where we have
like really easy to customize models,
you could give it a system prompt and say,
hey, actually in this country,
it's a bit more okay to talk about this
or in this country, it's not okay to talk about this in this way.
And, you know, I think hopefully we can give people, to the degree that is reasonable, the ability to steer the model to behave in a way that
makes sense for their locality.
Yep.
Yeah.
There are limits, of course, but yeah.
Sure, sure.
Yeah.
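In practice, that kind of per-locality steering can be sketched as a system prompt that carries the local policy. Below is a minimal, purely illustrative sketch using the Anthropic Python SDK; the model name and policy strings are assumptions for the example, not Anthropic's actual configuration.

```python
# Illustrative sketch only: steering behavior per locality via a system prompt.
# The model name and policy strings are placeholders, not Anthropic's real configuration.
import anthropic

client = anthropic.Anthropic()

LOCAL_POLICIES = {
    "DE": "Local rules apply: be conservative when discussing regulated financial advice.",
    "US": "Local rules apply: discuss political topics neutrally and cite sources.",
}

def ask(question: str, country_code: str) -> str:
    response = client.messages.create(
        model="claude-3-5-sonnet-20240620",  # placeholder model name
        max_tokens=500,
        system="You are a helpful assistant. "
               + LOCAL_POLICIES.get(country_code, "Apply a broadly acceptable default policy."),
        messages=[{"role": "user", "content": question}],
    )
    return response.content[0].text
```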
I mean, I love that you just said this is really hard. Like, yeah, that's sort of fundamental. You know, I think philosophers have been debating the roots of that question for millennia. Take Twitter: Elon bought it, he's all about free speech, and then he realized, okay, well, there's a reason we have some level of fact-checking. And, you know, Community Notes is actually a very prominent feature now. And I think as soon as you think about it a little bit further, you realize that there's some level of democracy or community or connection or alignment that needs to happen between groups of people. It's never purely clear-cut.
Yeah. Yep. So on the data side, I've been excited
about digging into this. Obviously you have a ton of data that you use to train these models, and a ton of compute required, so it's a huge, large-scale problem. I want to talk about that. But actually, some other things you said prompted this question in my mind. When you're talking about, you know, we wouldn't want to ship a model where you could build a bioweapon: how do you get the right people in the room to know that would be possible? Because I don't know anything about bioweapons, and presumably you don't either. So let's start there with data. How do you even know what you have? Do you kind of have a panel of experts that span a bunch of different knowledge domains?
Yeah, it's exactly that. So, you know, we have teams of people who are focused on exactly these sorts of questions. We leverage external experts, we leverage government agencies, and do all kinds of rigorous testing to understand, you know, risks around bio, around nuclear, around cyber security. It is really a panel of experts that contribute to making these sorts of decisions.
Yeah, okay, that's awesome. So on the technical side, tell us a little bit about that. How does that look? Obviously it's tons of data that goes into this training. What are some scale problems, technology-type problems you guys have faced?
Yeah, I mean, the scale of data is massive, right? Trillions and trillions of tokens. In many ways, you're dealing with the entire internet's text.
Yeah, it's not just a sort of a data storage issue.
There's all kinds of other problems with internet data.
And there's multimodal data now, obviously, right?
Like there's a lot of that.
And that takes up significantly more space and is much harder to process, with the networking and all that.
So the data challenges are massive. On the opposite side of that, I do think there is a cognitive core that we need to get to when it comes to building LLMs. Right now, and this is something that Karpathy mentioned in a podcast maybe a week or two ago, a lot of the parameters of these big models are going into memorizing facts, and the core common sense and cognitive capabilities can be distilled into a smaller data set. And this is where I think bigger models can help train smaller models, help them to reason, to know the basic information. Models don't need to know every single thing that happened on Wikipedia, but they need to be able to, you know, create new data, and have the models run on their own and learn from their own mistakes, and that can help address the data bottleneck too.
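One common way the "bigger models help train smaller models" idea shows up is synthetic data generation: a strong model writes training pairs that a smaller model is later fine-tuned on. The sketch below is illustrative only and is not a description of Anthropic's pipeline; the model name, topics, and JSONL format are assumptions.

```python
# Illustrative sketch: use a large model to generate training pairs for a smaller one.
# Model name, topics, and output format are placeholders, not Anthropic's actual pipeline.
import json
import anthropic

client = anthropic.Anthropic()

topics = ["unit conversion", "reading a bar chart", "basic SQL joins"]

with open("synthetic_pairs.jsonl", "w") as out:
    for topic in topics:
        response = client.messages.create(
            model="claude-3-5-sonnet-20240620",  # placeholder model name
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": f"Write one short question about {topic}, then answer it. "
                           "Return JSON with keys 'question' and 'answer' only.",
            }],
        )
        # A sketch: assumes the model returned clean JSON; real pipelines validate this.
        pair = json.loads(response.content[0].text)
        # Each line becomes one fine-tuning example for a smaller model.
        out.write(json.dumps(pair) + "\n")
```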
So I'm curious about two things. On the use case side of things, how do you all use LLM technology internally? And then, as a follow-up, let's dig in a little bit on how you see people outside of your world using it, and maybe what are some of the mistakes they make?
Yeah, personally, I use Claude the most
for coding so i think most of my queries involve like, hey, like, you know, make this
better or how do I do this
in this language? But
I think a lot of it is also just like general
world knowledge. Like,
hey, like my drain
is clogged. Like what would be the right
thing to use? Like things that would previously go into Google
and then you'd have to like
open some blog posts with like 16
ads and at the bottom it's like okay
put baking soda and vinegar. You know, now it's just, baking soda and vinegar. It's very direct.
So yeah, recipes are a common everyday one, right?
Yeah, that's my one, where you have to scroll so far, and there's a recipe that's like 30 pages, and it's at the very bottom.
Yeah, totally. It starts by explaining their life story and, exactly.
And it's like, okay, I just want to make spaghetti, teach me how to do that.
Yeah.
So yeah, just like for all kinds of common queries.
Like, one fun example is I had a friend who was a teacher, or he is a teacher, and I reconnected with him after a long time.
And I told him I work at Anthropic.
He's like, oh yeah, I use it all the time for lesson planning.
Like, it takes care of so much of that.
It's like, hey, today I want to do a lesson about X, come up with some ideas and then make some homework assignments.
And he said it does a great job at that.
So there's all kinds of like,
you know,
things in the context of work that are super helpful.
I use it a lot to just do question and answer, too. Like, instead of reading some long thing, I'll just take it, throw it into Claude, and be like, hey, this is the specific thing I'm looking for, is it in here, can you answer it? And that's a big time saver. So, you know, I should probably talk to more average consumers to understand where they use LLMs, but I think most people aren't aware. Probably the average person in the world has never heard of Anthropic, and probably the average person in the States hasn't really used LLMs to their maximum potential. And so I think it'd be really interesting to figure out where the discrepancy is, where people are not aware of how LLMs can make their lives easier. Because I think it's easy to be in an echo chamber like San Francisco and assume that everyone's using it exactly the way that you are, but I think that's probably very far from reality.
Yeah. So on the prompting
side, I just want to ask: there's a lot out there about, you know, people who have done some pretty wild things with prompting and created personas and all that kind of stuff. I'm curious, from your perspective, what do you think are the most helpful things, just broadly, that you can do when you're trying to get the best answers out of an LLM when you're interacting with it?
Yeah, so we actually
have this tool called the metaprompter, where you tell it, hey, I'm trying to do this, can you help me write a prompt, and it'll recursively work on the prompt with you to make the prompt better and best suited for an LLM.
So that's an example of a tool that I think can help people do prompt engineering.
Actually, honestly, there aren't very specific tips that I have when it comes to prompting.
I think using that tool can help you see examples of, oh, this is what a good prompt looks like versus this is a bad prompt. But I think, in general, making what you're saying easy to follow and having examples is probably advice you would give to any person trying to explain something. I think it is especially true in the context of LLMs. Examples in particular really help models figure out what you're trying to do.
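A rough sketch of that kind of meta-prompting loop with the Anthropic Python SDK is below. This is not Anthropic's actual metaprompter tool, just an illustration of asking a model to draft and refine a prompt with examples; the model name and round count are placeholders.

```python
# Illustrative sketch of a meta-prompting loop (not Anthropic's actual metaprompter tool).
# Assumes the Anthropic Python SDK; the model name and round count are placeholders.
import anthropic

client = anthropic.Anthropic()
MODEL = "claude-3-5-sonnet-20240620"  # placeholder model name

def draft_prompt(task_description: str, rounds: int = 2) -> str:
    prompt = ""
    for _ in range(rounds):
        response = client.messages.create(
            model=MODEL,
            max_tokens=800,
            messages=[{
                "role": "user",
                "content": (
                    f"I want an LLM to do this task: {task_description}\n"
                    f"Here is my current draft prompt (may be empty):\n{prompt}\n"
                    "Rewrite it into a clearer prompt. Keep the instructions easy to follow "
                    "and include one or two worked examples. Return only the prompt."
                ),
            }],
        )
        prompt = response.content[0].text
    return prompt

# print(draft_prompt("Extract company names and amounts from invoice text"))
```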
Yeah, what I've seen, which I think relates to that, is it seems like, especially, technical people want to program that, right? They want to be like, okay, well, how do I prompt? And then I've seen some very complicated, oh-I-know-an-engineer-wrote-this type of prompts. And, you know, it's relatively hard to benchmark that versus a more simple prompt. But I've also seen some very simple prompts that seem to have pretty similar outputs. Is that your general experience, where there's some really complicated stuff and some simple stuff, and maybe the gap isn't very big between the two?
Yeah, yeah, I do think that as your instructions get
bigger and bigger, models today do struggle with internalizing all of it and may start forgetting little pieces of it. It's not perfect. So yeah, if you can distill it into the most key, simple parts, I think that would generally be helpful.
Yeah, I think one other maybe tip when it comes to prompting is to think of every token that the model has as input and output as compute units. By, for example, telling the model, hey, can you explain my question and describe your understanding of it before answering it, what you're doing is two things. One is you're just giving the model more ability to compute. Every single forward pass causes some amount of computation to happen, and you're giving it more of a chance to think. And I think that can be pretty helpful on a very complicated question. But also, you're giving it a chance to think out loud and put things down on paper, and every single time it puts down a token, for the next token it can look at what it wrote down previously. And so having the model be very explicit, think out loud, be descriptive, and reason, it costs you more money, right, because there are more tokens that have to get processed, and it costs more from a compute perspective, but that can then help make the model smarter and give you better answers. So, you know, I was putting together an eval, and one thing I added before my actual question was: describe this document, figure out what are the relevant parts to it, and then answer this question. And that sort of thing can help a lot.
And I guess we're entering this paradigm now of test time compute, where, you know, you can scale train time compute, and you're trying to put more into the model, but you can also scale test time compute, which is having the model explain itself and think out loud and do chain of thought. And it turns out that that can scale pretty nicely with capabilities, especially for certain types of things like problem solving and math and coding. So that's a lever that you can pull: you can use a bigger model where more compute went into training, or you can ask it to think out loud more and leverage test time compute to get a better answer.
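The "describe the document before answering" trick he mentions is easy to wire up. Here's a hedged sketch using the Anthropic Python SDK; the model name, document, and question are placeholders, and the prompt wording is just one way of spending extra test-time tokens on reasoning.

```python
# Illustrative sketch of spending test-time compute by asking the model to reason first.
# Model name, document path, and question are placeholders.
import anthropic

client = anthropic.Anthropic()

document = open("quarterly_report.txt").read()       # placeholder document
question = "Did gross margin improve year over year?"  # placeholder question

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=1000,
    messages=[{
        "role": "user",
        "content": (
            "First, describe this document and list the parts relevant to the question. "
            "Then answer the question.\n\n"
            f"<document>\n{document}\n</document>\n\n"
            f"Question: {question}"
        ),
    }],
)
print(response.content[0].text)
```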
Interesting.
Yeah, that's interesting.
That's actually a very helpful way
to map your interactions
to those different modes of compute.
That's super interesting.
Well, we're going to close in on probably a topic
that we could have started with and taken the entire show up with,
which is looking towards the future.
So one of the ways that I think would be fun to frame this,
so we've just been talking about very natural, you know,
day-to-day ways that, you know, we interact with, you know, with Claude, right?
So how do I unclog my drain?
You know, make this Python code better, explain my question, you know, all those sorts of things.
But when we think about, I love the way that Anthropic talks about the concept of frontier
and, you know, both in terms of research and product and models.
And one thing that's really interesting
about the way that most people interact with AI,
at least two interesting things to me.
One is that it is so consumer in nature,
in that, I guess to put it in a very primitive analogy,
opening Claude just feels so similar to opening Facebook Messenger. You know, there are just so many ergonomics that are really similar. That's one way, which is very consumer and is
ironically just not super different than a lot of interactions that have come previously.
The other really interesting thing is that in many ways, it's disappearing into existing products,
right? So increasingly, the products we use will have these new ways to use features or new features that feel extremely natural, but are like, whoa, that was really cool.
And it's like, okay, well, there's something happening under the hood there, but it's so natural within the product experience that the AI component is sort of blending into the product in a way that isn't discernible, which actually, to your point earlier, you know, it's like that felt kind of magical, right? And it's like,
well, maybe that's the point. Those don't feel super different to the consumer necessarily,
right? Or to the person interacting with it. It just feels like a more powerful version of things
that we were doing before. And that's probably an understatement when we think about the frontier, and especially the research and the future of AI. I think the way that we interact with it on a daily basis almost obfuscates that
a little bit.
Yeah, I think part of the explanation for why people don't fully understand the safety implications is maybe because we've, as an industry, done a pretty good job of doing RLHF and making sure that the models act in a reasonable, aligned way. Like, I think if we threw out there the base model that has no alignment work done on it, people would be like, whoa, this model just completely ripped into me and made me feel shitty, or whoa, it just taught me how to do something that's pretty illegal.
Like, we've done a good job of preventing those sorts of interactions. And so people are like,
Oh,
they're super safe.
Like they're super harmless.
And it's like,
great.
That's exactly what we were hoping to happen.
Yeah.
Yeah.
Yeah.
And this is just today,
like as they become more and more capable,
like it becomes an even bigger problem.
But yeah,
I think that means that we've done a good job of aligning them, making sure that they act in ways that people would expect and are harmless.
And I think, yeah, on the point about user interaction, whether it's a specific app or it's disappearing into the product, just user interaction broadly: you know, I tell my parents that I work in AI at Anthropic, and I think my mom was like, oh man, it's so scary.
Things are changing so fast.
I'm going to be so obsolete.
I wouldn't even know how to use the future thing.
And I'm like, actually, the future thing will be way easier to use than anything you've
ever used in the past.
You will be able to talk to your computer.
40 years ago, you had to be an expert to use a computer.
You had to understand the command line, and understand exactly the commands you'd need to use to execute something very specific. Today you can literally talk to your phone and be like, hey, how's the weather for the trip that I'm going on next week in New York, and it'll be like, here's the weather. It becomes more and more natural and more and more human-like, which is actually going to
increase accessibility and it's going to make all these things easier and easier to use. And I think
there is a little bit of jumping the gun, where people see where things are going, but if you kind of build it before it's ready, you end up with lackluster product experiences. Like, okay, an AI for creating slide decks. And you're like, this sounds cool, let me explain the slide deck that I want, and it does kind of a half-assed job and doesn't really create exactly what you want, and that creates a bad user experience, and then people are distrusting of it and don't use your product anymore. There's definitely a certain level of capability that needs to exist for that feature to actually feel magical, to actually feel useful, and to not be frustrating to use. But once those are there, interfaces will be very natural. They will be the most natural human interfaces that we've ever had. So yeah, I think a lot of it will be disappearing into the things that we use every day. Like, your laptop will be completely AI based or AI driven, and the way you interact with your phone will be like that too. And some of it will create full new modalities. Like, you know,
one really cool idea I have is, you know, I think in five years, maybe more or less, you can be like, okay, I'm trying to install this shelf and I don't fully get it. And you would just pull out your phone and be like, yeah, this is the shelf, these are the instructions, and then you'll have this video avatar that pops up and talks to you, has a virtual version of the shelf, and says, okay, you see this part of the shelf, drill this part. And then you'll look at your thing and be like, oh, okay, I see. And this will be generated on the fly. You can't get more intuitive than that, a literal person in your phone explaining something with what you're seeing right outside of the phone. That sort of thing will, I think, very likely exist. So yeah, it's going to be a crazy future.
Wow, that's pretty wild. Actually, I've been putting desks together for the kids.
And, you know, you get those things and you have this little Allen wrench.
And the sequence is, you know, important. If you get one thing wrong,
you start over practically.
Yeah. So let me know.
Actually, yeah, you'll be the first to know. I'll let you know, yeah.
So full circle, now I'm curious about,
you spent the time with computer vision,
now with LLMs,
and we talked about different applications for LLMs.
I mean, chat's the one everybody knows.
Are there some cool things going on with computer vision type technology and LLMs?
I mean, I've seen some things, but what are some things that you see in the future for that?
Yeah, so Claude is multimodal, so you can take, you know, a picture of something, whether that's some document you're looking at or something in the physical world, and ask questions about it. It's particularly good at explaining what it sees and going through it in a decent amount of detail. But the area that I'm
most excited about is actually, you know, kind of away from what I was working on before, which was the natural world, like computer vision on real-world images, and toward vision on digital content. So a PDF, right, or a screenshot of your computer, or a website. That as an input exists today, and I think it'll get better and better. And then the related capabilities, like, okay, the first demo of, I think, multimodal ChatGPT was: here's a sketch of a website, you take a picture and throw it in, and it tries to write the code for that. That will get better and better over time. And obviously there are multimodal output models like DALL-E, right, where you can ask it to generate an image. There's now video with Sora and a bunch of other companies doing that. Audio output too, with voice mode that's coming, and Google has their own, and there's a bunch of others like Moshi. So the three main modalities are text, audio, and vision, and they can be at the input or the output. And, you know, in the case of Claude, you have text and images as inputs, as well as text as output. But this list will continue to expand in the future. And GPT-4o is actually a three-modality-input and three-modality-output model. I do think that's the future. I think vision in particular is especially useful. Audio, just a personal product take, I think is very useful from a product perspective. I don't think audio is adding new capabilities into the model, but it is a much richer, more human way to interact with it. Whereas vision is truly a new capability. Like, you cannot describe, you know, that table and the hole and where to drill it as text. Well, you could, but it'd be way, way harder than, here's an image, do this. So I think vision actually does add new capabilities. And yeah, you're seeing a lot of that. My focus is on multimodal vision in the context of knowledge work: how do you make Claude really good at reading charts and graphs and being able to answer the common questions you might have about a report and stuff?
So that, I think, is super valuable.
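Asking Claude about a chart image looks roughly like the snippet below, using the image content blocks in the Anthropic Python SDK; the model name, file path, and question are placeholders for illustration.

```python
# Illustrative sketch: asking a multimodal model about a chart image.
# Model name, file path, and question are placeholders.
import base64
import anthropic

client = anthropic.Anthropic()

with open("revenue_chart.png", "rb") as f:
    image_b64 = base64.standard_b64encode(f.read()).decode("utf-8")

response = client.messages.create(
    model="claude-3-5-sonnet-20240620",  # placeholder model name
    max_tokens=500,
    messages=[{
        "role": "user",
        "content": [
            {"type": "image",
             "source": {"type": "base64", "media_type": "image/png", "data": image_b64}},
            {"type": "text",
             "text": "Which quarter had the highest revenue, and roughly how much was it?"},
        ],
    }],
)
print(response.content[0].text)
```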
One thing I'll also just add on the prior self-driving work
to what I'm working on today
is that people talk about AGI.
I kind of think that AGI, depending on how you define it, is already here.
These are general purpose models that can perform generally intelligent behavior.
And it's more of a question of what data you feed in.
And when I was working on perception and vision, like it was a very
narrow model.
Like it could do bounding boxes on cars and people and pedestrians and lights and stuff.
But we were slowly starting to make it general.
We were slowly starting to add other types of things that you want to detect.
Whereas like Claude and Transformers and Autoregressive Transformers in particular are general purpose thinkers.
They're general purpose, like next token predictors.
And so many things can be framed as a next token prediction problem.
And so that's one of the things that I see that's different about what I'm working on now versus before.
Whereas now I'm working on something very general, which is why audio just kind of works. You just, you know, discretize it, tokenize it, and throw it in, and then, with some tricks and a bunch of things, you have the same engine that's creating text output creating audio output. And I think that's super cool.
In general, it's the same way that your brain is a general purpose cognitive machine. There have been people who have had different parts of their brain ablated, and suddenly they can't do a specific skill or a specific type of kinematic motion. And then other parts of their brain reconfigure and allow them to do that over time through retraining, especially if they're young. Right? So the tissue in here is a general purpose system. And I think we've unlocked that. We have found a digital analog to a general purpose cognitive engine, and now it's just a matter of scaling it, is the way that I feel.
Wow. Well, Brooks is messaging us that
we're at the buzzer, although I could continue to ask you questions for hours or perhaps days. But, Soheil, this has been so fun. I cannot believe we just talked for an hour. I feel like we just hit record, you know, five minutes ago.
Really appreciate the time.
It's been so wonderful for us.
And I know it will be for our audience as well.
Yeah, thanks for coming on the show.
I'm really glad to hear that.
Yeah, appreciate you guys.
This was really fun.
And I hope people get some value out of it. The Data Stack Show is brought to you by RudderStack, the warehouse-native customer data platform.
RudderStack is purpose-built to help data teams turn customer data into competitive advantage.
Learn more at rudderstack.com.