No Priors: Artificial Intelligence | Technology | Startups - Open sourcing AI app development with Harrison Chase from LangChain
Episode Date: March 28, 2024

Companies are employing AI agents and co-pilots to help their teams increase efficiency and accuracy, but developing apps that are trained properly can require a skill set many enterprise teams don't have. This week on No Priors, Sarah and Elad are joined by Harrison Chase, the CEO and co-founder of LangChain, an open-source framework and developer toolkit that helps developers build LLM applications. In this conversation they talk about the gaps in open source app development, what it will take to keep up with private companies, the importance of creating prompts that can be compatible with many API models, and why memory is so undeveloped in this space.

Sign up for new podcasts every week. Email feedback to show@no-priors.com

Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @hwchase17

Show Notes:
(0:00) Introduction to LangChain
(1:45) Managing an open source environment
(4:30) Developing useful AI agents
(10:03) Sophistication and limitations of AI app development
(14:17) Switching between model APIs
(17:10) Context windows, fine-tuning and functionality
(21:37) Evolution of AI open source environment
(23:53) The next big breakthroughs
Transcript
Hi, listeners, and welcome to another episode of No Priors.
Today, we're talking to Harrison Chase, the CEO and co-founder of LangChain, a popular open-source
framework and developer toolkit that helps people build LLM applications.
We're excited to talk to Harrison about the state of AI application development, the open source
ecosystem, and its open questions. Welcome, Harrison.
Thanks for having me. I'm excited to be here.
LangChain's a really unique story, and it started actually as a personal project for you.
Can you talk a little bit about what LangChain is and what it was originally?
Yeah, absolutely.
So how I would answer the question, what LangChain is, has kind of evolved over time, as has the entire landscape.
LangChain, the open source package, started, yeah, as a side project.
So my background's in ML and MLOps.
I was at my previous company.
I knew I was going to leave.
I didn't know what I was going to do.
So this was in September, October of 2022.
And so went to a bunch of hackathons, a bunch of meetups, chatted with folks.
They were playing around with LLMs and saw some common abstractions, put it in a Python project as a just fun side project.
Turned out to strike a chord, with fantastic timing, you know, ChatGPT came out like a month later.
And it's kind of evolved from there.
So right now, LangChain, the company, there's really two main products that we have.
One is the LangChain open source packages, and happy to dive into that more.
And then the other is LangSmith, a platform for testing, evaluation, monitoring, and all of those
types of things.
And so, you know, what LangChain is has evolved over time as the company's grown.
One thing that we talked about the last time we saw each other in person was just how quickly
like the AI ecosystem and research field is evolving and what it means to manage an open source
project through that. Can you talk a little bit about what you decide to keep stable and change when you
both have a big ecosystem of users now and, like, a very rapidly changing environment of applications
and technology? That's been a fun exercise. So I mean, if we go back to the original version of
LangChain, what it was when it came out was essentially three kind of like high level
implementations. Two were based on research papers. And then one was based on Nat Friedman's like
NatBot type of agent web crawler thing. And so there were some high-level
kind of like abstractions.
And then there was a few like integrations.
So we had integrations with, I think, like, OpenAI,
Cohere, and Hugging Face to start, or something like that.
And those two layers have kind of like remained.
So we have, you know, 700 different integrations.
We have a bunch of kind of like higher level chains and agents for doing particular things.
I think the thing that we've put a lot of emphasis in, to your point around kind of like
what's remained constant and what's changed is like a lower level kind of like
abstraction and runtime for joining these things together.
One of the things that we pretty quickly saw was that as people wanted to improve the performance, go from prototype to production, they wanted to customize a lot of these bits.
And so we've invested a lot in a lower level kind of like chaining protocol, the LangChain Expression Language.
And then in a different protocol, LangGraph, which is something we're really excited about.
And that's more aimed at basically graphs that are not DAGs.
So, you know, all these agents are basically running an LLM in a loop.
You need cycles. And so LangGraph helps with that. And so I think what we've kind of seen
is the underlying bits of, there's all these different integrations. And like, you know,
there's LLMs, vector stores, and sometimes they change, right? When chat models came out,
like that was a, that's a very big change in the API interface. And so we had to add a new
abstraction for that. But those have, especially over the past few months, remained relatively
stable. We've invested a lot in this underlying runtime, which emphasizes a few things,
streaming, structured outputs, and the importance of those has remained relatively stable.
But then the way that you put things together and the kind of like patterns for building
things has definitely evolved over time from like simple chains to complex chains to then
these kind of like autonomous agents to now something maybe in the middle of like complex
state machines or graphs or something. And so it's really that
upper layer, which is like the common ways to put things together that I think we've seen the most
rapid kind of like churn.
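For readers unfamiliar with the pipe-style composition he's describing, here is a minimal sketch of the LangChain Expression Language, assuming the langchain_core and langchain_openai packages; the prompt and model name are illustrative placeholders, not anything from the conversation:

```python
from langchain_core.prompts import ChatPromptTemplate
from langchain_core.output_parsers import StrOutputParser
from langchain_openai import ChatOpenAI

# Compose prompt -> model -> parser with the LCEL pipe operator.
prompt = ChatPromptTemplate.from_template("Summarize in one sentence: {text}")
chain = prompt | ChatOpenAI(model="gpt-3.5-turbo") | StrOutputParser()

# One-shot invocation...
print(chain.invoke({"text": "LangChain started as a side project in late 2022."}))

# ...and streaming, which the runtime supports for any composed chain.
for chunk in chain.stream({"text": "LangChain started as a side project in late 2022."}):
    print(chunk, end="", flush=True)
```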
What do you think is still missing from really getting to performant agents?
There's a number of companies that have been started recently that are really focused on sort
of the agentic world and pushing that whole thread in certain types of automation forward.
What do you view as the big components that you all don't have, or that maybe the industry more
generally doesn't have, that still need to come into place to help drive those things ahead?
Yeah, that's a really good question.
I think there's a few things.
One, I think, like, figuring out the right UX for a lot of these things is still an open question in my mind.
And, you know, that's not necessarily something we can help with. I think there's a lot of
exploration that applications need to do to figure out how to, you know, communicate what these agents
are good at and bad at to end users, and expose ways to maybe let them course correct and see what's
going on. And so, you know, I think we try to emphasize a lot of this observability of intermediate steps
and even correcting intermediate steps,
but there's a lot of experimentation around UX
that I think needs to happen.
Another big part, I think, is basically the planning ability
of the underlying LLMs.
I think that's probably the biggest,
I think when we see people building agents that work right now,
it's often breaking it down into a bunch of smaller components
and kind of like imparting their domain knowledge
about how information should flow through these components.
because I think the LLMs by themselves still aren't able to reason fully about how that should happen.
And I think a lot of research is actually around this, I would say, in the academic space.
Specifically, I think there are two different types of research papers around agents that we see.
We see some around like planning for agents.
So there's a bunch of papers that do kind of like an explicit planning step up front.
And then there are other research papers that do a bunch around reflection.
So, like, after an agent does something, is this actually right? How can I kind of, like, you know, improve upon that? And I think both of those are basically trying to get around the shortcomings of LLMs, in that, in theory, they should do that automatically, right? Like, you shouldn't have to ask an LLM to plan or to think about whether what it's done is correct. It should know to do that, and then it can kind of like run in a cycle. But we see a lot of shortcomings there. And so I think the planning ability of LLMs is a big one. And that will get better over time.
The last one is maybe a little bit more vague, but I think even just as builders, we're still figuring out the right ways to make all these things work. What's the right information flow between all the different nodes in order to get those nodes, which are typically an LLM call, to work? Do you want to do few-shot prompting? Do you want to fine-tune models? Do you want to just work on improving the instructions and the prompt? And so I think there's a lot of, how do you test those nodes? That's a big thing as well. How do you get confidence in your LLM systems and LLM agents?
And so I think there's a lot of workflow around that to kind of like be discovered and
figured out.
One thing that has sort of come up repeatedly relative to agents has just been memory.
And so I wasn't sure how you think about memory and implementing that, and what that
should look like,
because it seems like there's a few different notions that people have been putting
forward.
And I think it's super interesting.
So I was just curious about your thinking on that.
I also think it's super interesting.
I have a few thoughts here.
So I think there's maybe two types of memory, and they're related, but I'll draw some distinction between kind of like
system-level procedural memory and then like personalization type memory. So system-level memory,
I mean more like, what's the right way to use a tool? What's the right way to accomplish this
objective, independent of who exactly the person is and how I'm different than Sarah or something
like that? And then for the personalization bit, I think it's like, okay, you know, Harrison likes
soccer and he likes basketball, and I should remember that when he asks questions.
And so I think there's maybe slightly different ways that we see teams thinking about both of these.
So on the procedural side, I think the main thing that we see people doing and that we think is
pretty effective is few shot prompting and maybe fine-tuning for how to use tools,
because that's basically what it comes down to.
What's the right way to use tools?
What's the right way to plan?
And we see few-shot examples being really, really impactful for that.
And so there's just a really interesting data flywheel
of, like, monitoring your application, gathering good examples, and then plugging those back
into your application in the form of few-shot examples, that we're pushing really heavily with
LangSmith right now.
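As a hedged sketch of that flywheel, here is roughly what plugging curated examples back in as few-shot demonstrations can look like with LangChain's FewShotChatMessagePromptTemplate; the example data here is hypothetical, standing in for traces curated from a tool like LangSmith:

```python
from langchain_core.prompts import (
    ChatPromptTemplate,
    FewShotChatMessagePromptTemplate,
)

# Hypothetical tool-use examples, curated from production traces.
examples = [
    {"input": "Refund order 1234", "output": "issue_refund(order_id='1234')"},
    {"input": "Where's my package?", "output": "track_shipment(order_id='9999')"},
]

example_prompt = ChatPromptTemplate.from_messages(
    [("human", "{input}"), ("ai", "{output}")]
)
few_shot = FewShotChatMessagePromptTemplate(
    examples=examples,
    example_prompt=example_prompt,
)
final_prompt = ChatPromptTemplate.from_messages([
    ("system", "You are an agent that calls support tools."),
    few_shot,               # demonstrations of correct tool use
    ("human", "{input}"),   # the live question
])
```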
And then the other side of it is this, like, personalization-level memory.
And I think there's a few different ways to do this. Like, I think OpenAI implemented it in
ChatGPT, where, the way I think it does it under the hood,
is it basically has functions that it can call to say, like, remember this fact or delete this
fact. And so that's a really interesting, like, active loop that the agent is engaging in, where it
explicitly decides what it wants to remember and what it doesn't want to remember. I also think
one thing that I'm bullish on is a more kind of, like, passive background process that kind of
looks at conversations and almost like extracts insights. And then you can use those insights in kind of
like future conversations. And I think there's pros and cons to each. And I think it speaks to how
memory in general, I feel, is a field that's just, like, super, super nascent. Like, I actually
am underwhelmed at the amount of like really interesting stuff that's going on there. And so I
think it's, you know, a bunch of different approaches, no kind of like overwhelmingly best
solution.
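Harrison is speculating about ChatGPT's internals here, so the following is only a guess at that pattern: exposing explicit remember/forget functions as tools the model can choose to call, sketched with the OpenAI Python client. The tool names and storage are hypothetical.

```python
from openai import OpenAI

client = OpenAI()

# Hypothetical memory tools the model can decide to call.
tools = [{
    "type": "function",
    "function": {
        "name": "remember_fact",
        "description": "Store a durable fact about the user.",
        "parameters": {
            "type": "object",
            "properties": {"fact": {"type": "string"}},
            "required": ["fact"],
        },
    },
}, {
    "type": "function",
    "function": {
        "name": "delete_fact",
        "description": "Remove a stored fact that is no longer true.",
        "parameters": {
            "type": "object",
            "properties": {"fact": {"type": "string"}},
            "required": ["fact"],
        },
    },
}]

resp = client.chat.completions.create(
    model="gpt-4-turbo",
    messages=[{"role": "user", "content": "I like soccer and basketball."}],
    tools=tools,
)

# If the model decided something is worth remembering, persist it yourself.
for call in resp.choices[0].message.tool_calls or []:
    print(call.function.name, call.function.arguments)
```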
Has the sophistication, shape, or type of applications that you see people building with LangChain, or just generally in the ecosystem, dramatically changed over the last few months?
I do think that there are more examples, kind of as Elad mentioned, of agentic applications
that are much more productive and more sophisticated, like multi-step RAG systems with much more
useful ranking.
Like, does that match with the patterns you're seeing?
Or, like, what are you seeing that excites you the most that you think is most useful?
That does generally match.
I think LangChain from the beginning has always been focused on those types of applications,
and not only the open source, but also LangSmith, the platform.
So I think a lot of the emphasis that we put into the testing and the observability is really
focused on these multi-step things.
We've always been focused on those.
Probably it's generally true in the market that there's been more of a trend towards
those.
But from our perspective, we've always been focused on those.
And so I think that hasn't been as dramatic.
I think there have been, like, interesting things within that that have emerged, just calling
out like a few things. Within RAG, I think we've seen really interesting and advanced query
analysis start to come into play. So, you know, you're not just passing the user question
directly to an embedding model. You're maybe doing some analysis on it to figure out which
retriever should I send it to or like, what is the bit that I should search? Is there kind of like
an explicit metadata filter? And so now retrieval is, like, a multi-step process, and more
of it is explicitly around query analysis.
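A hedged sketch of that kind of query analysis, assuming LangChain's with_structured_output helper (available on some chat models; older versions may want LangChain's pydantic_v1 shim for the schema). The index names and fields are made up for illustration:

```python
from typing import Literal, Optional

from pydantic import BaseModel, Field
from langchain_openai import ChatOpenAI

class SearchQuery(BaseModel):
    """Structured rewrite of the raw user question."""
    query: str = Field(description="Cleaned-up query to embed and search with")
    index: Literal["docs", "tickets", "code"] = Field(
        description="Which retriever to route the query to"
    )
    author: Optional[str] = Field(
        default=None, description="Optional metadata filter on author"
    )

llm = ChatOpenAI(model="gpt-3.5-turbo").with_structured_output(SearchQuery)
parsed = llm.invoke("Any open tickets from alice about login errors?")
# parsed.index should come back as "tickets" and parsed.author as "alice",
# which then drive the retriever choice and a metadata filter.
```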
Few-shot prompting and that whole data flywheel, I think we're starting to see come into play more on the agent side.
I kind of alluded to this earlier, but I think, you know, the way that we've kind of thought about
things is there's kind of like chains, which are sequential steps.
You're going to do this, and then you're going to do this, and then you're going to do this,
and you're always going to do those in the exact sequence.
And then, you know, last March or April or whenever AutoGPT came out, it was like,
we're literally just going to run this in a for loop, and it's going to be, you know, this autonomous agent.
And I think the thing that we see making it into production, and that informed a lot of the development of LangGraph, is something in the middle, where it's like this controlled state machine type thing.
And so we've seen a lot of that come out recently.
And so I'd maybe call out that as like one thing that we've really updated a lot of our beliefs on over the past few months.
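A minimal sketch of that controlled, state-machine-style middle ground using LangGraph; the node logic here is stubbed out and hypothetical, but the loop structure, a graph with a cycle rather than a DAG, is the point:

```python
from typing import TypedDict

from langgraph.graph import StateGraph, END

class AgentState(TypedDict):
    task: str
    draft: str
    approved: bool

# Nodes return partial state updates; real nodes would call an LLM.
def plan(state: AgentState) -> dict:
    return {"draft": f"plan: {state['task']}"}

def act(state: AgentState) -> dict:
    return {"draft": state["draft"] + " -> did a step"}

def review(state: AgentState) -> dict:
    return {"approved": len(state["draft"]) > 40}

graph = StateGraph(AgentState)
graph.add_node("plan", plan)
graph.add_node("act", act)
graph.add_node("review", review)
graph.set_entry_point("plan")
graph.add_edge("plan", "act")
graph.add_edge("act", "review")
# The cycle: keep looping back into "act" until the review node approves.
graph.add_conditional_edges("review", lambda s: END if s["approved"] else "act")

app = graph.compile()
print(app.invoke({"task": "summarize a repo", "draft": "", "approved": False}))
```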
Yeah, I think a combination of that, and tree search, and just, like, trying to be efficient with your sampling at every step, has shown a lot of
really interesting, effective applications recently.
And I think, with, like, Cognition as one example of a surprisingly amazing agent that has come
out, like, where else do you think agentic applications will begin to work, or where have you
already seen them?
I think on the customer support side, that's a pretty obvious use case.
I think Sierra, you know, has emerged there and is doing quite well there.
I think, yeah, the Cognition demo was very impressive.
I think they did a lot of things right.
I think they really nailed a really interesting UX,
and that was maybe one of the things that I was most excited about.
And then obviously it seems to work very well,
and so I don't know exactly what they're doing under the hood.
But those types, like coding problems in general,
we see a lot of people working on.
I think there's a really nice feedback loop that you can get
by just executing the code and seeing if it works,
and as well as the fact that people building it are developers,
and so they can test it.
Coding, customer support.
There's some interesting stuff around like recommendation chatbots almost.
So I draw a distinction between that and customer support.
With customer support, you're maybe trying to explicitly kind of like resolve a ticket or something like that.
And the recommendation bit is a bit more focused on like a user's preferences and what they like.
And I think we've seen a few, I think we've seen a few things emerge there.
But I'd say customer support and coding are the two.
Klarna as well, you know, they came out and had a pretty good release.
One pattern that I think is very popular, and I can't tell if it is real or transient, is whether
or not companies will be able to switch between different LLMs, right?
whether it's a, you know, self-hosted, like, dedicated inference, you know, instance for
them or if it's an actual API provider.
But for any given application, take your prompts and go from, you know, Anthropic to
Mistral to OpenAI to something else.
In reality, it feels like, you know, the way an application responds is probably going
to be sensitive to the fact that these LLMs are actually going
to predict differently. Like, what do you think about this? Can you switch? Is that a real pattern?
It's not as easy as it seems like it should be. And I think the main thing is that the prompts still
need to be different for each model. I do think the prompts will probably start to converge in the
sense that if you think the models are getting more and more intelligent, then like hopefully
these small idiosyncrasies don't matter as much. And as more and more model providers start
supporting the same things, then that will make it easier. And what I mean by that is, you know,
so many prompts for OpenAI, which is, you know, probably the most used one, use function calling.
And, you know, up until some period ago, like, no other models did. And so you just, like,
couldn't use those prompts at all. But now, like, Mistral has function calling and Google has
function calling. And so I think they're a little bit more transferable there.
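A sketch of what that transferability can look like in LangChain, assuming the bind_tools interface on provider-specific chat models (and that the langchain-openai and langchain-mistralai integration packages are installed); the tool itself is a toy:

```python
from langchain_core.tools import tool
from langchain_openai import ChatOpenAI
from langchain_mistralai import ChatMistralAI

@tool
def get_weather(city: str) -> str:
    """Look up the current weather for a city."""
    return f"Sunny in {city}"  # stub; a real tool would call a weather API

# The same tool definition bound to two providers' function-calling APIs.
# Prompts may still need per-model tweaks, as discussed above.
openai_model = ChatOpenAI(model="gpt-4-turbo").bind_tools([get_weather])
mistral_model = ChatMistralAI(model="mistral-large-latest").bind_tools([get_weather])

print(openai_model.invoke("What's the weather in Paris?"))
```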
What else is on that list? There's function calling. There's visual input. Like, what else is going to
differentiate these model APIs? Context windows is one as well. So I think this gets to, like,
yeah, what's the right context that you can be passing? If it's longer, you know, if that changes,
then that changes the whole architecture of your application.
Modalities is one. Prompt injection, for safety. Yeah, I think that's interesting. I think that's a real
enterprise concern. I think a lot of the agent builders are still just figuring out how to make
agents work. This is a different axis, almost. But to the point around, like, switching models,
I do think we see a desire for this, especially when you start going to scale. So I think it's like
make something work with GPT-4, but then, okay, you're rolling it out. Are you really going to eat
that much cost with GPT-4? Can you use GPT-3.5? Do you want to fine-tune?
And so I think that transition is where we really start to see people thinking
about switching models. There's definitely some switching models at the beginning, like if you just
want to play around with different models and see their capabilities. But I think the most pressing
need to switch models happens when you go from prototype to scale. Cost and latency would be
differentiators there as well. One thing you mentioned I thought was really interesting is just
context windows. And obviously, Gemini launched with a million token context window. And I was just
curious how you think about context window versus RAG versus other aspects of the model and how all
those things tie together. And, you know, once we get to very long context windows and the tens
of millions of tokens, like, does that really shift things radically or how does that change
functionality? And so I was just curious, since you've thought about how all these things piece
together, I was just curious how you think about those different factors and what they mean.
Very good question that a lot of people are thinking about who are a lot smarter than me.
I think, I mean a few thoughts. I think, like, longer context windows definitely make, like,
single-shot things much more realistic.
Like extraction of elements in a long PDF.
You can do that one-shot.
RAG over a single long PDF, or, like, five long PDFs.
Okay, cool.
You can do that.
You can do that one-shot.
I think there are definitely things at scale that don't fit, you know, into a single context
window.
There are also things where it requires iterations.
You need to like decide what to do, interact with the
environment, get that back. So this whole idea of chaining and agents, I don't, like, that's less
around context windows and more around interacting with the environment and getting feedback. And so I don't
think that's going anywhere. I think with respect to RAG in particular, because I think that's
where it often comes up, like, you know, did this kill RAG? I think there's a few things. Actually,
just today, one of our team members, Lance Martin, released something on this. Everyone's doing the
needle-in-the-haystack thing, and now all these models are, like, green across the board for whatever reason.
They've all figured it out.
But I think, like, that actually really doesn't reflect a lot of RAG use cases, in my opinion,
because, like, the needle in the haystack is, okay, given this long context,
can I find a single information point?
But oftentimes, RAG is about seeing multiple information points and then reasoning over them.
And so I think, well, the benchmark he released is exactly that.
Like, as you increase the number of needles, you know, performance goes down, as you might expect.
And then also when you ask it to reason rather than just retrieve,
performance drops as well. And so I think there's more work to be done there. And then I think
another thing is just around the ingestion for RAG in the indexing process. Like a lot of attention
has been paid to like text splitting and chunking and all of that. And I don't know exactly
how that will change. Like will you still do that but you now just retrieve the whole document?
Like, we have a concept in LangChain of, like, a parent document retriever, which basically
creates multiple vectors for each document. So maybe you just do that.
Maybe you still, maybe you chunk it up into larger chunks and just retrieve those larger chunks.
Maybe you use a traditional search engine, like Elasticsearch or something.
I'm not, I'm not sure.
That's probably the place I have the least confidence in.
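For reference, a hedged sketch of the parent document retriever concept mentioned above, using LangChain's ParentDocumentRetriever: small child chunks are embedded and searched, but whole parent documents are returned. The in-memory store and Chroma vector store are just illustrative backend choices, and the document content is made up.

```python
from langchain.retrievers import ParentDocumentRetriever
from langchain.storage import InMemoryStore
from langchain_community.vectorstores import Chroma
from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import RecursiveCharacterTextSplitter

docs = [Document(page_content="A long report about retrieval strategies...",
                 metadata={"source": "report.pdf"})]

retriever = ParentDocumentRetriever(
    # Small child chunks get embedded and searched here...
    vectorstore=Chroma(collection_name="docs", embedding_function=OpenAIEmbeddings()),
    # ...but the full parent documents live here and are what gets returned.
    docstore=InMemoryStore(),
    child_splitter=RecursiveCharacterTextSplitter(chunk_size=400),
)
retriever.add_documents(docs)
hits = retriever.get_relevant_documents("what does the report say about chunking?")
```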
The one other area that I see a lot of people talking about, and I see fewer people actually doing, is fine-tuning.
And to some extent, I think that's because with fine tunes, you lose generalizability.
And so people just start focusing on prompt engineering or other ways to effectively get the same performance without the actual fine tune.
But it's something that people talk about a lot and people talk about doing a lot.
You probably have a great perspective since you see so many different types of customers.
Are you seeing a lot of fine-tuning happening in the wild?
And if so, are there specific common applications or use cases for it?
We see people experimenting with it.
I think the only real place where they're doing it is when they've reached really critical scale,
which I still don't think is that many applications to date.
I think there's a lot of difficulties with it.
One's like gathering the data set for it.
And so I think a lot of the things we have in LangSmith tackle a lot of these issues,
but like gathering the data set for it.
So like having that data visibility and starting to curate that data set,
evaluating the fine-tuned model.
So like evaluation and testing is a huge pain point there that we're trying to tackle in a few ways.
The third is just like, yeah, back to this point of people are still just like experimenting so rapidly.
It's much harder to change a fine-tuned model than it is to change a prompt,
or even change few-shot examples. And so I think we're seeing more and more people use few-shot examples,
but not a ton graduating to fine-tuning, just because, yeah, I think it's much harder to just
iterate quickly on. In terms of other major changes in the landscape, it's been a big year. The first
commit to LangChain, I think, was in October of '22, which is, like, when I launched
Conviction as a fund as well. At that time, we didn't have Llama 2. We didn't have Mistral at
all. There were not nearly as many open source models with what people would consider to be a more
useful reasoning ability. Has that changed in terms of, like, what you see application developers do
with LangChain? Gemini too. Oh, and Gemini, yeah. Fun story about that. The original models that
we launched with, OpenAI actually deprecated, like, a month ago. So the, like, actual original LangChain,
you can't run, because the models don't exist anymore. But
Yeah. Like, I think we see increasing interest in open source, but the reasoning
abilities are still just, like, lagging behind Claude 3 or GPT-4. And I think it
kind of probably depends on the types of applications that you're
building, but a lot of the applications that LangChain is focused on have this kind of, like,
reasoning aspect, and those are just so crucial. And I still don't think we see
super compelling reasoning abilities in the open source
models. And maybe that's one of my hot takes, but I think for a lot of the LangChain apps, the
open source models maybe don't live up to a lot of, kind of, like, the Twitter hype or Twitter
excitement, at least not yet. Zooming out, you have a really broad view. What do you feel
like no one is working on, that should be, that's going to enable better applications?
I think the most exciting stuff is at the application and UX layer right now.
I think that's where the most exciting stuff is there.
One of the, I don't know if this is maybe more the capabilities side-ish,
but like memory I think is super interesting, especially like personalized long-term memory.
I don't know if, I don't know if it's necessarily tooling so much that needs to be built there
as it's just like an application in a UX that's really focused on that.
And, you know, if I wasn't doing LangChain, if I was starting a company right now, I'd probably start something at the application layer, and it would probably be something that really takes advantage of, like, long-term memory.
I guess at the high level similarly, is there anything that you view as, like, a major prediction or things that'll change over the next year that nobody's really paying as much attention to?
Memory is a big interest of ours, and so I hope that we'll have some kind of like breakthroughs there.
I think a lot of the, specifically around, yeah, learning from interactions, incorporating that back in at a user level.
In a similar vein, also this type of more like system level memory I think is really interesting and building up, building towards this idea of almost like continual learning.
So there's, you know, like can you learn from your interactions and you can do that in a variety of different ways.
This may just be where we sit in the ecosystem, but one exciting and probably under-talked-about way is just the idea of
building up few-shot example data sets and really using those.
I think it's much faster and cheaper than fine-tuning models.
It's easier to do than trying to like programmatically change the prompt in some way.
Like, that's still kind of like a bit of an art.
And so yeah, towards continual learning with few-shot examples is maybe one like really interesting area that we're excited about.
Can you help our listeners just imagine, a little bit more viscerally, what
type of application experience that would enable, like, you know, a consumer application or an
application, and what that type of continuous learning would allow you to do? Yeah, absolutely. I think at a
high level, it would basically allow the application to automatically get better over time. And it could
get better in the sense that it's just more accurate. So, you know, maybe
it first makes a mistake. You then, like, tell it that it made a mistake, and it automatically kind
of, like, incorporates that as a few-shot example or an update to a prompt. But it starts learning from
its mistakes and its successes as well, right?
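As a rough, hypothetical sketch of that loop, with no particular library assumed: a thumbs-down plus a correction gets captured as a few-shot example that the next prompt picks up.

```python
from typing import Optional

# In-memory stand-in for a curated few-shot example store.
few_shot_store: list = []

def record_feedback(question: str, answer: str, thumbs_up: bool,
                    correction: Optional[str] = None) -> None:
    """On a thumbs-down with a correction, save the corrected pair."""
    if not thumbs_up and correction:
        few_shot_store.append({"input": question, "output": correction})

def build_prompt(question: str, k: int = 3) -> str:
    """Prepend the most recent corrected examples to the next prompt."""
    shots = few_shot_store[-k:]
    lines = [f"Q: {s['input']}\nA: {s['output']}" for s in shots]
    return "\n\n".join(lines + [f"Q: {question}\nA:"])

# The app answers, the user corrects, and the next prompt improves.
record_feedback("Capital of Australia?", "Sydney", thumbs_up=False,
                correction="Canberra")
print(build_prompt("Capital of Australia?"))
```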
There's a really cool project called DSPY or DSPI.
I don't know how to pronounce it, but it's out of Stanford.
I say Disby.
Oh, no, there's three ways now.
I say Diaspa.
No, I'm just kidding.
So, and I think that actually tackles, like, I actually see a lot of similarities
between that and LangChain, LangSmith in some ways.
And I think it's all towards this idea of, like, so DSPY,
Dispee, or whatever. It's basically this idea of, like, optimization. You have kind of like
inputs, outputs. You then have your application, which they similarly think of as, like, multiple
steps. And you basically optimize your application through a variety of different ways.
The main one of which I would say is probably few shot examples, although we'll probably do a
webinar with Omar and he can correct me if I'm wrong. And I think the idea of like continual
learning is basically doing that optimization, but in an online manner, where you don't have
like ground truth necessarily, but you get feedback from the environment, thumbs up, thumbs down
if things are good. And so I think, yeah, that kind of like optimization loop, whether offline or
online is really, really exciting. And I think a similar thing could maybe, I think you can think
of like personalization also as like what this would look like to end users and maybe like consumer
facing apps. So you start with like a generic application that does the same thing for everyone.
but then it maybe learns to search the web differently for me and Elad, or something like that.
And so I think that's, like, concretely how it could manifest.
Cool. Thanks so much for doing this. It's obviously a pleasure to have you on.
No, thank you guys. Good to see you.
Find us on Twitter at NoPriorsPod. Subscribe to our YouTube channel if you want to see our faces.
Follow the show on Apple Podcasts, Spotify, or wherever you listen. That way you get a new episode every week.
And sign up for emails or find transcripts for every episode at no-priors.com.