Latent Space: The AI Engineer Podcast - AI-powered Search for the Enterprise — with Deedy Das of Glean
Episode Date: April 22, 2023

The most recent YCombinator W23 batch graduated 59 companies building with Generative AI for everything from sales, support, engineering, data, and more.

Many of these B2B startups will be seeking to establish an AI foothold in the enterprise. As they look to recent success, they will find Glean, started in 2019 by a group of ex-Googlers to finally solve AI-enabled enterprise search. In 2022 Sequoia led their Series C at a $1b valuation and Glean have just refreshed their website touting new logos across Databricks, Canva, Confluent, Duolingo, Samsara, and more in the Fortune 50 and announcing Enterprise-ready AI features including AI answers, Expert detection, and In-context recommendations.

We talked to Deedy Das, Founding Engineer at Glean and a former Tech Lead on Google Search, on why he thinks many of these startups are solutions looking for problems, and how Glean’s holistic approach to enterprise problem solving has brought so much success. Deedy is also just a fascinating commentator on AI current events, being both extremely qualified and great at distilling insights, so we also went over his many viral tweets diving into Google’s competitive threats, AI Startup investing, and his exposure of Indian University Exam Fraud!

Show Notes

* Deedy on LinkedIn and Twitter and Personal Site
* Glean
* Glean and Google Moma
* Golinks.io
* Deedy on Google vs ChatGPT
* Deedy on Google Ad Revenue
* Deedy on How much does it cost to train a state-of-the-art foundational LLM?
* Deedy on Google LaMDA cost
* Deedy’s Indian Exam Fraud Story
* Lightning Round
* Favorite Products: (covered in segment)
* Favorite AI People: AI Pub
* Predictions: Models will get faster for the same quality
* Request for Products: Hybrid Email Autoresponder
* Parting Takeaway: Read the research!

Timestamps

* [00:00:21] Introducing Deedy
* [00:02:27] Introducing Glean
* [00:05:41] From Syntactic to Semantic Search
* [00:09:39] Why Employee Portals
* [00:12:01] The Requirements of Good Enterprise Search
* [00:15:26] Glean Chat?
* [00:15:53] Google vs ChatGPT
* [00:19:47] Search Issues: Freshness
* [00:20:49] Search Issues: Ad Revenue
* [00:23:17] Search Issues: Latency
* [00:24:42] Search Issues: Accuracy
* [00:26:24] Search Issues: Tool Use
* [00:28:52] Other AI Search takes: Perplexity and Neeva
* [00:30:05] Why Document QA will Struggle
* [00:33:18] Investing in AI Startups
* [00:35:21] Actually Interesting Ideas in AI
* [00:38:13] Harry Potter IRL
* [00:39:23] AI Infra Cost Math
* [00:43:04] Open Source LLMs
* [00:46:45] Other Modalities
* [00:48:09] Exam Fraud and Generated Text Detection
* [00:58:01] Lightning Round

Transcript

[00:00:00] Hey everyone. Welcome to the Latent Space Podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners. I'm joined by my cohost swyx, writer and editor of

[00:00:19] Latent Space. Yeah. Awesome.

[00:00:21] Introducing Deedy

[00:00:21] And today we have a special guest. It's Deedy Das from Glean. Uh, do you go by Deedy or Debarghya? I go by Deedy. Okay.

[00:00:30] Uh, it's, it's a little bit easier for the rest of us to, uh, to, to spell out. And so what we typically do is I'll introduce you based on your LinkedIn profile, and then you can fill in what's not on your LinkedIn. So, uh, you graduated your bachelor's and masters in CS from Cornell. Then you worked at Facebook and then Google on search, specifically search, uh, and also leading a sports team focusing on cricket.

[00:00:50] That's something that we, we can dive into. Um, and then you moved over to Glean, which is now a search unicorn building intelligent search for the workplace. What's not on your LinkedIn that people should know about you? Firstly,

[00:01:01] guys, it's a pleasure. Pleasure to be here. Thank you so much for having me.

[00:01:04] What's not on my LinkedIn is probably everything that's non-professional.
I think the biggest ones are I'm a huge movie buff and I love reading, so I think I get through, usually I like to get through 10 books ish a year, but I hate people who count books, so I shouldn't say the number. And increasingly, I don't like reading non-fiction books.

[00:01:26] I actually do prefer reading fiction books purely for pleasure and entertainment. I think that's the biggest omission from my LinkedIn.

[00:01:34] What, what's, what's something that, uh, caught your eye for fiction stuff that you would recommend people?

[00:01:38] Oh, I recently, I started reading The Three-Body Problem and I finished it, and it's a three-part series.

[00:01:45] And, uh, well, my controversial take is I did not really enjoy the second part, and so I just stopped. But the first book was phenomenal. Great concept. I didn't know you could write alien fiction with physics so well, and Chinese literature in particular has a very different cadence to it than Western literature.

[00:02:03] It's very less about the, um, let's describe people and what they're all about and their likes and dislikes. And it's like, here's a person, he's a professor of physics. That's all you need to know about him. Let's continue with the story. Um, and I enjoy it. It's a very different style from what I'm used to.

[00:02:21] Yeah, I, I heard it's, uh, very highly recommended. I think it's being adapted to a TV show, so looking forward

[00:02:26] to that.

[00:02:27] Introducing Glean

[00:02:27] Uh, so you've spent now almost four years at Glean. The company's now a unicorn, but you were on the founding team, and LLMs and chat interfaces are all the rage now. But you were building this before

[00:02:38] it was cool, so to speak. Maybe tell us more about the story, how it came to be, and some of the technological advances you've seen. Because I think you started, the company started really close to some of the early GPT models. Uh, so you've seen a lot of it from, from day one.

[00:02:53] Yeah.
Well, the first thing I'll say is Glean was never started to be a

[00:02:58] technical product looking for a solution. We always wanted to solve a very critical problem first that we saw, not only in the companies that we'd worked in before, but in all of the companies that a lot of our, uh, a lot of the founding team had been in past their time at Google. So Google has a really neat tool that already kind of does this internally.

[00:03:18] It's called MoMA, and MoMA sort of indexes everything that you'd use inside Google because they have first-party API access to who has permissions to what document and what documents exist, and they rank them with their internal search tool. It's one of those things where when you're at Google, you sort of take it for granted, but when you leave and go anywhere else, you're like, oh my God, how do I function without being able to find things that I've worked on?

[00:03:42] Like, oh, I remember this guy had a presentation that he made three meetings ago and I don't remember anything about it. I don't know where he shared it. I don't know if he shared it, but I do know it was something about X, and I kind of wanna find that now. So that's the core information retrieval problem that we had set out to tackle, and we realized when we started looking at this problem that enterprise search is actually, it's not new.

[00:04:08] People have been trying to tackle enterprise search for decades. Again, pre two thousands people have been trying to build these on-prem enterprise search systems. But one thing that has really allowed us to build it well: A, you now have, well, you have distributed Elastic, so that really helps you do a lot of the heavy lifting on core infra.

[00:04:28] But B, you also now have API support that's really nuanced on all of the SaaS apps that you use. So back in the day, it was really difficult to integrate with a messaging app. They didn't have an API.
It didn't have any way to sort of get the permissions information and get the messaging information. But now a lot of SaaS apps have really robust APIs that really let you

[00:04:50] index everything that you'd want to. That's two. And the third sort of big macro reason why it's happening now and why we're able to do it well is the fact that the SaaS apps have just exploded. Like every company uses, you know, 10 to a hundred apps. And so just the urgent need for information, especially with, you know, remote work and work from home, it's just so critical that people expect this almost as a default that you should have in your company.

[00:05:17] And a lot of our customers just say, Hey, I don't, I can't go back to a life without internal search. And I think we think that's just how it should be. So that's kind of the story about how Glean was founded and a lot of the LLM stuff. It's neat that all, a lot of that's happening at the same time that we are trying to solve this problem because it's definitely applicable to the problem we're trying to solve.

[00:05:37] And I'm really excited by some of the stuff that we are able to do with it.

[00:05:41] From Syntactic to Semantic Search

[00:05:41] I was talking with somebody last weekend, they were saying the last couple years we're going from the web used to be syntax driven. You know, you use SQL for information retrieval, going into a semantics-driven one where the syntax is not as important.

[00:05:55] It's like the, how you actually explain the question. And uh, we just asked Sarah from Seek.ai on the previous episode and instead of doing natural language and things like that for enterprise knowledge, it's more for business use cases.
So I'm curious to see, you know, the enterprise of the future, what that looks like, you know, is there gonna be way less dropdowns and kind of like, uh, SQL queries and stuff like that.

[00:06:19] And it's more this virtual, almost like person that embodies the company that is like a, an LLM in a way. But how do you do that without being able to surface all the knowledge that people have in the organization? So something like Glean is, uh, super useful for

[00:06:35] that. Yeah, I mean, already today we see these natural language queries as well.

[00:06:39] I, I will say at, at this point, it's still a small fraction of the queries. You see a lot of, a lot of the queries are, hey, what is, you know, just a name of a project or an acronym or a name of a person or someone you're looking for. Yeah, I

[00:06:51] think actually the Glean website explains Glean's features very well.

[00:06:54] When I, can I follow the video? Actually, the video wasn't that, that informative. The video was more like a marketing video, but the, the actual website was showing screenshots, and what you see there, in my language, is an employee portal that happens to have search, because you also surface like collections, which proactively show me things without me searching anything.

[00:07:12] Right. Like, uh, you even have Go links, which you copied, I think, from Google, right? Which like, it's basically, uh, you know, in my mind it's like this is ex-Googlers missing Google internal stuff. So they just built it for everyone else. So,

[00:07:25] well, I can, I can comment on that. So A, I should just plug that we have a new website as of today.

[00:07:30] I don't know how, how it's received. So I saw it yesterday, so let, let me know. I think today we just launch, I don't know when we launched a new one, I think today or yesterday. Yeah,

[00:07:38] it's

[00:07:38] new. I opened it right now, it's different than yesterday.

[00:07:41] Okay. It's, it's today and yeah.
So one thing that we find is that search in itself,

[00:07:48] this is actually, I think, quite a big insight. Search in itself is not a compelling enough use case to keep people drawn to your product. It's easy to say Google search is like that, but Google Search was also in an era where that was the only website people knew, and now it's not like that. When you are a new tool that's coming into a company, you can't sit on your high horse and say, yeah, of course you're gonna use my tool to search.

[00:08:13] No, they're not gonna remember who you are. They're gonna use it once and completely forget. To really get that retention, you need to sort of go from being just a search engine to exactly what you said, Shawn, to being sort of an employee portal that does much more than that. And yeah, the Go Links thing, I, I mean, yes, it is copied from Google.

[00:08:33] I will say there's a complete other startup called Golinks.io that has also copied it from Google and, and everyone, everyone misses Go Links. It's very useful to be able to write a document and just be like, go to go slash this, and that's where the document is. And, and so we have built a big feature set around it.

[00:08:50] I think one of the critical ones that I will call out is the feed. Just being able to see, not just, so documents that are trending in your sub-organization, documents that you, we think you should see are a limited set of them, as well as now we've launched something called Mentions, which is super useful, which is all of your tags across all of your apps in one place in the last whatever, you know, time.

[00:09:14] So it's like all of the hundred Slack pings that you have, plus the Jira pings, plus the, the, the email, all of that in one place is super useful to have. So you do GitHub? Yeah, we do GitHub too, we do GitHub too. All the mentions.

[00:09:28] Oh my God, that's amazing.
I didn't know you had it, but, uh, um, this is something I wish for myself.

[00:09:33] It's amazing.

[00:09:34] It's still a little buggy right now, but I think it's pretty good. And, and we're gonna make it a lot better as we go.

[00:09:39] Why Employee Portals

[00:09:39] This

[00:09:39] is not in our preset list of questions, but I have one follow up, which is, you know, I've worked in quite a few startups now that don't have employee portals, and I've worked at Amazon, which had an employee portal, but it wasn't as beautiful or as smart as Glean.

[00:09:53] Why isn't this a bigger norm in all

[00:09:56] companies? Well, there's several reasons. I would say one reason is just the dynamics of how enterprise sales happens is, I wouldn't say broken. It is, it is what it is, but it doesn't always cater to employees being happy with the best tools. What it does cater to is there's different incentive structures, right?

[00:10:16] So if I'm an IT buyer, I have a budget and I need to understand that for a hundred of these tools that are pitched to me all the time, which ones really help the company. And the way usually those things are evaluated is does it increase revenue and does it cut cost? Those are the two biggest ones. And for a software like Glean or a search portal or employee portal, it's actually quite difficult when you're in, generally bucketed in the space of productivity to say, Hey, here's a compelling use case for why we will cut your cost or increase your revenue.

[00:10:52] It's just a softer argument that you have to make there. It's just a fundamental nature of the problem versus if you say, Hey, we're a customer support tool. Everyone in SaaS knows that customer support tools is just sort of the, the last thing that you go to when you're looking for ideas, because it's easy to sell.

[00:11:08] It's like, here's a metric. How many tickets can your customer support agent resolve? We've built a thing that makes it 20% better.
That means it's 1,000 thousand dollars cost savings. Pay us 50k. Call it a deal. That's a good argument. That's a very simple, easy to understand argument. It's very difficult to make that argument with search, where you're like, okay, you're gonna see about 10 to 20 searches, that's gonna save about this much time, uh, a day.

[00:11:33] And that results in this much employee productivity. People just don't buy it as easily. So the first reaction is, oh, we work fine without it. Why do we need this now? It's not like the company didn't work without this tool, and uh, and only when they have it do they realize what they were missing out on.

[00:11:50] So it's a difficult thing to sell in, in some ways. So even though the product is, in my opinion, fantastic, sometimes the buyer isn't easily convinced because it doesn't increase revenue or cut cost.

[00:12:01] The Requirements of Good Enterprise Search

[00:12:01] In terms of technology, can you maybe talk about some of the stack and you see a lot of companies coming up now saying, oh, we help you do enterprise search.

[00:12:10] And it's usually, you know, embedding to then do context for like a LLM query mostly. I'm guessing you started as like closer to like the vector side of thing maybe. Yeah. Talk a bit about that and some learnings you've had, and as founders try to, to build products like this internally, what should they think

[00:12:27] about?

[00:12:28] Yeah, so actually leading back from the last answer, one of the ways a lot of companies who are in the enterprise search space are trying to tackle the problem of sales is to lean into how advanced the technology is, which is useful. It's useful to say we are AI powered, LLM powered vector search, cutting edge, state-of-the-art, yada, yada, yada.

[00:12:47] Put in all your buzzwords. That's nice, but the question is how often does that translate to better user experience is sort of a fuzzy area where it, it's really hard for even users to tell, to be honest.
Like you can have one or two great queries and one really bad query and be like, I don't know if this thing is smart.

[00:13:06] And it takes time to evaluate and understand how a certain engine is doing. So to that, I think one of the things that we learned from Google, a lot of us come from an ex Google search background, and one of the key learnings is often with search, it's not about how advanced or how complex the technology is, it's about the rigor and intellectual honesty that you put into tuning the ranking algorithm.

[00:13:30] That's a painstaking, long-term and slow process. At Google until I would say maybe 2017, 2018, everything was run off of almost no real AI, so to speak. It was just information retrieval at its core, very basic from the seventies, eighties, and a bunch of these ranking components that are stacked on top of it that do various tasks really, really well.

[00:13:57] So one task in search is query understanding: what does the query mean? One task is synonymy: what are other synonyms for this thing that we can also match on? One task is document understanding: is this document itself a high quality document or not? Or is it some sort of SEO spam? And admittedly, Google doesn't do so well on that anymore, but there's so many tough sub problems that it breaks search down into and then just gets each of those problems right to create a nice experience.

[00:14:24] So to answer your question, also, vector search we do, but it is not the only way we get results. We do a hybrid approach both using, you know, core IR signals, synonymy, query augmentation with things like acronym expansion, as well as stuff like vector search, which is also useful. And then we apply our level of ranking understanding on top of that, which includes personalization, understanding,

[00:14:50] if you're an engineer, you're probably not looking for Salesforce documents.
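
The hybrid approach described above (keyword signals, acronym expansion, vector similarity, then a personalization layer on top) can be illustrated with a small sketch. This is not Glean's implementation: the acronym table, the weights, the `team_boost` parameter, and the use of bag-of-words counts as a stand-in for real embeddings are all assumptions made for illustration.

```python
import math
from collections import Counter

# Hypothetical acronym table used for query expansion (illustrative only).
ACRONYMS = {"okr": ["objectives", "key", "results"]}

def tokenize(text):
    return text.lower().split()

def expand_query(query):
    """Append the expansion of any known acronym found in the query."""
    tokens = tokenize(query)
    expanded = list(tokens)
    for tok in tokens:
        expanded.extend(ACRONYMS.get(tok, []))
    return expanded

def keyword_score(query_tokens, doc_tokens):
    """Fraction of distinct query terms that appear in the document."""
    terms = set(query_tokens)
    return len(terms & set(doc_tokens)) / len(terms) if terms else 0.0

def cosine(a, b):
    """Cosine similarity of two sparse count vectors (stand-in for embeddings)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def hybrid_search(query, docs, user_team, w_kw=0.5, w_vec=0.5, team_boost=0.2):
    """Blend keyword and vector scores, then boost documents from the user's team."""
    q = expand_query(query)
    q_vec = Counter(q)
    ranked = []
    for doc in docs:
        d = tokenize(doc["text"])
        score = w_kw * keyword_score(q, d) + w_vec * cosine(q_vec, Counter(d))
        if doc["team"] == user_team:  # personalization layer
            score += team_boost
        ranked.append((score, doc["id"]))
    return [doc_id for _, doc_id in sorted(ranked, key=lambda pair: -pair[0])]

docs = [
    {"id": "eng-okrs", "team": "engineering", "text": "engineering objectives and key results for q2"},
    {"id": "sales-okrs", "team": "sales", "text": "sales objectives and key results for q2"},
    {"id": "lunch-menu", "team": "ops", "text": "cafeteria lunch menu for this week"},
]
print(hybrid_search("q2 okr", docs, user_team="engineering"))
```

In this toy data the two OKR documents score identically on content, so the only thing separating them is who is asking, which is the point being made about personalization mattering more than any single retrieval technique.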
You know, you're probably looking for documents that are published or co-authored by people in your team, in your immediate team, and our understanding of all of your interactions with people around you. Our personalization layer, our good work on ranking is what makes us

[00:15:09] good. It's not sort of, Hey, drop in LLM and embeddings and we become amazing at search. That's not how we think it

[00:15:16] works. Yeah. I think there's a lot of polish that goes into quality products, and that's the difference that you see between Hacker News demos and, uh, Glean, which is, uh, actual, you know, search and chat unicorn.

[00:15:26] Glean Chat?

[00:15:26] But also is there a Glean chat coming? Is, what do you think about the

[00:15:30] chat form factor? I can't say anything about it, but I think that we are experi, my, my politically correct answer is we're experimenting with many technologies that use modern AI and LLMs, and we will launch what we think users like best.

[00:15:49] Nice. You got some media training

[00:15:51] again? Yeah. Very well handled.

[00:15:53] Google vs ChatGPT

[00:15:53] We can, uh, move off of Glean and just go into Google search. Uh, so you worked on search for four years. I've always wanted to ask what happens when I type something into Google? I feel like you know more than others and you obviously there's the things you cannot say, but I'm sure Google does a lot of the things that Glean does as well.

[00:16:08] How do you think about this Google versus ChatGPT debate? Let's, let's maybe start at a high level based on what you see out there, and I think you, you see a lot of

[00:16:15] misconceptions. Yeah. So, okay, let me, let me start with Google versus ChatGPT first. I think it's disingenuous, uh, if I don't say my own usage pattern, which is I almost don't go back to Google for a large section of my queries anymore.

[00:16:29] I just use ChatGPT. I am a paying Plus subscriber and it's sort of my go-to for a lot of things
that I ask, and I also have to train my mind to realize that, oh, there's a whole set of questions in your head that you never realize the internet could answer for you, and that now you're like, oh, wait, I could actually ask this, and then you ask it.

[00:16:48] So that's my current usage pattern. That being said, I don't think that ChatGPT is the best interface or technology for all sets of queries. I think humans are obviously very easily excited by new technology, but new technology does not always mean the previous technology was worse. The previous technology is actually really good for a lot of things, and for search in particular, if you think about all the queries that come into Google search, they fall into various kinds of query classes, depending on whatever taxonomy you want to use.

[00:17:24] But one sort of way of, of understanding broadly, generally, the query classes is something that is information seeking or exploratory. And for information-seeking or exploratory queries, I think there are uses where Google does really well. Like for example, let's say you want to just know a list of songs of this artist in this year.

[00:17:49] Google will probably be able to, a hundred percent, tell you that pretty accurately all the time. Or if you want to say understand like what showtimes of movies came out today. So fresh queries, another query class, Google will be really good at that; chat, not so good at that. But if you look at information seeking queries, you could even argue that if I ask for information about Donald Trump, maybe ChatGPT will spit out a reasonable sounding paragraph and it makes sense, but it doesn't give me enough stuff to like click on and go to and navigate to in a news article here.

[00:18:25] And I just kind of wanna see a lot of stuff happening. So if you really break down the problem, I think it's not as easy as saying ChatGPT is a silver bullet for every kind of information need.
There's a lot of information needs, especially for tail queries. So for long, never-before-seen queries like, Hey, tell me the cheat code in Doom 3,

[00:18:43] this level, this boss, ChatGPT's gonna blow it out the water on those kind of queries cuz it's gonna figure out all of these from these random sparse documents and random Reddit threads and assemble one consistent answer for you where it takes forever to find this kind of stuff on Google. For me personally, coding is the biggest use case. For anything technical,

[00:19:02] I just go to ChatGPT cuz parsing through Stack Overflow is just too mentally taxing and I don't care about, even if ChatGPT hallucinates a wrong answer, I can verify that. But I like seeing a coherent, nice answer that I can just kind of use as a good starting point for my research on whatever I'm trying to understand.

[00:19:20] Did you see the, the statistic that, uh, the All-In guys have been saying, which is, uh, Stack Overflow traffic is down 15%? Yeah, I did, I did.

[00:19:27] See that

[00:19:28] makes sense. But I, I, I don't know if it's like only because of ChatGPT, but yeah, sure. I believe

[00:19:33] it. No, the second part was just about if some of the enterprise product search moves out of Google, like cannot, that's obviously a big AdWords revenue driver.

[00:19:43] What are like some of the implications in terms of the, the business

[00:19:46] there?

[00:19:47] Search Issues: Freshness

[00:19:47] Okay,

[00:19:47] so I would split this answer into two parts. My first part is just talking about freshness, cuz the query that you mentioned is, is specifically the, the issue there is being able to access fresh information. Google just blanket calls this freshness.

[00:20:01] Today's understanding of large language models is that it cannot do anything that's highly fresh. You just can't train these things fast enough and cost efficiently enough to constantly index new, new sources of data and then serve it at the same time in any way that's feasible.
That might change in the future, but today it's not possible.

[00:20:20] The best thing that you can get that's close to it is what, you know, the fancy term is retrieval augmented generation, but it's a fancy way of saying just do the search in the background and then use the results to create the actual response. That's what Bing does today. So to answer the question about freshness, I would say it is possible to do with these methods, but those methods all in all involve using search in the backend to, to sort of get the context to generate the answer.

[00:20:49] Search Issues: Ad Revenue

[00:20:49] The second part of the answer is, okay, talk about ad revenue. A lot of Google's ad revenue just comes from the fact that over the last two decades, it's figured out how to put ad links on top of a search result page that sometimes users click. Now the user behavior on a chat product is not to click on anything.

[00:21:10] You don't click on stuff, you just read and you move on. And that actually, in my opinion, has severe impacts on the web ecosystem, on all of Google and all of technology and how we use the internet in the future. And, and the reason is one thing we also take for granted is that this ad revenue where everyone likes to say Google is bad, Google makes money off ads, yada, yada, yada, but this ad revenue kind of sponsored the entire internet.

[00:21:37] So you have Google Maps and Google Search and Photos and Drive and all of this great free stuff basically because of ads. Now, when you have this new interface, sure it, it comes with some benefits, but if users aren't gonna click on ads and you replace the search interface with just chat, that can actually be pretty dangerous in terms of what it even means

[00:21:59] to have to create a website, like why would I create a website if no one's gonna come to my site? If it's just gonna be used to train a model and then someone's gonna spit out whatever my website says, then there's no incentive.
And that kind of dwindles the web ecosystem. In the end, it means less ad revenue.

[00:22:15] And then the other existential question is, okay, I'm okay with saying the incumbent, Google, gets defeated and there's this new hero, which is, I don't know, OpenAI and Microsoft. Now reinvent the wheel. All of that stuff is great, but how are they gonna make money? They can make money off, I guess, subscriptions.

[00:22:31] But subscriptions is not nearly gonna make you enough to replace what you can make on ad revenue. Even for Bing today. Bing makes 11 billion off ad revenue. It's not a side product, like it's a huge product, and they're not gonna make 11 billion off subscriptions, I'll tell you that. So even they can't really replace search with chat.

[00:22:51] And then there are some arguments around, okay, what if you start to inject ads in textual form? But you know, in my view, if the natural user inclination is not to click on something on chat, they're clearly not gonna click on something, no matter how much you try to inject click targets into your result.

[00:23:10] So, that's, that's my long answer to the ads question. I don't really know. I just smell danger on the horizon.

[00:23:17] Search Issues: Latency

[00:23:17] You mentioned the retrieval augmented generation as well. Uh, I presumably that is literally Bing is probably just using the long context of GPT-4 and taking the full text of all the links that they find, dumping it in, and then generating some answer.

[00:23:34] Do you think like speed is a concern or are people just willing to wait for smarter?

[00:23:40] I think it's a concern. We noticed that every, every single product I've worked on, there's almost a linear, at least for some section of it, a very linear curve.
A linear line that says the more the latency, the less the engagement, so there's always gonna be some drop off.

[00:23:55] So it is a concern, but with things like latency, I just kind of presume that time solves these things. You optimize stuff, you make things a little better, and the latency will get down with time. And it's a good time to even mention that Bard just came out today, Google's LLM, or Google's equivalent. I haven't tried it, but I've been reading about it, and that's based off a model called LaMDA.

[00:24:18] And LaMDA intrinsically actually does that. So it does query what they call a tool set and they query search or a calculator or a compiler or a translator. Things that are good at factual, deterministic information. And then it keeps changing its response depending on the feedback from the tool set, effectively doing something very similar to what Bing does.

[00:24:42] Search Issues: Accuracy

[00:24:42] But I like their framing of the problem where it's just not just search, it's any given set of tools. Which is similar to what a Facebook paper called Toolformer does, where you can think of language as one aspect of the problem and language interfaces with computation, which is another aspect of the problem.

[00:24:58] And if you can separate those two, this one just talks to these things and figures out what to, how to phrase it. Yeah, so it's not really coming up with the answer. Their claim is like GPT-4, for example. The reason it's able to do factual accuracy without search is just by memorizing facts. And that doesn't scale.

[00:25:18] It's literally somewhere in the whole model. It knows that the CEO of Tesla is Elon Musk. It just knows that. But it doesn't know that this is a competition. It just knows that. Usually I see CEO, Tesla, Elon, that's all it knows.
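
The separation described above, where the language side decides which tool to ask and how to phrase the result while a deterministic tool produces the actual answer, can be sketched roughly as follows. This is a toy illustration and not how LaMDA or Toolformer are implemented: `choose_tool` is a hard-coded stub standing in for the model's routing decision, and the `FACTS` table is a hypothetical stand-in for a search backend.

```python
import ast
import operator

# Arithmetic operators the toy "calculator" tool supports.
_OPS = {ast.Add: operator.add, ast.Sub: operator.sub,
        ast.Mult: operator.mul, ast.Div: operator.truediv}

def calculator(expression):
    """Deterministically evaluate a basic arithmetic expression without eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return ev(ast.parse(expression, mode="eval").body)

# Hypothetical fact store standing in for a "search" tool.
FACTS = {"ceo of tesla": "Elon Musk"}

def search(query):
    return FACTS.get(query.lower().strip(), "no result")

TOOLS = {"calculator": calculator, "search": search}

def choose_tool(question):
    """Stub for the model's routing step; a real system would ask the model."""
    if question.lower().startswith("compute "):
        return "calculator", question[len("compute "):]
    return "search", question.rstrip("?").replace("Who is the ", "")

def answer(question):
    """Route to a tool, then phrase the tool's result as the response."""
    tool, tool_input = choose_tool(question)
    result = TOOLS[tool](tool_input)
    return f"{result} (via {tool})"

print(answer("compute 12 * (3 + 4)"))
print(answer("Who is the CEO of Tesla?"))
```

The facts and the arithmetic live entirely in the tools, so updating the fact store changes the answer without touching the routing logic, which is the scaling argument against memorizing facts inside the model.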
So the abstraction of language model to computational unit or tool set is an interesting one that I think is gonna be more explored by all of these engines.

[00:25:40] Um, and the latency, you know, it'll.

[00:25:42] I think you're focusing on the right things there. I actually saw another article this morning about the memorization capability. You know how GPT-4 is a lot of, uh, marketed on its ability to answer SAT questions and GRE questions and bar exams and, you know, we covered this in our benchmarks podcast, Alessio, but like I forgot to mention that all these answers are out there and were probably memorized.

[00:26:05] And if you change them just, just a little bit, the model performance will probably drop a lot.

[00:26:10] It's true. I think the most compelling, uh, proof of that, of what you just said is the, the Codeforces one where somebody I think tweeted, tweeted, tweeted about the, yeah, the 2021. Everything before 2021, it solves. Everything after,

[00:26:22] it doesn't, and I thought that was interesting.

[00:26:24] Search Issues: Tool Use

[00:26:24] It's just, it's just dumb. I'm interested in Toolformer, and I'm interested in ReAct type, uh, patterns. Zapier just launched a natural language integration with LangChain. Are you able to compare contrast, like what approaches you like when it comes to LLMs using

[00:26:36] tools?

[00:26:37] I think it's not boiled down to a science enough for me to say anything that's uh, useful. Like I think everyone is at a point of time where they're just playing with it. There's no way to reason about what LLMs can and can't do. And most people are just throwing things at a wall and seeing what sticks.

[00:26:57] And if anyone claims to be doing better, they're probably lying because no one knows how these things behave. You can't predict what the output is gonna be. You just think, okay, let's see if this works. This is my prompt. And then you measure and you're like, oh, that worked.
Versus this didn't. And things like ReAct and Toolformer are really cool.[00:27:16] But those are just examples of things that people have thrown at a wall that stuck. Well, I mean, it provably works. It works pretty, pretty well. I will say one thing. It's not really in the framing of what kinds of ways you can use LLMs to make them do cool things, but what people forget when they're looking at cutting-edge stuff is that a lot of these LLMs can be used to generate synthetic data to bootstrap smaller models, and it's a less sexy space of it all.[00:27:44] But I think that stuff is really, really cool. Where, for example, I want to tag entities in a sentence. That's a very simple classical natural language problem, NER. Before, I had to gather training data, train a model, tune the model, all of this other stuff. Now what I can do is I can throw GPT-4 at it to generate a ton of synthetic data, which looks actually really good.[00:28:11] And then I can either just train whatever model I wanted to train before on this data, or I can use something called, like, low-rank adaptation, which is distilling this large model into a much smaller, cost-effective, fast model that does that task really well. And in terms of productionable natural language systems, that is amazing, because this is stuff you couldn't do before.[00:28:35] You would have teams working for years to solve NER, and that's just what that team does. And there's a great viral Reddit thread about, are all the NLP teams at Big Tech doomed? And yeah, I mean, to an extent, now you can do this stuff in weeks, which is[00:28:51] huge.[00:28:52] Other AI Search takes: Perplexity and Neeva[00:28:52] What about some of the other kind of, like, uh, AI-native search, things like Perplexity, Elicit? Have you played with any of them?[00:29:00] Any thoughts on[00:29:01] it? Yeah. I have played with Perplexity and Neeva. Everyone. 
I think both of those products sort of try to do, again, search result synthesis. Personally, I think Perplexity might be doing something else now, but I don't see any of those companies or products disrupting either OpenAI or ChatGPT or Google or Bing or whatever prominent search engines with what they do, because they're all built off basically the Bing API or their own version of an index, and their search itself is not good enough, and there's not a compelling enough use case, I think, to use those products.[00:29:40] I don't know how they would make money. A lot of Neeva's way of making money is subscriptions. Perplexity I don't think has ever turned on the revenue dial. I just have more existential concerns about those products actually functioning in the long run. So, um, I think I see them as, they're nice, they're nice to play with.[00:29:56] It's cool to see the cutting-edge innovation, but I don't really understand if they will be long-lasting, widely used products.[00:30:05] Why Document QA will Struggle[00:30:05] Do you have any idea of what it might take to actually do, like, a new type of company in this space? Like, Google's big thing was PageRank, right? That was the one thing that kind of set them apart.[00:30:17] Like, people tried doing search before. Do you have an intuition for what the LLM-native PageRank thing is gonna be to make something like this exist? Or have we kinda, you know, hit the plateau when it comes to search innovation?[00:30:31] So I talk to so many of my friends who are obviously excited about this technology as well, and many of them are starting LLM companies.[00:30:38] You know, how many companies in the YC batch of, you know, Winter 23 are LLM companies? Crazy, half of them. Right? Right. It's ridiculous. But what I always think everyone's struggling with is this problem: what is your advantage? What is your moat? 
I don't see it for a lot of these companies, and, uh, it's unclear.[00:30:58] I don't have a strong intuition. My sense is that the people who focus on problem first usually get much further than the people who focus on solution first. And there's way too many companies that are solution first. Which makes sense. It's always been a big Achilles' heel of Silicon Valley.[00:31:16] We're a bunch of nerds that live in a whole different dimension, which nobody else can relate to. The problem is nobody else can relate to us and we can't relate to their problems either. So we look at tech first, not problem first, a lot. And I see a lot of companies just do that.[00:31:32] I'll tell you one, this is quite entertaining to me. A very common theme is, hey, LLMs are cool, that's awesome. We should build something. Well, what should we build? And it's like, okay, consumer, consumer is cool, we should build consumer. Then it's like, ah, nah man, consumer's pretty hard.[00:31:49] Uh, it's gonna be a Clubhouse, gonna blow up. I don't wanna blow up, I just wanna build something that's, you know, pretty easy to be consistent with. We should go enterprise. Cool. Let's go enterprise. So you go enterprise. It's like, okay, we brought LLMs to the enterprise. Now what problem do we tackle? And it's like, okay, well, we can do Q&A on documents.[00:32:06] People know how to do that, right? We've seen a couple of demos on that. So they build it, they build Q&A on documents, and then they struggle with selling, or people just ask, hey, but I don't ask questions to my documents. Like, you realize this is just not a flow that I do.[00:32:22] I ask questions in general, but I don't ask them to my documents. And also, like, what documents can you ask questions to? And they'll be like, well, any of them. And they'll say, can I ask them to all of my documents? 
And they'll be like, well, sure, if you give us all your documents, you can ask anything.[00:32:39] And then they'll say, okay, how will you take all my documents? Oh, it seems like we have to build some sort of indexing mechanism, and then from one thing to the other, you get to a point where it's like, we're building enterprise search and we're building an LLM on top of it, and that is our product. Or you go to, like, MLOps, and I'm gonna help you host models, I'm gonna help you train models.[00:33:00] And I don't know, it seems very solution first and not problem first. So the only thing I would recommend is that you think about the actual problems and talk to users and understand what this can be useful for. It doesn't have to be that sexy in how it's used, but if it works and solves the problem, you've done your job.[00:33:18] Investing in AI Startups[00:33:18] I love that whole evolution, because I think quite a few companies are independently finding this path and going down this route to build a glorified, you know, search bot. We actually interviewed a very problem-focused builder, Mickey Friedman, who's very, very focused on product placement image generation,[00:33:34] and, you know, she's not focused on anything else in terms of image generation, just focused on product placement and branding. And I think that's probably the right approach, you know, and if you think about, like, Jasper, right? Out of all the other GPT-3 companies, when GPT-3 first came out, they built focusing on, you know, writers on Facebook; they didn't even market on Twitter.[00:33:56] So, like, most people haven't heard of them. Uh, I think it's a timeless startup lesson, but it's something to remind people of when they're building with, uh, language models. I mean, as an investor, like, you know, you are an investor, you're a scout with me. 
Doesn't that make it hard to invest in anything? Like, cuz[00:34:10] mostly it's just like the incumbents will get to the innovation faster than startups will find traction.[00:34:16] Really. Like, oh, this is gonna be a hot take too. But, okay. My investing, uh, with people, especially early, is often for me governed by my intuition of how they approach the problem and their experience with the technology, and pretty much solely that. I don't[00:34:37] really pretend to be an expert in the industry or the space; that's their problem. If I think they're smart and they understand the space better than me, then I'm mostly convinced. If they've thought through enough of the business stuff, if they've thought through the market and everything else, I'm convinced. I typically stray away from, you know, what I just said:[00:34:57] founders who are like, LLMs are cool and we should build something with them. That's not usually very convincing to me. That's not a thesis. But I don't concern myself too much with pretending to understand what this space means. I trust them to do that. If I'm convinced that they're smart and they've thought about it well, then I'm pretty convinced that they're a good person to[00:35:20] back.[00:35:21] Cool.[00:35:21] Actually Interesting Ideas in AI[00:35:21] Any, kinda like, super novel idea that you wanna shout out?[00:35:25] There's a lot of interesting explorations, uh, going on. Um, okay, I'll preface this with: anything in enterprise I just don't think is cool. It's like, you can't call it cool, man. You're building products for businesses.[00:35:37] Glean is pretty cool. I'm impressed by Glean. This is what I'm saying. It's cool for Silicon Valley. It's not cool. Like, you're not gonna go to a dinner party with your parents and be like, hey mom, I work on enterprise search. Isn't that awesome? 
And they're not. All my, all my[00:35:51] notifications in one place.[00:35:52] Whoa.[00:35:55] So I'll start by saying, in my head, cool means, like, the world finds this amazing, and it has to be somewhat consumer. And I do think that the ideas that are being played with, like, Quora is playing with Poe. It's kind of strange to think about, and may not stick as is, but I like that they're approaching it with a very different framing, which is, hey, how about you talk to this chatbot, but let's move out of this world where everyone's like, it's not WhatsApp or Telegram, it's not a messaging app.[00:36:30] You are actually generating some piece of content that now everybody can make use of. And is there something there? Not clear yet, but it's an interesting idea. I can see that being something where, you know, people just learn, or see cool things that GPT-4 has said or chatbots have said. That's interesting. In the image space,[00:36:49] very contrasted to the language space, there's so much. Like, I don't even begin to understand the image space. Everything I see just blows my mind. I don't know how Midjourney gets from six fingers to five fingers. I don't understand this. It's amazing. I love it. I don't understand what the value is in terms of revenue.[00:37:08] I don't know where the markets are in image, but I do think that's way, way cooler, because that's a demo where, and I tried this, I showed GPT-4 to my mom, and my mom's like, yeah, this is pretty cool. It does some pretty interesting stuff. And then I showed the image one and she is just like, this is unbelievable.[00:37:28] There's no way a computer could do this, and she just could not digest it. And I love when you see those interactions. So I do think the image world is a whole different beast. Um, and in terms of coolness, there's a lot more cool stuff happening in image. Video, multimodal, I think is really, really cool. 
So I haven't seen too many startups that are doing something where I'm like, wow, that's amazing.[00:37:51] Oh, ElevenLabs. I'll mention ElevenLabs is pretty cool. They're the only ones that I know that are doing... Oh, the voice synthesis. Have you tried it? I've only played with it. I haven't really tried generating my own voice, but I've seen some examples and it looks really, really awesome. I've heard[00:38:06] that Descript is coming up with some stuff as well to compete, cuz yeah, this is definitely the next frontier in terms of podcasting.[00:38:13] Harry Potter IRL[00:38:13] One last thing I will say on the cool front is, I think there is something to be said about a product that brings together all these disparate advancements in AI. And I have a view on what that looks like. I don't know if everyone shares that view, but if you bring together image generation, voice recognition, language modeling, TTS, and, like, all of the other image stuff they can do with CLIP and DreamBooth and putting someone's actual face in it,[00:38:41] what you can actually make, this is my view of it, is the Harry Potter picture come to life, where you actually have just a digital stand where there's a person who's capable of talking to you in their voice, in, you know, understandable dialogue, that is how they speak. And you could just sort of walk by, they'll look at you, you can say hi, they'll say hi back.[00:39:03] They'll start talking to you. You start talking back to them. That's my wild science fiction dream. And I think the technology exists to put all of those pieces together, and the implications for people who are older or saving people over time are huge. This could be a really cool thing to productionize.[00:39:23] AI Infra Cost Math[00:39:23] There's one more part of you that also tweets about numbers and math, uh, AI math essentially is how I'm thinking about it. 
What gets you into talking about costs and math and, you know, just, like, first principles of how to think about language models?[00:39:39] One of my biggest beefs with big companies is how they abstract the cost away from all the engineers.[00:39:46] So when you're working on Google Search, I can't tell you a single number that is cost related at all. Like, I just don't know the cost numbers. It's so far down the chain that I have no clue how much it actually costs to run search, and how much these various things cost, aside from what the public knows.[00:40:03] And I found that very annoying, because when you are building a startup, particularly maybe an enterprise startup, you have to be extremely cognizant about the cost, because that's your unit economics. Like, your primary cost is the money you spend on infrastructure, not your actual labor costs. The whole thesis is the labor doesn't scale, but the infra[00:40:21] does scale. So you need to understand how your infra costs scale. So when it comes to language models, these things are so compute heavy, but none of the papers talk about cost either. And it just bothers me. I'm like, why can't you just tell me how much it costs you to build this thing?[00:40:39] It's not that hard to say. And it's also not that hard to figure out. They give you everything else, which is, you know, how many TPUs it took and how long they trained it for and all of that other stuff, but they don't tell you the cost. So I've always been curious, because all everybody ever says is it's expensive and a startup can't do it, and an individual can't do it.[00:41:01] So then the natural question is, okay, how expensive is it? And that's sort of the background behind 
why I started doing some more AI math, and one of the tweets, probably the one that you're talking about, is where I compare the cost of LLaMA, which is Facebook's LLM, to PaLM, with, uh, my best estimates.[00:41:23] And, uh, the only thing I'll add to that is it is quite tricky to even talk about these things publicly, because you get rammed in the comments by people who are like, oh, don't you know that this assumption that you made is completely BS, because you should have taken this cost per hour? Because obviously people do bulk deals.[00:41:42] And yeah, I have 280 characters. This is what I could have said. But I think ballpark, I got close. I'd like to imagine, I think I was off maybe by 2x on the lower side. I think I took an upper bound and I might have been off by 2x. So my quote was 4 million for LLaMA and 27 for PaLM.[00:42:01] In fact, later today I'm going to do, uh, one on Bard. So. Oh, one on Bard. Oh, the exclusive is that it's 4 million for Bard too.[00:42:10] Nice. Nice. Which is like, don't you think that's actually not a lot? Like, it's a drop in the bucket for these[00:42:17] guys. One of the valuable things to note when you're talking about this cost is, this is the cost of the final training step.[00:42:24] It's not the cost of the entire process. And a common rebuttal is, well, yeah, this is your cost of the final training process, but in total it's about 10x this amount. Because you have to experiment. You have to tune hyperparameters, you have to understand different architectures, you have to experiment with different kinds of training data.[00:42:43] And sometimes you just screw it up and you don't know why. And you just spend a lot of time figuring out why you screwed it up. And that's where the actual cost buildup happens, not in the one final last step where you actually train the final model. 
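The arithmetic behind this kind of estimate can be sketched in a few lines. This is only a back-of-the-envelope sketch: the function names are made up for illustration, the chip count, hours, and hourly price in the first example are hypothetical placeholders rather than figures from any paper, and the only numbers taken from the conversation are the quoted 4 million and 27 million final-run estimates and the roughly 10x multiplier for experimentation.

```python
def final_run_cost(num_chips: int, hours: float, price_per_chip_hour: float) -> float:
    """Cost of a single training run: chips x wall-clock hours x hourly price per chip."""
    return num_chips * hours * price_per_chip_hour


def full_effort_cost(final_cost: float, experiment_multiplier: float = 10.0) -> float:
    """Scale the final-run cost to cover failed runs, tuning, and architecture experiments."""
    return final_cost * experiment_multiplier


if __name__ == "__main__":
    # Hypothetical inputs, chosen only to show the shape of the calculation.
    example = final_run_cost(num_chips=1000, hours=500, price_per_chip_hour=2.0)
    print(f"example final run: ${example:,.0f}")

    # Final-run estimates quoted in the episode, scaled by the ~10x rule of thumb.
    for name, quoted in [("LLaMA", 4_000_000), ("PaLM", 27_000_000)]:
        print(f"{name}: final run ${quoted:,.0f}, full effort ${full_effort_cost(quoted):,.0f}")
```

The hourly price is the input people argue about most, since list prices and negotiated bulk rates can differ a lot, so it is worth keeping it as an explicit parameter rather than burying it in the formula.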
So even assuming, like, a 10x on top of this, I think, is fair for how much it would actually cost a startup to build this from scratch,[00:43:03] I would say.[00:43:04] Open Source LLMs[00:43:04] How do you think about open source in this then? I think a lot of people's big 2023 predictions are an LLM, you know, an open source LLM, that is comparable in performance to the GPT-3 model. Who foots the bill for the mistakes? You know, like when somebody opens a support request that it's not good.[00:43:25] It doesn't really cost people much outside of, like, a GitHub Actions run, as people try entering these things separately. Like, do you think open source is actually bad because you're wasting so much compute by so many people trying to do their own things? And, like, do you think it's better to have a centralized team that organizes these experiments, or, yeah.[00:43:43] Any thoughts there? I have some thoughts. The easiest comparison to make is to the image generation world, where, you know, you had Midjourney and DALL-E come out first, and then you had Emad come out with Stability, which was completely open source. But the difference there is, I think, Stability you can pretty much run on your machine and it's okay.[00:44:06] It works pretty fast. So the entire concept of open sourcing it worked, and people made forks that fine-tuned it on a bunch of different random things, and it made variants of Stability that could do a bunch of things. So I thought the Stability thing, agnostic of the general ethical concerns of training on everyone's art,[00:44:25] was a cool addition to the sort of trade-offs in different models that you can have in image generation. For text generation. 
We're seeing an equivalent effect with LLaMA and Alpaca, LLaMA being Facebook's model, which they didn't really open source, but then the weights got leaked and then people cloned them and then they tuned them using GPT-4 generated synthetic data and made Alpaca.[00:44:50] So the version I think that's out there is only the 7 billion one, and then this crazy European C++ god came and said, you know what, I'm gonna write this entire thing in C++ so you can actually run it locally and not have to buy GPUs. And a combination of those, and of course a lot of people have done work in optimizing these things to make them actually function quickly.[00:45:13] And we can get into details there, but a function of all of these things has enabled people to actually run semi-good models on their computer. I don't have any comments on, you know, energy usage and all of that. I don't really have an opinion on that. I think the fact that you can run a local version of this is just really, really cool, but also supremely dangerous, because with images, conceivably, people can tell what's fake and what's real, even though there are some concerns there as well. But for text, you know, you can do a lot of really bad things with your own text generation algorithm. You know, if I wanted to make somebody's life hell, I could spam them in the most insidious ways with all sorts of different kinds of text generation indefinitely, which I can't really do with images.[00:46:02] I don't know. I find it somewhat ethically problematic, in that the power is too much for an individual to wield. But there are some libertarians who are like, yeah, why should only OpenAI have this power? I want this power too. So there are merits to both sides of the argument. 
I think it's generally good for the ecosystem.[00:46:20] Generally, it will get faster and the latency will get better, and the models may not ever reach the size of the cutting edge that's possible, but it could be good enough to do 80% of the things that a bigger model could do. And I think that's a really good start for innovation. I mean, you could just have people come up with stuff instead of companies, and that always unlocks a whole vector of innovation that didn't previously exist.[00:46:45] Other Modalities[00:46:45] That was a really good conclusion. I want to ask follow-up questions, but also, that was a really good place to end it. Were there any other AI topics that you wanted to[00:46:52] touch on? I think Runway ML is the one company I didn't mention, and that one's, uh, one to look out for.[00:46:58] I think they're doing really cool stuff in terms of video editing with generative techniques. So people often talk about the OpenAIs and the Googles of the world and Anthropic and Claude and Cohere and Midjourney, all the image stuff. But I think the places that people aren't paying enough attention to, that will get a lot more love in the next couple of years, are:[00:47:19] better Whisper, so better streaming voice recognition; better TTS, so some open source version of ElevenLabs that people can start using. And then the frontier is sort of multimodality and videos. Can you do anything with videos? Can you edit videos? Can you stitch things together into videos from images, all sorts of different cool stuff.[00:47:40] And then there's sort of the long tail of companies like Luma that are working on, like, 3D modeling with generative use cases, and taking an image and creating a 3D model from nothing. And, uh, that's pretty cool too, although the practical use cases to me are a little less clear. 
Uh, so that kind of covers the entire space, in my head at least.[00:48:00] I[00:48:00] like using the Harry Potter image, like the moving and speaking images, as an end goal. I think that's something that consumers can really get behind as well. That's super cool.[00:48:09] Exam Fraud and Generated Text Detection[00:48:09] To double back a little bit before we go into the lightning round, I have one more thing, which is relevant to your personal story, but then also relevant to our debate, which is a nice blend.[00:48:18] You're concerned about the safety of everyone having access to language models and, you know, the potential harm that you can do there. My guess is that you're also not that positive on watermarking techniques for generated language, right? Like maybe randomly sprinkling weird characters so that people can see that this is generated by an AI model. But also, you have some personal experience with this, because you found manipulation in the Indian exam board, which, uh, might be a similar story.[00:48:48] I don't know if you have any thoughts about just watermarking, manipulation, like, you know, ethical deployments of, uh,[00:48:55] generated data.[00:48:57] Well, I think those two things are a little separate. Okay. One, I would say, is for watermarking text data. There are a couple of different approaches. I think there is actual value to that, because from a pure technical perspective, you don't want models to train on stuff they've generated.[00:49:13] That's kind of bad for models. Yes. And two is, obviously you don't want people to keep using ChatGPT for, I don't know if you want them to use it for all their assignments and never be caught. Maybe you don't. But it seems like it's valuable to at least understand that this is machine-generated text versus not. Just ethically, that seems like something that should exist.[00:49:33] So I do think watermarking is 
a good direction of research, and I'm fairly positive on it. I actually do think people should standardize how that watermarking works across language models, so that everyone can detect and understand language models, and not just, OpenAI does its own models, but not the other ones, and so on.[00:49:51] So that's my view on that. And then, sort of transitioning into the exam data, this is a really old one, but it's one of my favorite things to talk about. In America, as you know, usually the way it works is you take your SAT exam, uh, you take a couple of APs, you do your school grades, you do a bunch of fluff.[00:50:10] You try to prove how you're good at everything. And then you apply to colleges, and then it's a weird decision based on a hundred other factors. And then they decide whether you get in or not. But if you're rich, you're basically gonna get in anyway. And if you're a legacy, you're probably gonna get in, and there's a whole bunch of stuff going on.[00:50:23] And I don't think the system is necessarily bad, but it's just really complicated, and some of the things are weird. In India, and in a lot of the non-developed world, people are like, yeah, okay, we can't scale that. There's no way we can have enough people, like, non-rigorously evaluate this, cuz there's gonna be too much corruption and it's gonna be terrible at the end, cuz people are just gonna pay their way in.[00:50:45] So usually it works in a very simple way, where you take an exam that is standardized, and sometimes you have many exams, sometimes you have an exam for a different subject. Sometimes it's just one for everything. 
And you get ranked on that exam, and depending on your rank you get to choose the quality and the kind of thing you want to study.[00:51:03] Which, the kind of thing, always surprises people in America, where it's not like, oh, it's glory land, where you walk in and you're like, I think this is interesting and I wanna study this. Like, no, in most of the world it's like, you're not smart enough to study this, so you're probably not gonna study it.[00:51:18] And there's, like, a rank order of things that you need to be smart enough to do. So it's different. And therefore these exams are much more critical for the functioning of the system. So when there's fraud, it's not like a small part of your application going wrong, it's your entire application going wrong.[00:51:36] And that's just me explaining why this is severe. Now, one such exam is the one that you take in school. It's called a board exam. You take one in the 10th grade, which doesn't really matter for much, and then you take one in the 12th grade when you're about to graduate, and that[00:51:53] determines where you go to college for a large set of colleges, not all, but a large set of colleges, and based on how much you get on your top-five average, you're sort of slotted into a different stream in a different college. And over time, because of the competition between two of the boards that are a duopoly, there's no standardization.[00:52:13] So everyone's trying to, like, give more marks than the other to attract more students into their board, because, oh, that means that you can then claim, oh, you're gonna get into a better college if you take our exam and don't go to a school that administers the other exam. What? 
So that's the, everyone knew that was happening, ish, but there was no data to back it.[00:52:34] But when you actually take this exam, as I did, you start realizing that the numbers, the marks, make no sense, because you're looking at a kid who's also in your class and you're like, dude, this guy's not smart. How did he get a 90 in English? He's not good at English. Like, he can't speak it. You cannot give him a 90.[00:52:54] You gave me a 90. How did this guy get a 90? So everyone has, like, their anecdotal "this doesn't make any sense to me" moments with this exam, but no one has access to the data. So way back when, what I did was, I realized they have very little security surrounding the data, where the only thing that you need to put in to get access is your roll number.[00:53:15] And so as long as you predict the right set of roll numbers, you can get everybody's results. Also, unlike America, exam results aren't treated with a level of privacy. In India, it's very common to sort of put the entire class's results on a bulletin board. And you just see how everyone did and you shame the people who are stupid.[00:53:32] That's just how it works. It's changed over time, but that's fundamentally a cultural difference. And so when I scraped all these results and I published it, and I did some analysis, what I found was a couple of very insidious things. One is that if you plot the distribution of marks, you generally tend to see some sort of skewed but pseudo-normal distribution, where it's a big peak and it falls off on both ends, but you see two interesting patterns.[00:54:01] One is just the most obvious one, which is grace marks, where the pass grade is 33. You see nobody got between 29 and 32, because what they did for every single exam is they just made you pass. They just rounded up to 33, which is okay. 
I'm not that concerned about whether you give grace marks.[00:54:21] It's kind of messed up that you do that, but okay, fine. You want to pass a bunch of people who deserve to fail, do it. Then the other, more concerning thing was between 33 and 93, right? That's about 60 numbers, 61 numbers. 30 of those numbers were just missing, as in nobody got 91 on this exam. In any subject, in any year.[00:54:44] How does that happen? You don't get a 91, you don't get a 93, 89, 87, 85, 84. Some numbers were just missing. And at first when I saw this, I'm like, this is definitely some bug in my code. There's no way that, like, 91 never happened. And so I remember I asked a bunch of my friends, I'm like, dude, did you ever get a 91 in anything?[00:55:06] And they're like, no. And it just unraveled that this is obviously problematic, cuz that means that they're screwing with your final marks in some way or the other. Yeah. And they're not transparent about how they do it. Then I did the same thing for the other board. We found something similar there, but not the same.[00:55:24] The problem there was, there was a huge spike at 95, and then I realized what they were doing is, they'd offer various exams, and to standardize, they would blanket add, like, a raw number. So if you took the harder math exam, everyone would get plus 10. Arbitrarily. This is not revealed or publicized.[00:55:41] It's randomly, that was the harder exam, you guys all get plus 10, but it's capped at 95. That's just a stupid way to standardize. It doesn't make any sense. Um, they're not transparent about it. And it affects your entire life, because yeah, this is what gets you into college. And yeah, if you add the two exams up, this is 1.1 million kids taking it every year.[00:56:02] So that's a lot of people's lives that you're screwing with by not understanding numbers and not being transparent about how you're manipulating them. 
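Both patterns described here are easy to check for once the marks are in a list. The sketch below is only an illustration under stated assumptions: the function names are invented, the tiny mark lists are made-up toy data (not the scraped results), and the "+10, capped at 95" rule is applied exactly as it is described in the conversation.

```python
from collections import Counter


def missing_marks(marks, low, high):
    """Return every integer mark in [low, high] that nobody in the data received."""
    seen = set(marks)
    return [m for m in range(low, high + 1) if m not in seen]


def apply_capped_bonus(raw_marks, bonus, cap):
    """Add a flat bonus to every raw mark, then clip at the cap."""
    return [min(m + bonus, cap) for m in raw_marks]


if __name__ == "__main__":
    # Toy data only. With real results, a long list of gaps in a dense range is the red flag.
    toy = [33, 34, 36, 36, 40]
    print("never awarded:", missing_marks(toy, 33, 40))

    # A flat bonus with a cap piles every high scorer onto the cap value.
    adjusted = apply_capped_bonus([80, 86, 88, 90, 93], bonus=10, cap=95)
    print("adjusted:", adjusted)
    print("count at cap:", Counter(adjusted)[95])
```

With a large enough sample, every mark in a dense range should show up at least a few times, which is why a value that never appears in any subject or year points to post-processing rather than chance.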
So that was the thesis. In my view, looking back on it 10 years later, it's been 10 years at this point, I think the media never did justice to it because, to be honest, nobody understands statistics. [00:56:23] So over time it became a big issue then. And then there was a big Supreme Court or High Court ruling, which said, hey, you guys can't do this. But there's no transparency, so there's no way of actually ensuring that they're not doing it. They just added a level of password protection, so now I can't scrape it anymore. [00:56:40] And they probably do the same thing and it's probably still as bad, but people aren't raising an issue about it. It's really hard to make people understand the significance of it, because people are so compelled to just lean into the narrative of exams are b******t and we should never trust exams, and this is why it's okay to be dumb. [00:56:59] And that's not the point. So I think the response was lackluster in retrospect, but that's what I unveiled in 2013. That's fascinating. [00:57:09] You know, in my finance background, a similar case happened with the Madoff funds, because if you plot the statistical distribution of the Madoff funds, you could see that they were just not a normal distribution, and therefore they were probably made-up numbers. [00:57:25] And we also did the same thing in my first job as a regulator in Singapore, for hedge fund returns. Wow. Which is watermarking. This is a watermark of a human or some kind of system making it up. And statistically, if you look at the distribution, you can see this violates any reasonable assumption. [00:57:41] Therefore, something's [00:57:42] wrong. Well, I see what you mean there. In that sense, yes. That's really cool that you worked on a very similar problem, and I agree that it's messed up. 
It's a good way to catch liars in [00:57:53] Madoff's case. Like, they actually made it a big deal, but I don't know, I don't see how this wasn't a bigger deal in India. [00:57:58] But anyway, that's a conversation for another time, over drinks perhaps. [00:58:01] Lightning Round [00:58:01] So now we're gonna go into the lightning round, just to cap things off with an overview. What are your favorite AI people and communities? You mentioned Reddit. Let's be specific about which. [00:58:12] I actually don't really use Reddit that much for AI stuff. [00:58:16] It was just a one-off example. Most of my learnings are from Twitter, and I think there are the obvious ones, like everyone follows Riley Goodside now, and there's a bunch of the really famous ones. But amongst the lesser-known ones, let me just say my favorite one is probably AI Pub, because it does a roundup of everybody else's stuff regularly. [00:58:40] I know Brian, who runs AI Pub, as well, and I find it really useful because often it's very hard to catch up on stuff, and this gives you the entire roundup: last two weeks, here's what happened in AI. [00:58:51] Good, good, good. And any other communities, like Slack communities, Discords? You don't [00:58:55] do that stuff? [00:58:56] I try to, but I don't, because it's too time consuming. I prefer reading at my own pace. [00:59:02] Yeah, yeah, yeah. Okay. So my learning is, start a Twitter weekly recap of here's what happened in AI. I mean, it makes sense, right? Like, it'll do very well. It would do you [00:59:11] very well. A year from now, what do you think people will be the most surprised [00:59:15] by in AI? [00:59:17] I think they're gonna be surprised at how much cheaper they're able to bring down the cost, and how much faster these models get. I'm more optimistic about cost and latency than I am about just quality improvements at this point. 
I think modalities will change, but I think quality is near a maximum that we're going to achieve. [00:59:42] So this is a request for startups or a request for side projects. What's an AI thing that you would pay for if somebody else built [00:59:47] it? Aside from the Harry Potter image one, which I would definitely, I would pay a lot of money to have like a floating, I don't know, Bill Clinton in my room, just saying things back to me whenever I talk to it. [00:59:59] That would be cool. But in terms of other products, if somebody built a product that would smartly, and I know many people have tried to build things like this, auto-respond to things that it can auto-respond to. And for the things that are actually important, please don't auto-respond and just tell me to do it. [01:00:19] And that distinction, I think, is really important. So somewhere in between the automate everything and the just suggest everything, a hybrid that works well, I think that would be really cool. Yeah. I've thought [01:00:30] about this as well. Even if it doesn't respond for you, it can draft an answer for you to edit. [01:00:35] Right. So that you at least get to review. [01:00:37] I actually built that this morning, if you guys want it. Ooh. You just OAuth with Gmail and then it pre-drafts every email in your inbox. Really? But, yeah, you have to change the prompt, because my prompt says like, you. Software engineer. I'm a venture capitalist, this is where it works, blah, blah, blah, blah. [01:00:55] But you can modify that and then it works. Are you [01:00:58] gonna open source it? [01:01:00] I probably will, but sometimes it's like it cares too much about the prompt. So for example, in the prompt, I was like, if the person is asking about scheduling, suggest the time and public like the calendar, my calendar, or give this calendar link. In every email [01:01:15] it will respond, and if you ever wanna chat, here's my calendar. 
Like, no matter what the email was, every email, it would tell them to schedule time. So there's still work to [01:01:24] be done. You're just very helpful. You're just very, very helpful. Well, so actually I have a GitHub version of this, which I actually would pay someone to build, which is read somebody who opened a GitHub issue, and check if they have missed anything for resolutions. [01:01:38] And then generate a response to, like, request for resolution. And then, you know, if they haven't answered in like 30 days, close the issue. [01:01:45] Absolutely. And one thing I'll add to that is also the idea of the AI just going in and making PRs for you, I think, is super compelling, that it just says, hey, I found all these vulnerabilities, patch them. [01:01:58] Yeah, yeah. We got a cell company doing it, so hello. Yeah, I'll let you know more. Deedy, thank you so much for coming on. I think to wrap it up, are there any parting thoughts, kind of like one thing that you want everyone to take away about AI and the impact this can have? [01:02:14] Yeah, I think my parting thought is I have always been a big fan of bridging the gap between research and the end consumer. [01:02:24] And I think this is just a great time to be alive where, if you are interested in AI or if you're even remotely interested, of course you can go build stuff. Of course you can read about it. But I think it's so cool that you can just go read the paper and read the raw things that people did to make this happen. [01:02:42] And I really encourage people to go and read research, follow people on YouTube who are explaining this. Andrej Karpathy has a great channel where he also explains it. It's just a great time to learn in this space, and I would really encourage more people to go and read the actual stuff. It's really cool. [01:03:01] Thank you [01:03:01] so much, Deedy, for coming on. It was a great chat. 
Where can people follow you on Twitter? Any other thing you wanna [01:03:08] plug? I think Twitter is fine. And there's a link to my website from my Twitter too. It's my first name: debarghya underscore das is my Twitter and debarghyadas.com is my website. But you can also just Google Deedy Das and you will find both of those links. [01:03:25] Awesome. All right. Thank you so much. [01:03:27] Thank you. Thanks guys. This is a public episode. If you'd like to discuss this with other subscribers or get access to bonus episodes, visit www.latent.space/subscribe
Transcript
Hey, everyone. Welcome to the Latent Space podcast. This is Alessio, partner and CTO-in-Residence at Decibel Partners. I'm joined by my co-host, swyx, writer and editor of Latent Space.
Yeah, awesome. And today we have a special guest. It's Deedy Das from Glean. Do you go by Deedy or Debarghya?
I go by Deedy. Okay. It's a little bit easier for the rest of us to spell out. And so what we typically do is I'll introduce you based on your LinkedIn profile, and then you can fill in what's not on your LinkedIn. So,
You graduated your bachelor's and master's in CS from Cornell.
Then you worked at Facebook and then Google on search, specifically search,
and also leading the sports team focusing on cricket.
That's something that we can dive into.
And then you moved over to Glean, which is now a search unicorn building intelligent search for the workplace.
What's not on your LinkedIn that people should know about you?
Firstly, guys, it's a pleasure.
Pleasure to be here.
Thank you so much for having me.
What's not on my LinkedIn is probably everything that's non-professional.
I think the biggest ones are, I'm a huge movie buff and I love reading.
So I think I get through, usually I like to get through 10 books-ish a year,
but I hate people who count books.
So I shouldn't say a number.
And increasingly, I don't like reading nonfiction books.
I actually do prefer reading fiction books purely for pleasure and entertainment.
I think that's the biggest omission from my LinkedIn.
What's something that caught your eye for fiction stuff that you would recommend people?
Oh, I recently started reading The Three-Body Problem, and I finished it, and it's a three-part series.
And, well, my controversial take is I did not really enjoy the second part, and so I just stopped.
But the first book was phenomenal.
Great concept.
I didn't know you could write alien fiction with physics so well.
And Chinese literature in particular has a very different cadence to it than Western literature.
It's much less about, let's describe people and what they're all
about and their likes and dislikes, and it's like, here's a person, he's a professor of physics,
that's all you need to know about him. Let's continue with the story. And I enjoyed it. It's a very
different style from what I'm used to reading. Yeah, I heard it very highly recommended. I think
it's being adapted to a TV show, so looking forward to that. So you've now spent almost four years
at Glean. The company is now a unicorn, but you were on the founding team, and LLMs and text interfaces
are all the rage now, but you were building this before it was
cool, so to speak. Maybe tell us more about the story, how it came to be, and some of the technological
advances you've seen, because I think the company started really close to some of the early
GPT models. So you've seen a lot of it from day one. Yeah. Well, the first thing I'll say is
Glean was never started to be a technical product looking for a problem. We always wanted to
solve a very critical problem first that we saw not only in the companies that we'd worked in before,
but in all of the companies that a lot of the founding team had been in after their time at Google.
So Google has a really neat tool that already kind of does this internally.
It's called MoMA.
And MoMA sort of indexes everything that you use inside Google, because they have first-party API access
to who has permissions to what document and what documents exist.
And they rank them with their internal search tool.
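Editor's aside: the key property described here is that the index knows both what documents exist and who may read each one. A minimal Python sketch of that idea follows. It is not how MoMA or Glean is implemented; the `Doc` class, the `search` function, and the sample documents are all hypothetical, and real systems rank results rather than returning them in index order.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    doc_id: str
    text: str
    allowed_users: set = field(default_factory=set)


def search(docs, user, query):
    """Return ids of documents the user may read that contain every query term."""
    terms = query.lower().split()
    hits = []
    for doc in docs:
        if user not in doc.allowed_users:
            continue  # permission check happens before any text matching
        words = doc.text.lower().split()
        if all(term in words for term in terms):
            hits.append(doc.doc_id)
    return hits


# Hypothetical sample index.
docs = [
    Doc("deck", "quarterly roadmap presentation", {"alice"}),
    Doc("notes", "roadmap meeting notes", {"alice", "bob"}),
]
```

Filtering on permissions before matching means a user never learns that a restricted document exists, even through a result count.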
It's one of those things where when you're at Google, you sort of take it for granted.
but when you leave and go anywhere else, you're like, oh, my God, how do I function without being able to find things that I've worked on?
Like, oh, I remember this guy had a presentation that he made three meetings ago, and I don't remember anything about it.
I don't know where he shared it.
I don't know if he shared it, but I do know that it was something about X.
And I kind of want to find that now.
So that's the core information retrieval problem that we had set out to tackle.
And we realized when we started looking at this problem that
enterprise search is actually, it's not new.
People have been trying to tackle enterprise search for decades.
Pre-2000s, people have been trying to build these on-prem enterprise search systems.
But one thing that has really allowed us to build it well is, A, you now have distributed Elastic,
so that really helps you do a lot of the heavy lifting on core infra.
But B, you also now have API support that's really nuanced on all of the SaaS apps that you use.
So back in the day, it was really difficult to integrate with a messaging app.
They didn't have an API.
It didn't have any way to sort of get the permissions information and get the messaging
information.
But now a lot of SaaS apps have really robust APIs that really let you index everything that
you'd want.
So that's two.
And the third sort of big macro reason why it's happening now and why we're able to do it
well is the fact that the SaaS apps have just exploded.
Like, every company uses, you know, 10 to 100
apps. And so just the urgent need for information, especially with, you know, remote work and
work from home, it's just so critical that people expect this almost as a default that you should
have in your company. And a lot of our customers just say, hey, I don't, I can't go back to a
life without internal search. And I think, we think that's just how it should be. So that's
kind of the story about how Glean was founded. And a lot of the LLM stuff, it's neat that a lot of that's
happening at the same time that we are trying to solve this problem because it's definitely
applicable to the problem we're trying to solve.
And I'm really excited by some of the stuff that we're able to do with it now.
I was talking with somebody last week, and they were saying that in the last couple of years
we've been going from when the web used to be syntax-driven, you know, you use SQL for information retrieval,
into a semantics-driven one, where the syntax is not as important.
It's like how you actually explain the question.
And we just had Sarah from Seek AI on the previous episode.
And they're doing natural language and
things like that for enterprise knowledge, more for business use cases.
So I'm curious to see, you know, the enterprise of the future, what that looks like.
You know, is there going to be way less drop downs and kind of like SQL queries and stuff like that?
And it's more this virtual, almost like person that embodies the company that is like an LLM in a way.
But how do you do that without being able to surface all the knowledge that people have in the organization?
So something like Glean is super useful for that.
Yeah, I mean, already today we see these natural language queries as well.
I will say at this point, it's still a small fraction of the queries you see.
A lot of the queries are, hey, what is just a name of a project or an acronym or a name of a person or someone you're looking for?
Yeah, I think actually the Glean website explains Glean's features very well.
When I can follow the video.
Actually, video wasn't that informative.
Video is more like a marketing video.
But the actual website was showing screenshots of what you see there.
In my language, it's an employee portal that happens to have search because you also surface
collections, which proactively show me things without me searching anything, right?
Like, you even have Go Links,
which you copied, I think, from Google, right?
Like, it's basically, you know, in my mind, it's like, this is ex-Googlers missing Google
internal stuff.
So they just built it for everyone else.
Well, I can comment on that.
So, A, I should just plug that we have a new website as of today.
I don't know how it's received.
I saw it yesterday.
So let me know.
I think today we just launched.
I don't know when we launched the new one.
I think today.
Yeah, yeah.
It's new.
I opened it right now.
It's different than yesterday.
Okay.
It's today.
And yeah, so one thing that we find is that search in itself, this is actually, I think,
quite a big insight.
Search in itself is not a compelling enough use case to keep people drawn to your product.
It's easy to say Google search is like that.
But Google search was also in an era where that was the only website people
knew. And now it's not like that. When you're a new tool that's coming into a company, you can't sit on your high horse and say, yeah, of course you're going to use my tool to search. No, they're not going to remember who you are. They're going to use it once and completely forget. To really get that retention, you need to sort of go from being just a search engine to exactly what you said, Shawn, to being sort of an employee portal that does much more than that. And yeah, the Go Links thing, I mean, yes, it is copied from Google.
I will say there's a completely separate startup called golinks.io that has also copied it from Google.
And everyone misses Go Links.
It's very useful to be able to write a document and just be like, go to go slash this.
And that's where the document is.
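Editor's aside: a go link is just a short, memorable name that redirects to a long URL. A minimal Python sketch of the lookup, assuming a plain dictionary as the registry; `resolve_go_link`, the `links` table, and the example URL are hypothetical and not taken from Google, Glean, or golinks.io.

```python
def resolve_go_link(path, links):
    """Map 'go/<name>' to its target URL, or None if the name is not registered."""
    prefix = "go/"
    if not path.startswith(prefix):
        return None
    # Names are treated as case-insensitive and a trailing slash is ignored.
    name = path[len(prefix):].strip("/").lower()
    return links.get(name)


# Hypothetical registry of short names.
links = {"onboarding": "https://docs.example.com/onboarding"}
```

In practice the resolver would sit behind an internal hostname so that typing `go/onboarding` in a browser issues the redirect.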
And so we have built a big feature set around it.
I think one of the critical ones that I will call out is the feed.
Just being able to see documents that are trending in your sub-organization, documents
that we think you should see, a limited set of them,
as well as now we've launched something called mentions,
which is super useful,
which is all of your tags across all of your apps in one place,
in the last whatever time.
So it's like all of the 100 Slack pings that you have,
plus the Jira pings, plus the email,
all of that in one place is super useful to have.
Do you do GitHub?
Yeah, we do GitHub too.
We do GitHub too.
All of the mentions.
Oh, my God.
That's amazing.
I didn't know you had it,
but this is something I wish for myself.
It's amazing.
It's still a little buggy right now,
but I think it's pretty good,
and we're going to make it a lot better as we go.
This is not in our preset list of questions,
but I have one follow-up,
which is, you know,
I've worked in quite a few startups now
that don't have employee portals,
and I've worked at Amazon,
which had an employee portal,
but it wasn't as beautiful
or as smart as Glean.
Why isn't this a bigger norm in all companies?
Well, there's several reasons.
I would say one reason is just the
dynamics of how enterprise sales happens. It's, I wouldn't say broken. It is what it is,
but it doesn't always cater to employees being happy with the best tools. What it does cater
to is there's different incentive structures, right? So if I'm an IT buyer, I have a budget,
and I need to understand that for a hundred of these tools that are pitched to me all the time,
which ones really help the company? And the way usually those things are evaluated is,
does it increase revenue and does it cut cost? Those are the two biggest ones. And for a software like
Glean or a search portal or employee portal, it's actually quite difficult when you're
generally bucketed in the space of productivity to say, hey, here's a compelling use case for
why we will cut your cost or increase your revenue. It's just a softer argument that you have
to make there. It's just a fundamental nature of the problem versus if you say, hey, we're a customer
support tool. Everyone in SaaS knows that customer support tools is just sort of the
last thing that you go to when you're looking for ideas because it's easy to sell.
It's like, here's a metric. How many tickets can your customer support agent resolve?
We've built a thing that makes it 20% better. That means it's $100,000 cost savings. Pay us 50K,
call it a deal. That's a good argument. That's a very simple, easy to understand argument.
It's very difficult to make that argument with search, where you're like, okay, you're going to
see about 10 to 20 searches. That's going to save
about this much time a day, and that results in this much employee productivity. People just
don't buy it as easily.
So the first reaction is, oh, we work fine without it.
Why do we need this now?
It's not like the company didn't work without this tool.
And only when they have it, do they realize what they were missing out on.
So it's a difficult thing to sell in some ways.
So even though the product is, in my opinion, fantastic, sometimes the buyer isn't easily
convinced because it doesn't increase revenue or cut cost.
In terms of technology, can you maybe talk about some of the stack?
And you see a lot of companies coming up now saying, oh, we help you do enterprise search.
And it's usually, you know, embeddings to then do context for like an LLM query, mostly.
I'm guessing you started as like closer to like the vector side of things.
Maybe, yeah, talk a bit about that and some learnings you've had and as founders try to build products like
this internally. What should they think about? Yeah. So actually leading back from the last answer,
one of the ways a lot of companies who are in the enterprise search space are trying to tackle the
problem of sales is to lean into how advanced the technology is, which is useful. It's useful to say
we're AI powered, LLM powered, vector search, cutting edge, state of the art, yada, yada, put in all
your buzzwords. That's nice. But how often that translates to better user
experience is sort of a fuzzy area where it's really hard for even users to tell, to be honest.
You can have one or two great queries and one really bad query and be like, I don't know if this
thing is smart.
And it takes time to evaluate and understand how a search engine is doing.
So to that, I think one of the things that we learned from Google, a lot of us come from
an ex-Google search background.
And one of the key learnings is often with search, it's not about how advanced or how complex
the technology is. It's about the rigor and intellectual honesty that you put into tuning the
ranking algorithm. That's a painstaking long-term and slow process. At Google, until I would say
maybe 2017, 2018, everything was run off of almost no real AI, so to speak. It was just
information retrieval at its core, very basic, from the '70s and '80s, and a bunch of these ranking
components that are stacked on top of it that do various tasks really, really well.
So one task in search is query understanding.
What does the query mean?
One task is synonymy.
What are other synonyms for this thing that we can also match on?
One task is document understanding.
Is this document itself a high-quality document or not?
Or is it some sort of SEO spam?
And admittedly, Google doesn't do so well on that anymore.
But there's so many tough sub-problems that it breaks search down into and then just
gets each of those problems right to create a nice experience.
So to answer your question also, vector search, we do, but it is not the only way we get results.
We do a hybrid approach, both using, you know, core IR signals, synonymy, query augmentation with
things like acronym expansion, as well as stuff like vector search, which is also useful.
And then we apply our level of ranking understanding on top of that, which includes personalization,
understanding if you're an engineer, you're probably not looking for Salesforce documents.
You know, you're probably looking for documents that are published or co-authored by people in your
team, in your immediate team.
And our understanding of all of your interactions with people around you, our personalization
layer, our good work on ranking is what makes us good.
It's not sort of, hey, drop in LLMs and embeddings and we become amazing at search.
That's not how we think it works.
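Editor's aside: the hybrid approach described above (core lexical signals, acronym expansion, vector similarity, then a personalization layer on top) can be sketched as a weighted blend of scores. This is a toy illustration under stated assumptions, not Glean's ranking: the acronym table, the weights, the two-dimensional "embeddings", and every function name are hypothetical.

```python
import math

# Hypothetical acronym table used for query expansion.
ACRONYMS = {"okr": ["objectives", "key", "results"]}


def expand_query(query):
    """Lowercase the query and append the expansion of any known acronym."""
    terms = query.lower().split()
    expanded = list(terms)
    for term in terms:
        expanded.extend(ACRONYMS.get(term, []))
    return expanded


def lexical_score(terms, text):
    """Fraction of (expanded) query terms that appear in the document."""
    words = set(text.lower().split())
    return sum(1 for t in terms if t in words) / len(terms) if terms else 0.0


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_rank(query, query_vec, docs, user_team, weights=(0.5, 0.3, 0.2)):
    """Blend lexical, vector, and same-team signals; return doc ids, best first."""
    terms = expand_query(query)
    w_lex, w_vec, w_team = weights
    scored = []
    for doc in docs:
        score = (w_lex * lexical_score(terms, doc["text"])
                 + w_vec * cosine(query_vec, doc["vec"])
                 + w_team * (1.0 if doc["team"] == user_team else 0.0))
        scored.append((score, doc["id"]))
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored]


# Hypothetical corpus with made-up 2-d embeddings.
docs = [
    {"id": "eng-okrs", "text": "engineering objectives and key results", "vec": [1.0, 0.0], "team": "eng"},
    {"id": "sales-okrs", "text": "sales objectives and key results", "vec": [1.0, 0.0], "team": "sales"},
    {"id": "lunch", "text": "lunch menu", "vec": [0.0, 1.0], "team": "eng"},
]
```

The same query returns a different top result for an engineer than for a salesperson, which is the personalization point made in the conversation; the hard part in practice is tuning the weights honestly, not adding the vector term.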
Yeah, I think there's a lot of polish that goes into
quality products, and that's the difference that you see between Hacker News demos and
Glean, which is an actual search and chat unicorn.
But also, is there a Glean chat coming?
What do you think about the chat form factor?
I can't say anything about it, but I think that we are, my politically correct answer is,
we're experimenting with many technologies that use modern AI and LLMs,
and we will launch what we think users like best.
Nice. You got some media training.
Yeah, very well handled.
We can move off of Glean and just go into Google search.
So you worked on search for four years.
I've always wanted to ask what happens when I type something into Google.
I feel like you know more than others, and obviously there's the things you cannot say.
But I'm sure Google does a lot of the things that Glean does as well.
How do you think about this Google versus ChatGPT debate?
Let's maybe start at a high level based on what you see out there.
I think you see a lot of misconceptions.
Yeah.
So, okay, let me start with Google versus
ChatGPT. First, I think it's disingenuous if I don't say my own usage pattern, which is I almost
don't go back to Google for a large section of my queries anymore. I just use ChatGPT. I am a paying
Plus subscriber and it's sort of my go-to for a lot of things that I ask. And I also have to
train my mind to realize that there's a whole set of questions in your head that you never realize
the internet could answer for you. And that now you're like, oh, wait, I could actually ask this.
And then you ask it. So that's my.
current usage pattern. That being said, I don't think that ChatGPT is the best interface or
technology for all sets of queries. I think humans are obviously very easily excited by new technology,
but new technology does not always mean the previous technology was worse. The previous
technology is actually really good for a lot of things. And for search in particular,
if you think about all the queries that come into Google search, they fall into
various kinds of query classes, depending on whatever taxonomy you want to use.
But one way of broadly understanding the query classes is whether something is
information seeking or exploratory. And for exploratory queries, I think
there are uses where Google does really well. Like, for example, let's say you want to just know
a list of songs of this artist in this year.
Google will probably be able to at 100% tell you that pretty accurately all the time.
Or if you want to say, understand like what showtimes of movies came out today.
So fresh queries, another query class.
Google will be really good at that.
ChatGPT, not so good at that.
But if you look at information seeking queries, you could even argue that if I ask for
information about Donald Trump, you know,
maybe ChatGPT will spit out a reasonable sounding paragraph and it makes sense,
but it doesn't give me enough stuff to like click on and go to and navigate to, and a news article
here. And I just kind of want to see a lot of stuff happening. So if you really break down
the problem, I think it's not as easy as saying ChatGPT is a silver bullet for every kind
of information need. There's a lot of information needs, especially for tail queries. So for long
unforeseen queries like, hey, tell me the cheat code in Doom 3, this level, this boss,
ChatGPT is going to blow it out of the water on those kinds of queries, because it's going to figure out
all of these from these random sparse documents and random Reddit threads and assemble one consistent
answer for you, where it takes forever to find this kind of stuff on Google. For me personally,
coding is the biggest use case. For anything technical I just go to ChatGPT, because parsing through
Stack Overflow is just too mentally taxing, and I don't care even if ChatGPT
hallucinates a wrong answer. I can verify that, but I like seeing a coherent, nice answer that I can
just kind of use as a good starting point for my research on whatever I'm trying to understand.
Yeah. Did you see the statistic that the All In guys have been saying, which is Stack Overflow
traffic is down 15%? Yeah, I did see that. Makes sense, but I don't know if it's like only because
of ChatGPT, but yeah, sure. I believe it. Now, the second part was just about if some of the
enterprise product search moves out of Google. Like, you know, that's obviously
a big AdWords revenue driver?
What are some of the implications in terms of the business there?
Okay, so I would split this answer into two parts.
My first part is just talking about freshness,
because with the query that you mentioned, specifically, the issue there is being able to access
fresh information.
Google just blanket calls this freshness.
Today's understanding of large language models is that they cannot do anything that's
highly fresh.
You just can't train these things fast enough.
and cost-efficiently enough to constantly index new sources of data and then serve it at the same
time in any way that's feasible. That might change in the future, but today it's not possible.
The best thing that you can get that's close to it is what, you know, the fancy term is
retrieval augmented generation, but it's a fancy way of saying just do the search in the background
and then use the results to create the actual response. That's what Bing does today.
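Editor's aside: "do the search in the background and then use the results to create the actual response" can be sketched in Python as retrieve, build a prompt, generate. This is a toy under stated assumptions and not Bing's pipeline: the term-overlap retriever, the prompt wording, and the `generate` callable (a stand-in for whatever language model is used) are all hypothetical.

```python
def retrieve(query, corpus, k=2):
    """Rank documents by how many query terms they contain and keep the top k."""
    terms = set(query.lower().split())
    scored = [(len(terms & set(text.lower().split())), doc_id)
              for doc_id, text in corpus.items()]
    scored = [pair for pair in scored if pair[0] > 0]  # drop non-matching docs
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in scored[:k]]


def build_prompt(query, corpus, k=2):
    """Put the retrieved passages in front of the question."""
    doc_ids = retrieve(query, corpus, k)
    context = "\n".join(f"[{doc_id}] {corpus[doc_id]}" for doc_id in doc_ids)
    return f"Answer using only these sources:\n{context}\n\nQuestion: {query}"


def answer(query, corpus, generate, k=2):
    """`generate` is any callable that maps a prompt string to a response string."""
    return generate(build_prompt(query, corpus, k))


# Hypothetical fresh corpus and a stub model that only echoes the prompt's start.
corpus = {
    "news1": "movie showtimes for today",
    "old": "history of cinema",
}
stub_model = lambda prompt: "stub: " + prompt[:6]
```

Freshness comes entirely from the corpus: the model is unchanged, and only the retrieved context is new, which is why this works without retraining.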
So to answer the question about freshness, I would say it is possible
to do with these methods, but those methods all involve using search in the back end to sort of
get the context to generate the answer. The second part of the answer is, okay, if you talk about
ad revenue, a lot of Google's ad revenue just comes from the fact that over the last two decades,
it's figured out how to put ad links on top of a search result page that sometimes users click on.
Now, the user behavior on a chat product is not to click on anything. You don't
click on stuff. You just read and you move on. And that actually, in my opinion, has severe impacts
on the web ecosystem, on all of Google and all of technology, and how we use the internet in the future.
And the reason is one thing we also take for granted is that this ad revenue,
where everyone likes to say Google is bad, Google makes money off ads, yada, yada, but this ad revenue
kind of sponsored the entire internet. So you have Google Maps and Google Search and Photos
and Drive and all of this great free stuff,
basically because of ads.
Now when you have this new interface,
sure, it comes with some benefits,
but if users aren't going to click on ads
and you replace the search interface with just chat,
that can actually be pretty dangerous
in terms of what it even means
to have to create a website.
Why would I create a website if no one's going to come to my website?
If it's just going to be used to train a model
and then someone's going to spit out whatever my website says,
then there's no incentive.
And that kind of dwindles the web ecosystem in the end.
It means less ad revenue.
And then the other existential question is, okay, I'm okay with saying the incumbent Google gets
defeated and there's this new hero, which is, I don't know, OpenAI and Microsoft now reinvent
the wheel.
All of that stuff is great.
But how are they going to make money?
They can make money off, I guess, subscriptions.
But subscriptions is not nearly going to make you enough money to replace what you can make
on ad revenue, even for Bing today.
Bing makes $11 billion
of ad revenue. It's not a side product. It's a huge product. And they're not going to make
$11 billion of subscriptions, I'll tell you that. So even they can't really replace search with
this with chat. And then there are some arguments around, okay, what if you start to inject ads
in textual form? But, you know, in my view, if the natural user inclination is not to click on
something with chat, they're clearly not going to click on something, no matter how much you try to
inject click targets into your result. So that's my long answer to the ads question. I don't
really know. I just smell danger on the horizon. You mentioned the information augmented generation as
well. Presumably that is literally Bing, probably just using the long context of GPT4 and taking
the full text of all the links that they find, dumping it in, and then
generating some answer. Do you think speed is a concern, or are people just willing to
wait for smarter answers? I think it's a concern. We notice that in every single product I've worked on,
there's almost a linear, at least for some section of it, a very linear curve, a linear line
that says the more the latency, the less the engagement. So there's always going to be some drop-off.
So it is a concern. But with things like latency, I just kind of presume that time solves these
things, you optimize stuff, you make things a little better, and the latency will get down with
time. And it's a good time to even mention that Bard just came out today, Google's LLM, or
Google's equivalent. I haven't tried it, but I've been reading about it. And that's based
off a model called LaMDA. And LaMDA intrinsically actually does that. So it does query what they
call a tool set. And they query search or a calculator or a compiler or a translator, things that are good at
factual deterministic information and then it keeps changing its response depending on the feedback
from the tool set effectively doing something very similar to what Bing does.
But I like their framing of the problem where it's not just search.
It's any given set of tools, which is similar to a Facebook paper called Toolformer,
where you can think of language as one aspect of the problem and language interfaces with
computation, which is another aspect of the problem.
And if you can separate those two, this one just talks to these things and figures out how to phrase it.
So it's not really coming up with the answer.
Their claim is like GPT4, for example, the reason it's able to do factual accuracy without search is just by memorizing the facts.
And that doesn't scale.
It's literally somewhere in the whole model.
It knows that the CEO of Tesla is Elon Musk.
It just knows that.
But it doesn't know that this is a computation.
It just knows that usually I see CEO, Tesla, Elon.
That's all it knows.
So the abstraction of language model to computational unit or toolset is an interesting one that I think is going to be more explored by all of these engines.
And the latency, you know, it'll get better.
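A minimal sketch of the separation described here, where a language layer only decides which tool to call and how to phrase the request, and a deterministic tool produces the answer. Everything in it is an assumption made for illustration: `choose_tool` is a hypothetical rule-based stand-in for the language model, `lookup` is a hypothetical stand-in for a search backend, and none of this reflects how LaMDA, Bing, or Toolformer are actually implemented.

```python
import ast
import operator
from typing import Callable, Dict, Tuple

# Arithmetic operators the calculator tool is willing to evaluate.
_OPS = {
    ast.Add: operator.add,
    ast.Sub: operator.sub,
    ast.Mult: operator.mul,
    ast.Div: operator.truediv,
}


def calculator(expression: str) -> str:
    """Deterministic tool: evaluate a basic arithmetic expression without eval()."""
    def ev(node):
        if isinstance(node, ast.Constant) and isinstance(node.value, (int, float)):
            return node.value
        if isinstance(node, ast.BinOp) and type(node.op) in _OPS:
            return _OPS[type(node.op)](ev(node.left), ev(node.right))
        raise ValueError("unsupported expression")
    return str(ev(ast.parse(expression, mode="eval").body))


def lookup(query: str) -> str:
    """Hypothetical stand-in for a search tool, backed by a tiny fact table."""
    facts = {"ceo of tesla": "Elon Musk"}
    return facts.get(query.lower().strip(), "unknown")


TOOLS: Dict[str, Callable[[str], str]] = {"calculator": calculator, "search": lookup}


def choose_tool(question: str) -> Tuple[str, str]:
    """Hypothetical stand-in for the language model: pick a tool and phrase its input."""
    if any(ch.isdigit() for ch in question):
        expr = "".join(ch for ch in question if ch in "0123456789+-*/(). ")
        return "calculator", expr.strip()
    text = question.rstrip("?").strip()
    prefix = "Who is the "
    if text.startswith(prefix):
        text = text[len(prefix):]
    return "search", text


def answer(question: str) -> str:
    """The language side routes the question; the tool side computes the result."""
    tool_name, tool_input = choose_tool(question)
    result = TOOLS[tool_name](tool_input)
    return f"{result} (via {tool_name})"


print(answer("What is 12 * (3 + 4)?"))
print(answer("Who is the CEO of Tesla?"))
```

The point of the split is that the fact or the arithmetic lives in the tool, not in the router, so swapping the fact table or the calculator changes the answers without touching the language side.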
I think you're focusing on the right things there.
I actually saw another article this morning about the memorization capability, you know, how GPT4 is marketed a lot on its ability to answer SAT questions and GRE questions and bar exams.
And, you know, we covered this in our benchmarks podcast, Alessio, but like I forgot to mention that all these answers are out there and were probably memorized.
And if you change them just a little bit, the model performance will probably drop a lot.
It's true. I think the most compelling proof of what you just said is the Codeforces one, where somebody, I think, tweeted about the, yeah, the 2021.
Everything before 2021, it solves, everything after it doesn't. And I thought that was interesting.
It's just dumb. I'm interested in Toolformer,
interested in ReAct-type patterns. Zapier just launched a natural language integration with
LangChain. Are you able to compare and contrast what approaches you like when it comes to
LLMs using tools? I think it's not boiled down to a science enough for me to say anything that's
useful. I think everyone is at a point of time where they're just playing with it. There's no way to
reason about what LLMs can and can't do. And most people are just throwing things at a wall
and seeing what sticks.
And if anyone claims to be doing better,
they're probably lying
because no one knows how these things behave.
You can't predict what the output is going to be.
You just think, okay, let's see if this works,
this is my prompt,
and then you measure it.
And you're like, oh, that worked, versus this didn't.
And things like ReAct and Toolformer are really cool,
but those are just examples of things
that people have thrown at a wall that stuck well.
I mean, provably, it works.
It works pretty well.
I will say that one of the things,
and it's not really in the framing of what kinds of ways you can use LLMs to make them do cool things,
that people forget when they're looking at cutting edge stuff is that a lot of these LLMs can be used to generate synthetic data to bootstrap smaller models.
And it's a less sexy space of it all.
But I think that stuff is really, really cool.
Where, for example, I want to tag entities in a sentence.
That's a very simple classical natural language problem of
NER. And before, I had to gather training data, train model, tune model,
all of this other stuff. Now what I can do is I can throw GPT4 at it to generate a ton of synthetic
data, which looks actually really good. And then I can either just train whatever model I wanted
to train before on this data, or I can use something called like low rank adaptation,
which is distilling this large model into a much smaller, cost-effective, fast
model that does that task really well. And in terms of productionizable natural language systems,
that is amazing. This is stuff you couldn't do before. You would have teams working for years to
solve NER. And that's just what that team does. And there's a great Reddit viral thread about
are all the NLP teams at Big Tech doomed? And yeah, to an extent now you can do this stuff
in weeks, which is huge. What about some of the other kind of, like,
AI native search things like Perplexity,
Elicit, have you played with any of them?
Any thoughts on it?
Yeah, I have played with Perplexity and Neeva.
I think both of those products sort of try to do, again,
search results synthesis.
Personally, I think Perplexity might be doing something else now,
but I don't see any of those companies or products
disrupting either OpenAI or ChatGPT or Google, Bing, whatever, prominent search engines
with what they do because they're all built off basically the Bing API or their own version
of an index and their search itself is not good enough.
And there's not a compelling use case enough, I think, to use those products.
I don't know how they would make money.
A lot of Neeva's way of making money is subscriptions.
Perplexity, I don't think, has ever turned on the revenue dial.
I just have more existential concerns about those products actually functioning in the
long run. So I think I see them as they're nice. They're nice to play with. It's cool to see the
cutting edge innovation, but I don't really understand if they will be long lasting,
widely used products. Do you have any idea of what it might take to actually do like a new
kind of like type of company in the space? Like Google's big thing was like PageRank, right?
That was like one thing that kind of set them apart. Like people tried doing search before.
Like, do you have an intuition for what the LLM native PageRank thing is going to be to make
something like this exist? Or have we kind of, you know, hit the plateau when it comes to search innovation?
So I talk to so many of my friends who are obviously excited about this technology as well, and
many of them are starting LLM companies. You know how many companies in the YC batch of, you know,
Winter 23 are LLM companies? Crazy. Half of them, right? It's ridiculous. But what I always ask,
and I think everyone's struggling with this problem, is what is your advantage?
What is your moat?
I don't see it for a lot of these companies.
And it's unclear.
I don't have a strong intuition.
My sense is that the people who focus on problem first usually get much further than the people who focus solution first.
And there's way too many companies that are solutions first, which makes sense.
It's always been a big Achilles heel of the Silicon Valley.
We're a bunch of nerds that live in a whole different
dimension, which nobody else can relate to. But nobody else, the problem is nobody else can
relate to them. And we can't relate to their problems either. So we look at tech first, not problem first,
a lot. And I see a lot of companies just do that, where I'll tell you one, this is quite
entertaining to me. A very common theme is, hey, LLMs are cool. That's awesome. We should build
something. Well, what should we build? And it's like, okay, consumer. Consumer is cool. We should
build consumer. Then it's like, ah, nah, man, consumer is pretty hard. It's going to be a
Clubhouse, it's going to blow up. I don't want to blow up. I just want to build something that's like,
you know, pretty easy to be consistent with. We should go enterprise. Cool, let's go enterprise.
So you go enterprise. It's like, okay, we brought LLMs to the enterprise now. What problem do we
tackle? And it's like, okay, well, we can do Q&A on documents. People know how to do that, right?
We've seen a couple of demos on that. So they build it. They build Q&A on documents. And then they struggle
with selling or people just ask, hey, but I don't ask questions to my documents.
Like, you realize this is just not a flow that I do.
Like, I ask questions in general, but I don't ask them to my documents.
And also, like, what documents can you ask questions to?
And they'll be like, well, any of them. And they'll say, can I ask them to all of my documents?
And they'll be like, well, sure, if you give us all your documents, you can ask
anything.
And then they'll say, okay, how will you take all my documents?
Oh, it seems like we have to build some sort of indexing mechanism.
And then from one thing to the other, you get to a point where it's like,
we're building enterprise search and we're building an LLM on top of it.
And that is our product.
Or you go to like MLOps and I'm going to help you host models.
I'm going to help you train models.
And I don't know.
It seems very solution first and not problem first.
So the only thing I would recommend is if you think about the actual problems and talk to users
and understand what this can be useful for,
it doesn't have to be that sexy of how it's used,
but if it works and solves the problem,
you've done your job.
I love that whole evolution,
because I think quite a few companies
are independently finding this path
and going down this route to build a glorified search bot.
We actually interviewed a very problem-focused builder,
Mickey Friedman,
who's very, very focused on product placement image generation.
And she's not focused on anything else
in terms of image generation,
just focused on product placement and branding.
And I think that's probably the right approach.
And if you think about like Jasper, right,
like out of all the other GPT3 companies
when GPT3 first came out,
they built focusing on, you know, writers on Facebook.
You know, they didn't even market it on Twitter.
So like most people haven't heard of them.
I think it's a timeless startup lesson,
but it's something to remind people
when they're building with language models.
I mean, as an investor, like, you know,
you're an investor, you're a scout with me.
Doesn't that make it hard to invest in anything?
Because mostly it's just like the incumbents
will get the innovation faster than startups will find traction.
This is going to be a hot take too, but...
Okay.
My investing with people, especially early,
is often for me governed by my intuition
of how they approach the problem
and their experience with the technology.
and pretty much solely that. I don't really pretend to be an expert in the industry or the space.
That's their problem.
If I think they're smart and they understand this space better than me,
then I'm mostly convinced. If they've thought through enough of the business stuff,
if they've thought through the market and everything else,
I'm convinced.
I typically stray away from, you know,
just what I just said, founders who are like,
LMs are cool and we should build something with them.
That's not like usually very convincing to me.
But I don't concern myself too much with pretending to understand what this space means.
I trust them to do that.
If I'm convinced that they're smart and they've thought about it well,
then I'm pretty convinced that they're a good person to back.
Cool.
Any kind of, like, super novel idea that you want to shout out?
There's a lot of interesting explorations going on.
I'll preface this with: anything in enterprise
I just don't think is cool.
It's like including,
like it's just,
you can't call it cool,
man.
You're building products for business.
Glean is pretty cool.
I'm impressed by Glean.
This is what I'm saying.
It's cool for the Silicon Valley.
It's not cool.
Like,
you're not going to go to a dinner party
with your parents and be like,
hey, mom,
I work on enterprise search.
Isn't that awesome?
And they're not going to say,
all my notifications in one place.
Whoa.
So I will,
I'll start by saying for me in my head,
cool means like the world finds this amazing.
and it has to be somewhat consumer.
And I do think that the ideas that are being played with,
like Quora is playing with Poe.
It's kind of strange to think about and may not stick as is.
But I like that they're approaching it with a very different framing,
which is, hey, how about you talk to this chatbot,
but let's move out of this world where everyone's,
it's like it's not WhatsApp or Telegram.
It's not a messaging app.
You're actually generating some piece of content that now everybody can make use of.
And is there something there?
Not clear yet, but it's an interesting idea.
I can see that being something where people just learn or see cool things that GPT4 has said or chatbots have said.
That's interesting.
In the image space, very contrasted to the language space, there's so much.
I don't even begin to understand the image space.
Everything I see is just like blows my mind.
I don't know how Midjourney gets from six fingers to five
fingers. I don't understand this. It's amazing. I love it. I don't understand what the value is in
terms of revenue. I don't know where the markets are in image, but I do think that's way,
way cooler because that's a demo where, and I tried this, I showed GPT4 to my mom. And my mom's like,
yeah, this is pretty cool. It does some pretty interesting stuff. And then I showed the image one.
And she is just like, this is unbelievable. There's no way a computer
could do this. And she just could not digest it. And I love when you see those interactions.
So I do think image world is a whole different beast. And in terms of coolness, a lot more
cool stuff happening in image, video. Multimodal, I think, is really, really cool. I haven't seen
too many startups that are doing something where I'm like, wow, that's, that's amazing.
Oh, ElevenLabs. I'll mention ElevenLabs is pretty cool. They're the only ones that I know that are doing...
Oh, the voice synthesis. Have you tried it?
I've only played with it. I haven't really tried generating my own voice,
but I've seen some examples and it looks really, really awesome.
I've heard that Descript is coming up with some stuff as well to compete,
because, yeah, this is definitely the next frontier in terms of podcasting.
One last thing I will say on the cool front is I think there is something to be said about
a product that brings together all these disparate advancements in AI.
and I have a view on what that looks like.
I don't know if everyone shares that view,
but if you bring together image generation, voice recognition,
language modeling, TTS,
and like all of the other image stuff they can do with like CLIP and DreamBooth
and putting someone's actual face in it,
what you can actually make, this is my view of it,
is the Harry Potter picture come to life,
where you actually have just a digital stand
where there's a person who's just capable of talking to you
in their voice, in understandable dialogue that is how they speak.
And you could just sort of walk by, they'll look at you, you can say hi, they'll say
hi back, they'll start talking to you, you start talking back to it.
That's sort of my, that's my wild science fiction dream.
And I think the technology exists to put all of those pieces together.
And the implications for people who are older or saving people over time are huge.
It could be a really cool thing to productionize.
There's one more part of you that also tweets about numbers and math,
AI math, essentially, is how I'm thinking about it.
What gets you into talking about costs and math
and just like first principles of how to think about language models?
One of my biggest beefs with big companies
is how they abstract the cost away from all the engineers.
So when you're working on Google Search,
I can't tell you a single number that is cost related at all.
I just don't know the cost numbers.
It's so far down the chain that I have no clue how much it actually costs to run search
and how much these various things cost aside from what the public knows.
And I found that very annoying because when you are building a startup,
particularly maybe an enterprise startup, you have to be extremely cognizant about the cost
because that's your unit economics.
Like your primary cost is the money you spend on infrastructure, not your actual labor
costs. The whole thesis is the labor doesn't scale, but the infra does scale. So you need to
understand how your infra costs scale. So when it comes to language models, given that these
things are so compute heavy, but none of the papers talk about cost either. And it just
bothers me. I'm like, why can't you just tell me how much it cost you to build this thing?
It's not that hard to say. And it's also not that hard to figure out. They give you everything else,
which is how many TPUs it took and how long they trained it for and all of that other stuff, but they don't tell you the cost.
So I've always been curious because all everybody ever says is it's expensive and a startup can't do it and an individual can't do it.
So then the natural question is, okay, how expensive is it?
And that's sort of the background behind why I started doing some more AI math, and one of the tweets, probably the one that you're talking about, is
where I compare the cost of LLaMA,
which is Facebook's LLM, to PaLM,
with my best estimates.
And the only thing I'll add to that is
it is quite tricky to even talk about these things publicly
because you get rammed in the comments
by people who are like,
oh, don't you know that this assumption that you made
is completely BS because you should have taken this cost per hour
because obviously people do bulk deals.
And yeah, I have 280 characters.
this is what I could have said.
But I think ballpark, I think I got close.
I'd like to imagine.
I think I was off maybe by 2x on the lower side.
I think I took an upper bound and I might have been off by 2x.
So my quote was $4 million for LLaMA and 27 for PaLM.
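The back-of-the-envelope arithmetic behind an estimate like this can be sketched as accelerator count times training hours times an hourly price. The function name and every input below are assumptions chosen for illustration (2,048 accelerators, 21 days, a hypothetical $4 per accelerator-hour); they are not the figures used in the tweet, and real pricing depends on bulk deals, as the comments he mentions point out.

```python
def estimate_training_cost(
    num_accelerators: int,
    training_hours: float,
    usd_per_accelerator_hour: float,
) -> float:
    """Rough cost of a single final training run, in US dollars.

    Covers only compute for that one run: no failed runs, no
    experimentation, no staff, no storage.
    """
    accelerator_hours = num_accelerators * training_hours
    return accelerator_hours * usd_per_accelerator_hour


# Illustrative inputs only (assumptions, not figures from the episode).
cost = estimate_training_cost(
    num_accelerators=2048,
    training_hours=21 * 24,
    usd_per_accelerator_hour=4.0,
)
print(f"Estimated final-run compute cost: ${cost:,.0f}")
```

Halving or doubling the assumed hourly price moves the result by the same factor, which is one reason a ballpark figure of this kind can easily be off by 2x.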
In fact, later today I'm going to do one on Bard.
Oh, oh, the exclusive is that it's $4 million for Bard too.
Nice, nice.
Which is like, do you think that's like, don't you think
that's actually not a lot? It's a drop in the bucket for these guys.
And one of the valuable things to note when you're talking about this cost is this is the cost
of the final training step. It's not the cost of the entire process. And a common
rebuttal is, well, yeah, this is your cost of the final training process. But in total,
it's about 10x this amount of cost because you have to experiment. You have to tune hyperparameters.
You have to understand different architectures. You have to experiment with different kinds of training
data and sometimes you just screw it up and you don't know why and you have to spend a lot of time
figuring out why you screwed it up. And that's where the actual cost buildup happens,
not in the one final last step where you actually train the final model. So even assuming
like a 10x on top of this, I think is fair for how much it would actually cost a startup to build
this from scratch, I would say. How do you think about open source in this then? I think a lot of
people's big 2023 predictions are an LLM, you know, open source LLM that has comparable performance
to the GPT3 model.
Who foots the bill for the mistakes, you know?
Like when somebody opens a pull request that is not good, it doesn't really cost people
much outside of like a GitHub Actions run.
As people try and train these things separately, like do you think open source is actually
bad because you're wasting so much compute by so many people trying to like do
their own things? And like do you think it's better to have a centralized team that organizes
these experiments? Or, yeah, any thoughts there? I have some thoughts. I think the easiest comparison
to make is to the image generation world where, you know, you had Midjourney and DALL-E come out first
and then you had Emad come out with Stability, which was completely open source. But the difference
there is, I think, Stability you can pretty much run on your machine and it's okay, it works pretty
fast. So the entire concept of open sourcing, it worked. And people made forks that fine-tuned it on a
bunch of different random things. And it made variants of Stability that could do a bunch of things.
So I thought the Stability thing, agnostic of the general ethical concerns of training on
everyone's art, I thought it was a cool, cool addition to the sort of trade-offs in different models
that you can have in image generation. For text generation, we're seeing an equivalent effect with LLaMA
and Alpaca, LLaMA being Facebook's model, which they didn't really open source,
but then the weights got leaked, and then people cloned them,
and then they tuned them using GPT4 generated synthetic data and made Alpaca.
So the version, I think, that's out there is only the 7 billion one.
And then this crazy European C++ god came and said, you know what,
I'm going to write this entire thing in C++, so you can actually run it locally
and not have to buy GPUs and a combination of those.
And of course, a lot of people have done work
in optimizing these things to make it actually function quickly
and we can get into details there.
But a function of all of these things
has enabled people to actually run semi-good models on their computer.
I don't have that much.
I don't have any comments on, you know, energy usage and all of that.
I don't really have an opinion on that.
I think the fact that you can run a local version of this
is just really, really cool.
but also supremely dangerous because with images, conceivably, people can tell what's fake and what's real,
even though there's some concerns there as well.
But for text, it's, you know, like you can do a lot of really bad things with your own, you know,
text generation algorithm.
You know, if I wanted to make somebody's life hell, I could spam them in the most insidious ways
with all sorts of different kinds of text generation indefinitely, which I can't really
with images. I don't know. I find it somewhat ethically problematic in terms of the power is too much
for an individual to wield. But there are some libertarians who are like, yeah, why should only
OpenAI have this power? I want this power too. So there's merits to both sides of the argument.
I think it's generally good for the ecosystem. Generally, it will get faster and the latency will
get better. And the models may not ever reach the size of the cutting edge that's possible.
but it could be good enough to do 80% of the things
that a bigger model could do.
And I think that's a really good start for innovation.
I mean, you could just have people come up with stuff
instead of companies,
and that always unlocks a whole vector of innovation
that didn't previously exist.
That was a really good conclusion.
I want to ask follow-up questions,
but also that was a really good place to go to end it.
Was there any other sort of AI topics that you wanted to touch on?
I think Runway ML is
the one company I didn't mention, and that one's one to look out for, I think, doing really
cool stuff in terms of video editing with generative techniques. So people often talk about
the OpenAIs and the Googles of the world and Anthropic and Claude and Cohere and Midjourney,
all the image stuff, but I think the places that people aren't paying enough attention to
that will get a lot more love in the next couple of years is a better Whisper, so better streaming voice
recognition, better TTS, so some open source version of ElevenLabs that people can start using.
And then the frontier is sort of multimodality and videos.
Can you do anything with videos?
Can you edit videos?
Can you stitch things together into videos from images?
All sorts of different cool stuff.
And then there's sort of the long tail of companies like Luma that are working on like
3D modeling with generative use cases and taking an image and creating a 3D model from
nothing.
And that's pretty cool too, although the practical use cases to me are a little less clear.
So that kind of covers the entire space in my head, at least.
I like using the Harry Potter image, like the moving speaking images as an end goal.
I think that's something that consumers can really get behind as well.
Super cool.
Okay, so to double back a little bit before we go into the lightning round,
I have one more thing, which is relevant to your personal story,
but then also relevant to our debate, which is a nice blend.
You're concerned about the safety of everyone having access to language models and the potential harm that you can do there.
My guess is that you're also not that positive on watermarking techniques in terms of language, right?
Like maybe randomly sprinkling weird characters so that people can see that this is generated by an AI model.
But also, like, you have some personal experience with this because you found manipulation in the Indian exam board,
which maybe might be a similar story.
I don't know if you have any thoughts
about just watermarking, manipulation,
like ethical deployments of generated data.
Well, I think those two things are a little separate.
One, I would say is for watermarking text data,
there are a couple of different approaches.
I think there is actual value to that
because from a pure technical perspective,
you don't want models to train on stuff they've generated.
That's kind of bad for models.
Yes.
And two is, obviously, you don't want people to keep using ChatGPT for, I don't know if you want this, to use it for all their assignments and never be caught.
Maybe you do want it.
Maybe you don't.
But it seems like it's valuable to at least understand that this is a machine generated text versus not.
Just ethically, that seems like something that should exist.
So I do think watermarking is net a good direction of research and I'm fairly positive on it.
I actually do think people should standardize how that watermarking works across language models.
so that everyone can detect and understand language models
and not just OpenAI does its own models,
but not the other ones and so on.
So that's my view on that.
And then sort of transitioning into the exam data,
this is a really old one,
but it's one of my favorite things to talk about.
I love it.
In America, as you know,
usually the way it works is you take your SAT exam,
you take a couple of APs,
you do your school grades,
you apply to colleges,
you do a bunch of fluff,
you try to prove how you're good at everything,
and then you apply to colleges,
and then it's a weird decision based on 100 other factors
and then they decide whether you get in or not.
But if you're rich, you're basically going to get in anyway.
And if you're a legacy, you're probably going to get in.
And it's a whole bunch of stuff going on.
And I don't think the system is necessarily bad,
but it's just really complicated.
And some of the things are weird.
In India and in a lot of the non-developed world,
people are like, yeah, okay, we can't scale that.
There's no way we can have enough people
like non-rigorously evaluate this
because there's going to be too much corruption
and it's going to be terrible at the end
because people are just going to pay their way in.
So usually it works in a very simple way
where you take an exam that is standardized
and sometimes you have many exams,
sometimes you have an exam for a different subject,
sometimes it's just one for everything,
and you get ranked on that exam,
and depending on your rank,
you get to choose the quality
and the kind of thing you want to study,
which the kind of thing always surprises people in America
where it's not like,
oh, it's a glory land where you walk
in and you're like, I think this is interesting and I want to study this.
Like, no.
In most of the world, it's like, you're not smart enough to study this.
So you're probably not going to study it.
And there's like a rank order of things that you need to be smart enough to do.
So it's different.
And therefore, these exams are much more critical for the functioning of the system.
So when there's fraud, it's not like a small part of your application going wrong.
It's your entire application going wrong.
And that's why, that's just me explaining why it's severe.
Now, one such exam
is the one that you take in school.
It's called a board exam.
You take one in the 10th grade,
which doesn't really matter for much.
And then you take one in the 12th grade
when you're about to graduate,
and that decides where you go to college
for a large set of colleges,
not all, but a large set of colleges.
And based on how much you get,
on your top five average,
you're sort of slotted into a different stream
in a different college.
And over time,
because of the competition
between two of the boards
that are a duopoly, there's no standardization.
So everyone's trying to like give more marks than the other person
to attract more students into their board.
Because that means that you can then claim,
oh, you're going to get into a better college if you take our exam
and don't go to a school that administers the other exam.
So it's, and that's, that's, everyone knew that was happening-ish,
but there was no data to back it.
But when you actually take this exam, as I did,
you start realizing that the
numbers, the marks, make no sense because you're looking at this kid who's also in your class and you're like, dude, this guy's not smart.
How did he get a 90 in English?
He's not good at English.
He can't speak it.
You cannot give him a 90.
You gave me a 90.
How did this guy get a 90?
So everyone has, like, their anecdotal
"this doesn't make any sense" moments with this exam.
But no one has access to the data.
So way back when, what I did was I realized they have very little security surrounding the data where the
only thing that you need to put in to get access is your roll number. And so as long as you
predict the right set of roll numbers, you can get everybody's results. So unlike America,
also exam results aren't treated with a level of privacy. In India, it's very common to sort of post
the entire class's results on a bulletin board and you just see how everyone did and you shame
the people who are stupid. That's just how it works. It's changed over time. But that's fundamentally
a cultural difference. And so when I scraped all these results and I published
it and I did some analysis, what I found was a couple of very insidious things.
One is that if you plot the distribution of marks, you generally tend to see some sort of skewed
but pseudo-normal distribution where it's a big peak and it falls off on both ends.
But you see two interesting patterns. One, which is just the most obvious one, is grace marks:
the pass grade is 33, and you see that nobody got between 20
and 32.
Because what they did for every single exam is they just made you pass.
They just rounded up to 33, which is, okay, I'm not that concerned about whether you give
grace marks.
It's kind of messed up that you do that.
But okay, fine.
You want to pass a bunch of people who deserve to fail?
Do it.
Then the other more concerning thing was between 33 and 93, right?
That's about 60 numbers, 61 numbers.
30 of those numbers were just missing, as in nobody got 91 on the ICSE
exam in any subject, in any year. How does that happen? You don't get a 91, you don't get a 93,
89, 87, 85, 84. Some numbers were just missing. And at first when I saw this, I'm like,
there's definitely some bug in my code. There's no way that, like, 91 never happened.
And so I started to think back, and I asked a bunch of my friends. I'm like, dude, did you ever get a 91
in anything? And they're like, no. And it just unraveled that this is obviously problematic,
because that means that they're screwing with your final marks in some way or the other.
And they're not transparent about how they do it.
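The missing-marks check described here can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code used in the original analysis: the function name `missing_marks` and the sample data are hypothetical, and the 33 to 93 range is taken from the conversation.

```python
from collections import Counter

def missing_marks(marks, low=33, high=93):
    """Return every integer mark in [low, high] that no student received."""
    counts = Counter(marks)
    return [m for m in range(low, high + 1) if counts[m] == 0]

# Toy, made-up marks for illustration only (not real exam results).
sample = [33, 35, 35, 40, 86, 88, 90, 92, 94, 95]

# Only inspect the top of the range so the output stays short.
print(missing_marks(sample, low=85, high=93))  # [85, 87, 89, 91, 93]
```

On a large, honest dataset you would expect this list to be empty or nearly so. A long list of marks that nobody ever received, across subjects and years, is the kind of signal being described.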
Then I did the same thing for the other board.
We found something similar there, but not the same.
The problem there was there was a huge spike at 95.
And then I realized what they were doing is they'd offer various exams.
And to standardize, they would blanket add like a raw number.
So if you took the harder math exam, everyone would get plus 10, arbitrarily.
This is not revealed or publicized.
It's just, randomly, "that was the harder exam,
you guys all get plus 10,"
but it's capped at 95.
That's just this stupid way to standardize.
It doesn't make any sense.
They're not transparent about it.
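To see why a flat bonus with a cap produces a spike, here is a minimal Python sketch. It assumes the rule is "add 10, but never push a mark above 95, and leave marks already at or above 95 alone"; the exact rule the board used is not stated in the conversation, and the function name `standardize` and the toy data are hypothetical.

```python
from collections import Counter

def standardize(raw_marks, bonus=10, cap=95):
    """Add a flat bonus to each mark without lifting any mark above the cap.

    Assumption: marks already at or above the cap are left unchanged.
    """
    return [m if m >= cap else min(m + bonus, cap) for m in raw_marks]

# Toy data: exactly one student at each raw mark from 80 to 100.
raw = list(range(80, 101))
adjusted = Counter(standardize(raw))

# Raw marks 85 through 95 all land on 95, so 11 students pile up there.
print(adjusted[95])  # 11
print(adjusted[94])  # 1
```

Every raw mark within 10 points of the cap collapses onto the same value, which is why the adjusted distribution shows one tall bar at 95.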
And it affects your entire life because this is what gets you into college.
And if you add the two exams up,
this is 1.1 million kids taking it every year.
So that's a lot of people's lives that you're screwing with
by not understanding numbers and not being transparent
about how you're manipulating them.
So that was the thesis.
In my view, looking back on it,
10 years later, it's been 10 years at this point,
I think the media never did justice to it,
because to be honest, nobody understands statistics.
So over time, it became a big issue then,
and then there was a big Supreme Court or high court ruling,
which said, hey, you guys can't do this,
but there's no transparency,
so there's no way of actually ensuring that they're not doing it.
They just added a level of password protection,
so now I can't scrape it anymore.
They probably do the same thing,
and it's probably still as bad,
but people aren't raising an issue about it.
It's really hard to make
people understand the significance of it
because people are so compelled
to just go lean into the narrative of exams are bullshit,
and we should never trust exams,
and this is why it's okay to be dumb,
and that's not the point.
So I think the response was lackluster in retrospect,
but that's what I unveiled in 2013.
That's fascinating.
You know, in my finance background, a similar case happened with the Madoff funds.
Because if you plot the statistical distribution of the Madoff funds, you could see that
they were just not a normal distribution, and therefore they were probably made-up numbers.
And we also did the same thing in my first job as a regulator in Singapore for hedge fund returns,
which is watermarking.
This is a watermark of a human or some kind of system, you know, making it up.
And statistically, if you look at the distribution, you can see, like, this
violates any reasonable assumption, therefore something's wrong.
Well, I see what you mean there.
Like, in that sense, yes, that's really cool that you worked on a very similar problem.
And I agree that it's messed up.
It's a good way to catch liars.
In Madoff's case, like they actually made it a big deal.
But I don't know.
I don't see how this wasn't a bigger deal in India.
But anyway, that's a conversation for another time, over drinks, perhaps.
So now we're going to go into the lightning round, just to cap things off with an overview.
What are your favorite AI people and communities?
You mentioned Reddits.
Let's be specific about which subreddits.
I actually don't really use Reddit that much for AI stuff.
It was just a one-off example.
Most of my learnings are Twitter.
And I think there are the obvious ones, like everyone follows Riley Goodside now.
And there's a bunch of like the really famous ones.
But I think amongst the lesser known ones, let me just say my favorite one is probably AI Pub,
because it does a roundup of everybody else's stuff regularly.
I know Brian who runs AI Pub as well,
and I just think I find it really useful
because often it's very hard to catch up on stuff
and this gives you the entire roundup of the last two weeks,
here's what happened in AI.
Good, good, good.
And any other communities, like Slack communities, Discords,
you don't do that stuff?
I try to, but I don't because it's too time-consuming.
I prefer reading at my own pace.
Yeah, yeah, yeah.
Okay, so my learning is: start a Twitter,
a weekly recap of "here's what happened in AI."
I mean, it makes sense, right?
It'll do very well.
It'll do very well.
A year from now, what do you think people will be the most surprised by in AI?
I think they're going to be surprised at how far down they're able to bring the cost
and how much faster these models get.
I'm more optimistic about cost and latency more than I am about just quality improvements at this point.
I think modalities will change, but I think quality is near about like a maxima that we're going to achieve.
So this is a request for startups or a request for side projects.
What's an AI thing that you would pay for if somebody else built it?
Aside from the Harry Potter image one, which I would definitely, I would pay a lot of money to have like a floating, I don't know, Bill Clinton in my room, just saying things back to me whenever I talk to it.
That would be cool.
But in terms of other products, if somebody built
a product that would smartly, I know many people have tried to build things like this,
that would smartly auto-respond to things that it can auto-respond to.
And for the things that are actually important, please don't auto-respond and just tell me to do it.
And that distinction, I think, is really important.
So a hybrid somewhere in between automate-everything and just-suggest-everything
that works well, I think that would be really cool.
Yeah, I've thought about this as well.
Even if it doesn't respond for you, it can draft an answer
for you to edit, right?
So you at least get to review.
I actually built that this morning, if you guys want it.
You just OAuth with Gmail, and then it pre-drafts every email in your inbox.
Really?
Yeah.
You have to change the prompt because my prompt says, like, you know, I'm a software engineer.
I'm a venture capitalist.
This is where I work, blah, blah, blah, blah, blah, blah.
But you can modify that and then it works.
It works.
Are you going to open source it?
I probably will.
But sometimes it's like it cares too much
about the prompt. So for example, in the prompt, I was like, if the person is asking about
scheduling, suggest a time, and I put, like, my calendar, or give the Calendly link.
In every email, it will respond, "And if you ever want to chat, here's my calendar." Like, no matter
what the email was, every email, it would tell them to schedule time. So there's still work to be
done. You're just very helpful. It's just very, very helpful. Well, so actually, I have a GitHub
version of this, which I actually would pay someone to build, which is: read a GitHub issue
that somebody opened, and check if they have missed anything needed for resolution.
And then generate a response to, like, request what's needed for resolution.
And then, like, you know, if they haven't answered in like 30 days, close the issue.
Absolutely.
And one thing I'll add to that is also the idea of the AI just going in and making PRs for you, I think is super compelling.
That it just says, hey, I found all these vulnerabilities and patched them.
Yeah.
Yeah.
We got a sell company doing it.
So I'll let you do more.
Deedy, thank you so much for coming on.
I think to wrap it up, is there any parting thoughts,
kind of like one thing that you want everyone to take away about AI
and the impact that's going to have?
Yeah, I think my parting thought is I have always been a big fan
of bridging the gap between research and the end consumer.
And I think this is just a great time to be alive
where if you are interested in AI or if you're even remotely interested,
of course you can go build stuff.
Of course you can read about it.
But I think it's so cool that you can just go read the paper
and read the raw things that people did to make this happen.
And I really encourage people to go and read research,
follow people on YouTube who are explaining this.
Andrej Karpathy has a great channel where he also explains it.
It's just a great time to learn in this space.
And I would really encourage more people to go and read the actual stuff.
It's really cool.
Thank you so much, Deedy, for coming on.
It was a great chat.
Where can people follow you on Twitter?
Any other thing you want to plug?
I think Twitter is fine, and there's a link to my website from my Twitter, too.
It's my first name: debarghya_das is my Twitter, and debarghyadas.com is my website,
but you can also just Google Deedy Das, and you will find both of those links.
Awesome.
All right.
Thank you so much.
Thank you.
Thanks, guys.
