Tech Brew Ride Home - (BNS) Leap Labs

Starting point is 00:00:17 Welcome to another bonus episode of the Techmeme Ride Home podcast, another portfolio profile episode.

Starting point is 00:00:41 As always, I'm your host, Brian McCullough. And hey, look at this. Our friend is back. Hey, I'm back. Chris Messina is here. Hey, Chris. Hey, hello. It's been a while, but we're here because we are going to talk about a company that Chris and I invested in through the Ride Home AI Fund.

Starting point is 00:00:58 This company is Leap Labs. And we're speaking to Jessica Rumbelow. Jessica? And, uh, Jugal Patel. Hi there. Um, and you are the founders of Leap Labs. So this is going to be more about getting into the tech and, uh, what you're actually doing. This company is more about making an advancement than, oh, we have a product. So, uh, but before we do that, just tell us a little bit about Leap Labs, um, what you're attempting to do. And then let's get into the science of it.

Starting point is 00:01:31 Yeah. So we are automating scientific discovery from data. There's a lot of data in the world. Companies spend huge amounts of money gathering data, doing R&D, but the outcomes from this process are like pretty uncertain, pretty noisy, pretty path-dependent. There are lots of good reasons for this, which I'm kind of excited to talk to you guys about. But what we're able to do, basically, is extract even complex, combinatorial, non-linear patterns from arbitrary data sets at incredible speed and scale. And we've made a bunch of novel scientific discoveries doing this. Science being the key, and we're going to get into all this specifically, but just in a broad sense,

Starting point is 00:02:18 you've even written about this online, is there a sense that the current models of ML and especially LLMs, they're not exactly perfectly designed for scientific research and the like? Yeah. Yeah, so there are a couple of major problems here. These are language models, right? They're trained on language. In the first case, in the first instance, really, language is actually just a really noisy abstraction over the real world,

Starting point is 00:02:51 over underlying data, over observations, over the true generative functions of the world. So there's that problem. The much bigger problem is that our scientific literature is absolutely terrible. The replication crisis is real. And LLMs can't tell the difference between papers that replicate and papers that don't. So, like, a lot of the time they're just wrong, even when they're not hallucinating, even when they are perfectly recalling facts from their training data, those facts are fundamentally unreliable.

Starting point is 00:03:24 Right. So you're saying this is even beyond the hallucination problem that any user of an LLM is familiar with. What you're saying is... We have to first explain actually what the replication crisis is, which essentially means... Sorry, do that. For the basic listener, it's sort of like, you know, how does a law get made? It's sort of like how does science get made? And it's like you have a theorem or an idea, a thesis about the world. And then you go about designing experiments or several to test out that hypothesis.

Starting point is 00:03:50 I'm sorry, I'm getting my words confused. See, this is the lossiness of language. And as a result of doing successive trials of that hypothesis, you come into clearer coherence over time that asserts that whatever you thought is true, perhaps is true. That's one set of, you know, sort of data about the world. And the question is, can you replicate that elsewhere? That's the replication part of this. And so if you can replicate it over and over again, for example, one plus one equals two, doesn't matter where you are or what language that's in. I mean, not that you would scientifically prove, I suppose, a mathematical theorem,

Starting point is 00:04:22 but one plus one is two. Basically, in most cases, almost all that we've been able to reproduce comes out the same. And so that's the replication aspect where science gets its foundation, as opposed to human laws or human insights about the world that are subjective or simply not subject to replication, and therefore they're whimsical, perhaps. And so I think what I'm hearing, just to play it back again for the listener, is the real challenge is that we have a lot of science out there that asserts that the world is a certain way. There are some attempts at replication, some of which have been successful, but much of which, in fact, has not. And if there isn't an enormous amount of effort put into replication, then the science from which we derive so many

Starting point is 00:05:09 assumptions about the way things work actually is built on faulty pretences. Exactly. Okay. It's actually really upsetting. It's like kind of possible, right? Because this happens because the incentives, largely in academic publishing, are just fundamentally misaligned. They incentivize paper count.

Starting point is 00:05:33 They incentivize citation count. Can you start? Can you say what those incentives are first and then you can talk about the outcomes? Yeah. So if you're a research scientist, say, normally in academia, and in order to progress in your career, in order to get jobs, get promotions, become an esteemed scientist, you have a... Which is the most important thing, to be esteemed. Naturally.

Starting point is 00:05:56 It's like being an influencer on TikTok or something. You want to be esteemed in the scientific world. Of course you do. Who doesn't? Yeah, of course. And the way, like, we kind of metricized this in science is... That's a good word. Thank you very much.

Starting point is 00:06:12 How many novel papers have you published? How many citations do those papers have? Are those papers published in, like, good journals, right? And on the face of it, this sounds pretty sensible. But actually, it's a terrible, terrible idea, and we should immediately stop doing it and do something else. Because what happens is you're a scientist, you've got some data, you do some experiment, you have some hypothesis.

Starting point is 00:06:39 Maybe you don't actually find anything that interesting. Or maybe you find something interesting, but only if you do the analysis in like a very specific way. Or maybe you run your analysis like many, many times and pick the one that is like most exciting and convincing. And then you inflate the importance of your discovery and gloss over the inconvenient details so that you can get that sexy publication that will get lots of citations. And like who has the time to replicate other people's work, right?
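The "run your analysis many, many times and pick the most exciting one" problem can be made concrete with a little arithmetic. The sketch below is an illustration only, under stated assumptions: every analysis is treated as an independent test on data with no real effect, and the 0.05 threshold and the function names are choices made for this example, not anything described in the episode.

```python
import random

ALPHA = 0.05  # conventional significance threshold (an assumption for this sketch)


def chance_of_false_positive(num_analyses, alpha=ALPHA):
    """Probability that at least one of `num_analyses` independent tests on
    pure noise comes out 'significant' at level `alpha`."""
    return 1.0 - (1.0 - alpha) ** num_analyses


def simulate(num_analyses, trials=20_000, seed=0):
    """Monte Carlo check of the formula: with no real effect, a p-value is
    uniform on [0, 1], so each analysis 'succeeds' with probability ALPHA."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(trials):
        if any(rng.random() < ALPHA for _ in range(num_analyses)):
            hits += 1
    return hits / trials


for n in (1, 5, 20):
    print(n, round(chance_of_false_positive(n), 3), round(simulate(n), 3))
```

Under these assumptions, trying 20 variants of an analysis gives roughly a 64% chance that at least one looks publishable even when nothing is there, which is one way a literature can fill up with results that later fail to replicate.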

Starting point is 00:07:08 It's boring. Replications almost never get published. And especially if it's so idiosyncratic, right? You have to like recreate the biases that led to the outcome, which therefore is unlikely to produce the same results. And so you sort of just wasted a bunch of time, whereas you could be exploring novelty. You know, for like fans of like the PC Revolution, like if you played a game called civilization, like the old version,

Starting point is 00:07:28 You know, it's sort of like the scientific, like, world as you're describing it, as sort of like going out into like the black areas and just like finding new spaces. But then you're always being raided by barbarian hordes. You know, it's just sort of like you never get to build a civilization. So I think this is kind of what you're talking about. It's in this case being peer reviewers. Correct. Yes.

Starting point is 00:07:45 Yeah. That's right. Yeah. So obviously, you know, this is a generalization. Most scientists are not out there committing academic fraud. I hope. Like lots of scientists actually take. How much of it is intentional would you say?

Starting point is 00:07:58 it's actually like fraudulent versus like incidental and the, as you say, the incentives encourage a set of behaviors that are about, you know, glowing up the research. I think everybody who publishes has to. That's a large indictment. Well, if you want to work as a scientist. No, but realistically, sure. I mean, I think what I'm interested in is the scope and scale of the problem. And what it sounds like you're saying is that all the incentives point in one direction.

Starting point is 00:08:28 which is sort of like in the social media world, like number goes up. So if we destroy democracy in the process, that's fine because we got more followers and we got more engagement, right? We got more grant funding for our research lab. You know, our university has like really high. New buildings named by other people, etc. Okay. It's completely understandable. You know, like it's not really the scientist's fault at all.

Starting point is 00:08:53 And a lot of them are extremely, extremely concerned about this. But it's very hard to kind of change the system. whilst also succeeding in it, right? Yeah, so I don't, I just, I want to be very clear. I'm a scientist by, by, by training, by background. I was in academia for a long time. I have a PhD. I've been through the system.

Starting point is 00:09:13 Like, the vast majority of our employees here at Leap are also scientists of one kind or another. Like, we love scientists. We're here for the scientists. But they're working in, like, inside of this structure, this incentive structure that, like, is actively pushing against doing really novel work, really exciting work.

Starting point is 00:09:31 The incentives are to play it safe, big up your results. Yeah. So also just because I think the diagnosis of this problem is critical to arrive at the solution, which you're going to describe momentarily. And I think it's important then to, I guess, ask the question about how science became somewhat perverted. And if it's because of the nexus of science and capitalism, where capitalism tends to infect everything that it sort of touches and therefore absorbs the elements of the profit

Starting point is 00:10:06 motive, you know, in order to organize effort or labor. So, for example, if you can imagine, and I didn't live back then, but I understand if there were like patrons, you could sort of invest in the sciences. And the idea would actually be that the ideas would battle. And it was less about, you know, blowing up some big theory, but instead of like having big ideas about the world and then trying to find ways to discover if those ideas were valid and then developing various tests. And then the replication piece was actually the economic kind of like driver of participation in science.

Starting point is 00:10:37 And that was obviously in contrast to like religion. Am I off on this? Correct my history. I mean, that sounds, that sounds broadly correct. However, I would point out that science actually seems to work a hell of a lot better in industry than academia, because like your outcomes are directly tied to how successful your company is.

Starting point is 00:10:59 I see. But so I'm trying to sort of like create the lineage of the incentive structure in academia where like blowing up the outcomes of your results, where you're only doing incremental kind of, you know, expansions of a thought or an idea is like that feels how the incentive structures are misaligned. So you're doing incremental work, but you're trying to like blow it up into something that's much more significant. and then you're moving very quickly through the process to get more money and grants to just keep the game going.

Starting point is 00:11:27 So it's like an infinite game, but it's not quite the way it works in academia, or I'm sorry, in industry where the outcomes of your effort will actually lead to products that get to market, and then you're actually competing in the real marketplace. And so if your stuff doesn't work, then obviously you can't sell a product. And so that's the sort of corrective aspect that exists in the direct capitalist market. Absolutely. It's important to note that these problems in academic publishing in general also infect industry. Because everybody's drawing from this literature base, which is incredibly unreliable. Okay. And it forms this.

Starting point is 00:12:04 It's feeding itself. It's a set of corrosive functions on information. And large language models are going to make this so much. Okay. So, okay. Again, to bring it back for dummies to understand here, essentially what we're saying, is we've had this LLM revolution. Everyone's like, great, let's train it on the corpus of scientific literature.

Starting point is 00:12:28 And we're going to get novel insights. And your hypothesis is that maybe that's not going to be successful. And your solution to that is the discovery engine, correct? So tell us about, yes, please tell us about the discovery engine. Yeah. So it's kind of leaning into this idea that language is a really lossy abstraction over data. The logical thing to do is to go straight to the data. The problem is that humans are actually really, really bad at like looking at massive or even like small numerical data sets and finding patterns in them. We have some tools, we have some statistical tests, we have some analyses that we can run. But it's incredibly laborious. It's incredibly, like, path dependent. It's full of confirmation bias, right? Because you can't, well, up until recently, you can't systematically find all of the insight there is in a data set. You can't find all of the patterns. You use this phrase path dependency. And I think that's also a little bit

Starting point is 00:13:36 jargony. My understanding is that it sort of requires that you do a series of steps. And in those steps, you actually cut off a bunch of other possibilities, even if those other possibilities are valid. And so it's almost like going from a CPU, which is like sequential, into sort of like a GPU, which is like relational. And so essentially you're creating path dependency means that you don't get to find, you know, lateral or latent relationships that might be present because you've gone down a certain path and going backwards is just too costly or just won't work. Is that? Exactly that. Yeah. You end up exploring only like a tiny fraction of all of the possible insight, discoveries, information that might be there in your data,

Starting point is 00:14:23 which is a problem. And I guess our key insight at Leap and the thing that powers our technology was that machine learning models, especially deep neural networks, are just extremely good at finding complex patterns in data. Honestly, been true for a while. The issue has been, we are really bad at understanding neural networks. So, like, maybe they learn all of these interesting novel patterns that would be really important for us to learn about,

Starting point is 00:14:50 but we've got no way of getting them out. And that's kind of where Leap comes in. Our core research is really interpretability. So we train big neural networks or even smaller machine learning models on completely arbitrary data sets, and then we use interpretability to extract what those models have learned from that data.
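The general recipe described here (fit a model to the data, then interrogate the model to see what it learned) can be sketched with a generic, well-known technique. To be clear about the assumptions: this is not Leap Labs' method, which the episode describes as proprietary. The sketch swaps in a k-nearest-neighbours model and permutation importance, and the data, feature layout, and function names are all invented for illustration.

```python
import random


def make_data(n, rng):
    """Synthetic rows: the target depends on an interaction of features 0 and 1;
    feature 2 is pure noise. (Invented data, for illustration only.)"""
    rows, targets = [], []
    for _ in range(n):
        x = [rng.uniform(-1, 1) for _ in range(3)]
        rows.append(x)
        targets.append(x[0] * x[1])
    return rows, targets


def knn_predict(train_x, train_y, query, k=5):
    """Predict by averaging the targets of the k nearest training rows."""
    by_distance = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    return sum(train_y[i] for i in by_distance[:k]) / k


def mse(train_x, train_y, test_x, test_y):
    """Mean squared error of the k-NN model on the test rows."""
    errors = [(knn_predict(train_x, train_y, q) - t) ** 2
              for q, t in zip(test_x, test_y)]
    return sum(errors) / len(errors)


def permutation_importance(train_x, train_y, test_x, test_y, rng):
    """For each feature, the increase in test error when that feature's
    column is shuffled, which breaks its link to the target."""
    baseline = mse(train_x, train_y, test_x, test_y)
    scores = []
    for j in range(len(test_x[0])):
        column = [row[j] for row in test_x]
        rng.shuffle(column)
        shuffled = [row[:j] + [v] + row[j + 1:] for row, v in zip(test_x, column)]
        scores.append(mse(train_x, train_y, shuffled, test_y) - baseline)
    return scores


rng = random.Random(0)
train_x, train_y = make_data(400, rng)
test_x, test_y = make_data(100, rng)
importances = permutation_importance(train_x, train_y, test_x, test_y, rng)
print([round(s, 3) for s in importances])
```

The first two scores should come out clearly larger than the third, which is the model "telling" us that features 0 and 1 matter and feature 2 does not. Real interpretability work goes much further than a per-feature score, but the shape of the loop is the same: train first, then extract.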

Starting point is 00:15:11 And often, you know, like it's a lot of stuff that scientists already know, right? Because they're domain experts. But way more often than you would expect, we find stuff that's completely new. And that's where our recent publications over the past few weeks. Do you have some examples that could bring this to life? Yeah. In fact, Jugal loves to talk about actually, this was our first ever case study that we did.

Starting point is 00:15:36 Yeah, it's pretty exciting. Yeah, yeah. So we had spent months working in R&D trying to get this system to work end-to-end and debugging, and it was such a struggle. And we finally... What about that? What is real? Bugs in software?

Starting point is 00:15:53 No. No. Share your struggles. It feels real. It feels real. Any trouble for that. But the magic happened when we were thinking, oh, we're going to have to go through a ton of data sets and work with a ton of scientists before you find anything that is worth

Starting point is 00:16:08 knowing. That's a novel discovery we can publish. The very first collaborator that we worked with, he was a plant biologist from an institute, a research institute in France. He was working on trying to figure out the right combination of genotype of the plant and nutrients of the plants

Starting point is 00:16:26 and environmental conditions in order to make the plant root growth more efficient. And this is very important because in order to grow climate-resistant crops, you need to understand how to make these plants work

Starting point is 00:16:41 in a different rate, to be flood-resistant, to be drought-resistant, etc. And this is incredibly important for food security. So he had this dataset, and we were not very hopeful about this data set because it only had 700 rows, 700 samples of data. These samples only had 20 features. And then when we actually narrowed it down to the features he cared about, it only had seven.

Starting point is 00:17:03 So you're talking about, like, a tiny data set when you think about the size of data sets used, you know, in AI today. So to set up what you're about to say, I think, tell me a little bit more about how this data was collected, like over what time period? And because like 700 rows, like, it sounds like, okay, that's a good amount of information, but it's maybe not so much as you're saying, right? You'd have 700,000 rows, 7,000 rows, much more. So in this case, how long did it take him to assemble this? And like what roughly was the process by which he gathered this?

Starting point is 00:17:34 That's a really good question. I can say, I know all about it. Yeah. So Matt, our collaborator from the Institute of Plant Sciences, he does most of his experiments in a screening lab. So he grows, can't pronounce the name. It's in our blog post. It's one, you know, it's one of those, like, test species that are used because they grow really quickly.

Starting point is 00:17:56 I see. So it's like a, like a, is it a fruit fly of plants? For plant biologists, exactly. So, yeah, so he grows plants for only like maybe 15 days. And he takes lots and lots of measurements, both of the roots. They're on these like really cool slides, so you can like digitally measure the root structure, and also of all of the conditions. So he will typically take a plant that he is growing.

Starting point is 00:18:24 He will take one measurement per day. And obviously that measurement would also contain all of the information about like the mutation, the genotype and the nutrient profile of the soil that the plant is growing in. Got it. Okay, great. So like, and also just to like put a finer point on this in the world of like digital simulations, the idea is how many times can you simulate something over a frequency? And the more you can do, obviously, the more you can sort of like see different things happening. But in the

Starting point is 00:18:52 biological world, if you're dependent on, you know, a life organism doing the simulation, then you're a little bit less independent from, well, you're more dependent, I guess, on the actual world of biology. Okay, go ahead. We want to stay as close to the real world as we can in our data, you know, we don't want to... And also on this point, right, if he's actually observing the real world and getting data from the real world, this also might help to address some of the issues that you were talking about before, where if you got all this data that's in the LLMs, but it, you know, wasn't actually captured with a great amount of fidelity or authenticity, then that can also cause spoilage down the line. Okay. Not to interrupt one more time, but this might be useful. As people are listening, their website is leap dash labs.com,

Starting point is 00:19:40 leap dash labs.com. And you can see some of the papers and blog posts that we've been talking about. So, Jugal, continue how this research went. Yeah, absolutely. So we put this data through the system and it flew through in a matter of hours. And what came out was not only patterns that the scientist knows about, that he knows to be true within his domain, which gave him a lot of confidence in the system that it's working, but also a novel genotype and nutrient combination

Starting point is 00:20:15 that he was unaware of that maximizes this root growth feature that he really cared about to maximize the efficiency. And this was after he had already, as a domain expert, spent months scrolling through Excel trying to find patterns in this data. And our system, which is completely agnostic, was able to find these patterns that he had... What does it mean to scroll through Excel? Like, literally he's like looking at numbers and trying to make correlations. Like, that seems like a while that's like... I imagine he's doing kind of your standard scientific analyses.

Starting point is 00:20:49 Okay. But they kind of fall down if you're looking for like nonlinear patterns and not... I imagine, like, if I were like a fly on a dartboard trying to figure out what surface I was on, that feels kind of what you're describing, you know? And then you like zoom out and you're like, oh, like, you know, here's the red square and you know, this one, anyways, not really a dart player, but yeah. Okay. I see. Yeah, I think what was the most exciting was when we got on a call with him after the delivery of these results, and he immediately said, when can we work on another data set together? And we have something here. And then also,
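Why standard one-variable-at-a-time analyses "fall down" on nonlinear, combinatorial patterns can be shown with a toy case. The sketch below is hypothetical throughout: the two binary factors, the outcome rule, and the variable names are invented, and the pattern is deliberately extreme (an exclusive-or) so the effect is easy to see.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


rng = random.Random(1)
# Two hypothetical binary factors, e.g. "mutation present" and "nutrient added".
a = [rng.randint(0, 1) for _ in range(2000)]
b = [rng.randint(0, 1) for _ in range(2000)]
# Invented outcome: high only when exactly one of the two factors is present.
growth = [float(x != y) for x, y in zip(a, b)]

# Looking at each factor on its own shows almost nothing.
corr_a = pearson(a, growth)
corr_b = pearson(b, growth)

# Looking at the combination shows the whole pattern.
combo_means = {}
for combo in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    values = [g for x, y, g in zip(a, b, growth) if (x, y) == combo]
    combo_means[combo] = mean(values)

print(round(corr_a, 3), round(corr_b, 3), combo_means)
```

Each factor's correlation with the outcome sits near zero, yet the combination determines the outcome completely. With seven features instead of two, the number of combinations to check by hand grows quickly, which is the gap an automated search is meant to close.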

Starting point is 00:21:25 he's already changing his experimental process because of what our system allows him to do. So previously he would do like a very simple targeted experiment and only measure certain things because he only has the capacity to go through the results in this very manual... I see. So he can broaden the aperture, effectively. Yeah. So he has the capacity to take in more data, but before he couldn't actually analyze it. And so you're giving him sort of a super, like a super skeleton, or like a brain on top of his brain, to be able to understand. Okay, got it. And so his research is ongoing, and he's using what he got from the discovery engine to guide his research. Amazing.

Starting point is 00:22:07 Is another analogy for the discovery engine that, like, as opposed to, okay, I have a hypothesis test, yes or no. Hypothesis test, yes or no. Hypothesis test, yes or no. Maybe I use an LLM to prompt it to generate hypotheses. What you're saying is the discovery engine basically, it's a delivery engine for here's 18 new hypotheses you might want to try. Okay.

Starting point is 00:22:36 But I think it's really important to note that, like, everything we find is empirically validated. It's not the model, well, we do two things, right? We provide patterns, discoveries, insights, whatever, that are empirically validated in the data. So these are not the model extrapolating, saying, like, hey, why not try this? This is, here is a pattern that I have found in your data, and here is all of the evidence for it. Like citations for these discoveries, effectively leading back to the data, the source of the data. Or validation, yeah. Yeah.

Starting point is 00:23:13 So it's like, here is a subset of the data. And if you filter by this pattern on this data, you will see exactly the pattern that the model has found. So it's empirically validated kind of built in. Of course, we can also get the models to extrapolate from the data that we have. So, you know, for example, with the plant biology stuff, to find combinations of variables that aren't actually present in the data set, but we flag these as like more speculative. Like the model thinks, you know, for maximizing the thing that you care about, this region of the parameter space seems promising.
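The "filter by this pattern and you will see it" idea can be sketched as a plain check against the raw rows. Everything in the example is hypothetical: the column names, the values, and the pattern itself are invented for illustration and are not taken from the plant study discussed in the episode.

```python
from statistics import mean

# Hypothetical measurements; column names and values are invented.
rows = [
    {"genotype": "A", "nitrate": 0.5, "root_growth": 4.1},
    {"genotype": "A", "nitrate": 1.5, "root_growth": 4.3},
    {"genotype": "A", "nitrate": 2.0, "root_growth": 4.0},
    {"genotype": "B", "nitrate": 0.5, "root_growth": 4.2},
    {"genotype": "B", "nitrate": 0.2, "root_growth": 3.9},
    {"genotype": "B", "nitrate": 1.0, "root_growth": 6.8},
    {"genotype": "B", "nitrate": 1.5, "root_growth": 7.1},
    {"genotype": "B", "nitrate": 2.0, "root_growth": 7.0},
]


def matches(row):
    """A reported pattern, written as an ordinary filter on the raw data."""
    return row["genotype"] == "B" and row["nitrate"] >= 1.0


def validate(rows, pattern, target):
    """Compare the target inside and outside the subset the pattern selects."""
    inside = [r[target] for r in rows if pattern(r)]
    outside = [r[target] for r in rows if not pattern(r)]
    return {
        "n_inside": len(inside),
        "mean_inside": mean(inside),
        "mean_outside": mean(outside),
    }


report = validate(rows, matches, "root_growth")
print(report)
```

Because the pattern is just a filter, anyone with the data can rerun it and see the same subset. A more careful check would also test the pattern on held-out rows or attach a significance test, and a pattern that extrapolates beyond the observed rows cannot be checked this way at all, which matches the distinction drawn above between validated and more speculative findings.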

Starting point is 00:23:47 But a lot of the time, like because this data analysis, when you do it manually, is so laborious, there's so much low hanging fruit just by finding these combinatorial patterns automatically. So one of the things that sounds interesting, challenging and, um, I don't know, I suppose this is like where you guys are at in terms of like the business is thinking about like context and focus. So I'm sort of imagining that what you guys are building from this discovery engine is almost like, you know, Google Earth, but for reality. And so you can sort of say, okay, if you want to like discover some new, you know, plant type that survives very well in a certain

Starting point is 00:24:26 region, then Google Earth knows where all the temperature zones are and sort of like points you in a part of the world. And then it's like, okay, now you want to like zoom in. And the level of Zoom that you want to have will determine your ability to then maybe try a set of experiments or to learn about that part of the world. Now, this is obviously another very gross bastardization slash metaphor, but in terms of, you know, the known reality that people could like relate to, perhaps, it's like the world is there. Reality exists. The question is, how do you understand it and how do you bring together the right sensor data about it in order to make interpretation of reality? And then how do you apply these mathematical models or machine learning models to see those

Starting point is 00:25:04 patterns that exist, perhaps through the world, from one end of the planet to the other. So from an information perspective. So my question is kind of that. Like, it sounds like it would be great to be able to dump in all, like, all of the data. Let's say if you just like got rid of all of the scientific knowledge that's ever been produced. And you just started today with all of the sensors that exist in the world so that you have some ground truth. And you just let the models run to say, where are you seeing patterns? and then later on we develop language to describe what these patterns mean.

Starting point is 00:25:36 I mean, that would be on the one hand amazing. It would take a long time. It would take probably all the compute in the world and we'd have to like drain the sun to make that happen. How do you then sort of apply this to the right size problems? Right. Like you've talked about this like plant biologist. That sounds like very specific, very tight. Now he's expanding because he doesn't have to worry so much about the data analysis.

Starting point is 00:25:55 These patterns will be discovered roughly as a result of him producing more data and putting the data into the system, but at some point, you almost end up with too much noise. So is noise ultimately a problem in terms of getting too much data, or is that not something that you're worried about? I mean. Okay. So there's like loads and loads of stuff to talk about in what you just said. I know. I'm sorry. No, no, no, that's fine. It's good. I end up sort of dropping these zip files and then we expand them and they're like all these like files and it's like going to all these different folders. You like, right?

Starting point is 00:26:32 Yeah. Yeah. I need to talk about that. What can I say? I think noise is a really interesting point. We can talk about sources of noise in the journey from the real world to, like, understanding. And also I don't want to be like pejorative about noise. Noise is beautiful, you know, so. That's a lovely sentiment.

Starting point is 00:26:49 I'm not such a big fan of noise myself. But I think the point about like data scale is also really interesting. Maybe I'll say that first because like a couple of years ago when we were like, we have this idea. We're building this like really cool interpretability. Neural networks, we think probably know stuff that we're not aware of. Maybe this could be a new scientific method. Like we think we might have something here.

Starting point is 00:27:19 We were kind of envisaging something very similar to what you suggested. Like passive data, sensors, robots, all of the data, we will find all of the patterns and it will be amazing. What has actually happened, kind of as our case study with Matt, our very first case study, has shown, is that there is actually so much low hanging fruit, even in small data sets. Humans, God love them, are really, really bad at, like, finding these patterns manually ourselves. Like, there are probably like trillions of dollars on the table hanging around in R&D data sets on servers, just because we don't know how to

Starting point is 00:27:57 find them. I figure we're going to do that first. And then later, we will tile the known universe with sensors and figure out. Well, to that end, I mean, we're talking about biology, you know, you're thinking of like medical discovery and stuff like that, but is this applicable to basically anything, like materials science, like, what, if I'm listening, what's a left field thing that maybe I could potentially use this for? Oh, a left field thing. Yeah.

Starting point is 00:28:28 I don't know. So I was having a conversation with my friend the other day who's, like, really interested in, like, Bryan Johnson and quantified self and health and longevity. Don't die. Don't die. I'm a big fan. I don't want to die. And you can either. Yeah. Yeah. So, so I mean, obviously I see everything in terms of data sets these days. So I was like, hey, you know, give us your data. We'll run it through.

Starting point is 00:28:55 We'll find the patterns. We are very much, by design, domain agnostic. To the neural network, it's all numbers. Doesn't really matter. In terms of like go to market and actually like serving this technology to scientists in a way that makes sense for them and like fits in with their worldview and their processes, that's obviously a little bit different. But yeah, under the hood, I mean, you can train neural networks on anything these days. Go ahead, Brian. Yeah, you mentioned one, the first case study. How many folks at least

Starting point is 00:29:29 to date are you working with? So it's not just the one case study. How many other folks have you been working with? Yeah, so we've got, how many publications have we got, yeah, we've, we've, we have published four, uh, preprints. Um, we are also working on a collaboration with Meta that should be another publication soon. That's in, that's actually in materials, yeah. Um, we've done a couple of other case studies that didn't make it to publication, because it was just validation, like, we found a load of known patterns, but nothing super interesting. But, but across multiple different areas, people are like, this is useful. So you're proving out that it's useful to folks in a lot of different areas.

Starting point is 00:30:13 Yeah, we've got plant biology, meteorology, advanced materials, immunology, catalyst-y stuff. I guess that's advanced materials as well. Yeah, oh, Alzheimer's, all of that medical, clinical stuff. That organism thing. Oh, ocean proteomics. I want to get this in here again because I'm imagining people listening and being like, oh, I'd like to test this out.

Starting point is 00:30:40 So we're not wrapping yet because I want to hear your backgrounds. And Chris has some more questions too. But if I am intrigued by what we're talking about right now, where should I go to start working with your model? You should email us. We are in the process of standing up a self-service dashboard, which is obviously very, very exciting. But yeah, very, very much. Also, what's a good email for you guys? Hello at leap dash labs.com. Yeah.com. Yeah. Yeah. Checking out our blog page on our website

Starting point is 00:31:15 will give you a good idea of like the variety of different scientific domains we've worked in and what we've been able to find. Yeah. And you can follow me on Twitter for occasional rants about how science is broken and how we must immediately fix it. Okay. Sorry, Brian. I'm going to jump in, like, because I could, obviously, like, spool out this conversation, you know, indefinitely. I guess it would be valuable to get a sense for where you guys are in terms of your startup journey. What are the next steps? What's the roadmap? You know, I'd love to continue to talk about like fixing the incentive structure in science. But at the same time, you guys do live in the capitalist system. We did put an investment into you guys.

Starting point is 00:31:53 And so, you know, we'd love to know kind of where things sit in terms of the evolution of the business. Yeah. Do you want the story from the start or just the fact that we're like... Yeah, we're not. Yeah. Okay, cool. So Jugal and I founded Leap two...

Starting point is 00:32:09 Two years. Yeah, two and a half years. I'm with... Yeah, basically to continue some interpretability research that I'd been working on. And initially, we were like, we're going to build an interpretability engine because interpretability is really important. Like, you can use it to detect bias. You can use it to, like, predict.

Starting point is 00:32:29 failure modes on out-of-distribution data. Oh, and like maybe you can use it for scientific discovery as well. Sorry, just so I understand, like when you talk about like interpretability, what is the format of the output? Like, do I get a report that says here's how to interpret what you found, or is it like something else? So in interpretability in general, there are many, many, many different methods. A bunch of these are like proprietary stuff at Leap and they can output information in

Starting point is 00:32:59 all kinds of different formats. We're really leaning into using violin plots and bar plots at the moment. I'm sorry, what plots? Violin plot. I need to show you a picture. If you are curious about violin plots, visit. Violin, like the musical instrument? Hmm?

Starting point is 00:33:17 Like the, how do you spell that? Violin, as in... Is it? Okay, okay, got it. All right. You understand it if you look at them. Okay. Maybe the short answer,

Starting point is 00:33:28 because I know time is tight, is that there are many different ways to kind of express the patterns that we find in a human readable format. We do like some charts and plots of various kinds. We also provide logical rules that allow you to filter the data to kind of find the samples that support this pattern.

Starting point is 00:33:46 But there are many different ways. Like data visualization is incredibly interesting. Totally. Yeah. Okay. So that's kind of like how it started. Yeah. Okay.
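
To make the "logical rules" idea above concrete, here is a minimal sketch, assuming a rule is simply a conjunction of threshold conditions on columns. The column names, thresholds, sample values, and the `matches` helper are all hypothetical illustrations and are not Leap Labs' actual output format.

```python
import operator

# Hypothetical rule format: a list of (column, comparison, threshold)
# conditions that must all hold for a sample to support the pattern.
OPS = {"<": operator.lt, "<=": operator.le, ">": operator.gt, ">=": operator.ge}

rule = [("temperature", ">", 30.0), ("humidity", "<=", 0.4)]


def matches(sample, rule):
    """Return True if the sample satisfies every condition in the rule."""
    return all(OPS[op](sample[col], value) for col, op, value in rule)


# Made-up samples, for illustration only.
samples = [
    {"id": 1, "temperature": 35.0, "humidity": 0.30},
    {"id": 2, "temperature": 25.0, "humidity": 0.20},
    {"id": 3, "temperature": 31.5, "humidity": 0.40},
    {"id": 4, "temperature": 40.0, "humidity": 0.90},
]

# Keep only the samples that support the pattern described by the rule.
supporting = [s["id"] for s in samples if matches(s, rule)]
print(supporting)  # [1, 3]
```

Expressing a pattern as explicit conditions like this is what lets a scientist pull up the exact rows behind a finding and check them by hand.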

Starting point is 00:33:56 And so you went from interpretability into disco. Yeah, because we decided that of all of the different use cases of interpretability, scientific discovery was the most difficult, and we should probably do that. I appreciate the ambition. It's the most important thing, right? Also that. Scientific progress is the bottleneck on humanity flourishing. Like, it's the biggest lever.

Starting point is 00:34:20 So we want to. Sorry, just like it does occur to me also. Like, in terms of, let's say, like the last 2,000 years of culture, like science has a very specific place in it. But truth, reality, authenticity are aspects that are becoming even more important when you can synthesize and generate nearly anything. And so to your point, the faster we can get to an ability of almost like turning raw data into intuition about reality, then that will actually settle a lot of the polarizing topics

Starting point is 00:34:52 of our time because we can simply, as you say, look to the data and have an interpretability layer on top that essentially says, look, here are the patterns that are there that you as humans, with your, you know, grandiose ideas about the world, but your very limited perspective on reality, should know about what's actually happening here behind the scenes. Like making scientific methodology better is the more popular on basically everything that I care about. So totally. Seemed like a good, yeah, good path. So yeah, so we, sorry, in startup journey land,

Starting point is 00:35:25 we've done our like interpretability research, we've raised two seed rounds. After that seed round, we kind of decided to really focus more on the discovery application of the interpretability research that we've done. Yeah. And so, so like we started prototyping this system. We knew it needed to be automated because like lots and lots of scientists can't train machine learning models. And this is okay. Reasonably. They can focus on what they're good at and you know, you can do the other part. Absolutely. But it's not going to fly if we make them train models for everything, so we'll do that. And that's fine. So we built this prototype system, and it worked, which was incredible. And then like very, very, but it was quite like manual and

Starting point is 00:36:09 messy and stuff. You know, it was a proof of concept. How long ago was this? About a year ago. So we'd gone from super scrappy prototype to like full automation end-to-end system. And yeah, and now we're doing fully automated discovery. What's your, just quickly, what's your tech stack? Oh, God. You need to talk to the CTO. Okay. I'm not going to, my background is as a research scientist, AI research.

Starting point is 00:36:35 So I use Python and PyTorch, and I used to use Matplotlib, but now I get ChatGPT to make my plots for me. Nice, nice. I'm wondering, like, you know, are you guys, like, you know, on like a raw Nvidia compute, or are you using, like, you know, something else? Like, what is sort of like the cloud solution there? We're on GCP. Okay. Yeah. We have, like, a distributed fancy AutoML setup on there that, like, spins up the cloud

Starting point is 00:36:56 and stuff. And then the front end is not interesting for me to talk about. Okay. That means I don't actually know because I'm not a software developer. I see, okay. It's not interesting for you as in it's not interesting to you. You're like, okay, it's incidental. There be tests and engineers and things like that. I see. I understand. It's a black box as far as you're concerned. Well above my pay grade. Understood. Okay. So where are you guys at in your startup journey now then? Yeah, so we are looking for our first industry pilots. We're talking to some really good guys.

Starting point is 00:37:34 I'm very, very excited about that. And we are just starting to raise our Series A. Exciting. And how far into that? So you're basically like, what kind of investors are you seeking? We want stupid. Jugal has opinions. We're looking for investors that are familiar with deep tech, right?

Starting point is 00:37:55 So like investors that were very early in DeepMind or very early in Anthropic, or like very long-term, big-vision folks that get it and want to get in early and want to get in on the ground floor and, like, are familiar with developing really groundbreaking, world-changing technologies like that. Yeah. We're also fortunate in that we've been, I guess, building relationships with some funds that we really like for a little while now. So, yeah, it's feeling pretty good. We're having conversations. Oh, and I'm going to be in San Francisco from, we both will be in San Francisco. July 26th. But like three, three, three, three weeks.

Starting point is 00:38:38 Yeah. Yeah. Okay. Coming up. Yeah. Great. And where are you guys based typically? London and San Francisco.

Starting point is 00:38:44 I could hear that somehow in the accent. Yeah. So as you can hear at this point, I tried to come back into the conversation, and again, it didn't work out. All I tried to say was, if you're interested in Leap Labs at all, look them up at leap-labs.com, send them an email if you want to work with them. They're taking all comers at the moment. I will have an email in the show notes as well as the white paper and all that other good stuff

Starting point is 00:39:12 in the show notes as well. No, I think this is great. This is super helpful. I'm really excited about where you guys are at. You know, I was thinking also, and I feel like this is like one of the things that got me excited about the investment when we first talked. I'm kind of a fan of Alan Watts, and he describes this concept of the grid of words. And the grid of words essentially suggests that human language could be understood as if it were on a graph paper. The words that we use to describe reality

Starting point is 00:39:42 are just the dots. And in fact, reality is made up of all the parts in between, all the negative spaces. And so in a large way, what you guys are talking about is being able to map and understand what those negative spaces are. And so the more that we're able to actually kind of like blur out from the dots to see like the entire picture, the better off we'll be in terms of understanding reality. And so that's ultimately what I think you guys are building and applying, you know, machine learning and AI to do. So that's why I'm personally excited about it.

Starting point is 00:40:14 And I'm super excited that you guys are here at this part of the journey. And, you know, your thought leadership will continue to grow, especially once you come to San Francisco and start to talk to people here. Like, it'll be infection, or infectious rather. So anything else you guys want to leave with? Just, yeah, like if you're a scientist and you've got some interesting data, even if it's just a few hundred samples. Oh, that's exciting.

Starting point is 00:40:40 So with this self-serve thing, are people going to be able to go to some place and, like, upload a file and just like the thing's going to happen? Are you, like, charging for this? Like, how does that work right now? Right now, this is all, we're trying to figure it out at the moment, right? Okay, great. But my hope is that we'll be able to make the self-serve platform completely free for academics. And then, like, do enterprise sales.

Starting point is 00:41:04 Sure, I got it. Actually, yeah, so on that point, I think this will be important for both of those audiences, which is around, like, privacy, ownership, and IP. Like, these things, you know, are whatever they are. But obviously they're probably quite important to those different groups. And so how does that factor into what you guys are doing? So we can do on-prem, basically. Okay. Like we have a secure cluster and stuff.

Starting point is 00:41:34 If you've got like your own compute and you want to run disco on that, that's, that's fine too. We can support that. Yeah, basically, we totally get it. Like we're all scientists too. We want to protect your data. Yeah, and we're like, we don't keep the data, we don't aggregate it, we don't, we don't sell it, we don't do anything nefarious as well, we are here for the science and lightly a bit more now. Especially if you've got data, you've got this discovery engine, it's going to look at it in this like, it's like sort of, have you ever like seen those like infrared cameras like looking at like flowers? It's like you can like see entirely, have you never seen this before?

Starting point is 00:42:10 You got to like check this out. It's amazing. So it's the way that birds and I don't know if it's birds, maybe it's bees. Anyways, that's a whole different conversation. But there's like different ways of seeing in infrared that allows you to see the world in an entirely different way. And so flowers actually become almost like these landing pads. They're like these targets that are so clear and easy to see when you're seeing in infrared. But like when you see them as, you know, in human eyeballs, you don't see it that way.

Starting point is 00:42:33 So I feel like that's kind of what you guys are offering in terms of this different way to see through the data and to get these insights. I should make all the world a rose garden. Love it. Love it. All right. it there. Thanks so much, guys. This is exciting. Thanks, thanks, bye. See you all in the day. See you soon. Cool.

To find out more about Leap Labs go to Leap-Labs.com. The white paper is here. Blog is here (with case studies). To get in touch with them: hello@leap-labs.com
