Tech Brew Ride Home - (BNS) Leap Labs
Episode Date: July 19, 2025
To find out more about Leap Labs go to Leap-Labs.com. The white paper is here. Blog is here (with case studies). To get in touch with them: hello@leap-labs.com
Transcript
Welcome to another bonus episode of the Techmeme Ride Home podcast, another portfolio profile episode.
As always, I'm your host, Brian McCullough.
And hey, look at this.
Our friend is back.
Hey, I'm back.
Chris Messina is here.
Hey, Chris.
Hey, hello.
It's been a while, but we're here because we are going to talk about a company that Chris and I invested in through the Ride Home AI Fund.
This company is Leap Labs.
And we're speaking to Jessica Rumbelow and Jugal Patel.
Hi there.
And you are the founders of Leap Labs. So this is going to be more getting into the tech and what you're actually doing. This company is more making an advancement than, oh, we have a product. But before we do that, just tell us a little bit about Leap Labs, what you're attempting to do. And then let's get into the science of it.
Yeah. So we are automating scientific discovery from data. There's a lot of data in the world. Companies spend huge amounts of money gathering data, doing R&D, but the outcomes from this process are like pretty uncertain, pretty noisy, pretty path-dependent. There are lots of good reasons for this, which I'm kind of excited to talk to you guys about. But what we're able to do, basically, is extract even complex, combinatorial, non-linear patterns from arbitrary data sets at incredible speed and scale. And we've made a bunch of novel scientific discoveries doing this.
Science being the key, and we're going to get into all this specifically, but just in a broad sense,
you've even written about this online, is there a sense that the current models of ML and especially
LLMs, they're not exactly perfectly designed for scientific research and the like?
Yeah.
Yeah, so there are a couple of major problems here.
These are language models, right?
They're trained on language.
In the first case, in the first instance, really,
language is actually just a really noisy abstraction over the real world,
over underlying data, over observations,
over the true generative functions of the world.
So there's that problem.
The much bigger problem is that our scientific literature is absolutely terrible.
The replication crisis is real.
And LLMs can't tell the difference between papers that replicate and papers that don't.
So, like, a lot of the time they're just wrong, even when they're not hallucinating,
even when they are perfectly recalling facts from their training data, those facts are fundamentally unreliable.
Right. So you're saying this is even beyond the hallucination problem that any user of an LLM is familiar with.
What you're saying is...
We have to first explain actually what the replication crisis is, which essentially means...
Sorry, do that.
For the basic listener, it's sort of like, you know, how does a law get made?
It's sort of like how does science get made?
And it's like you have a theorem or an idea, a thesis about the world.
And then you go about designing an experiment or several to test out that hypothesis.
I'm sorry, I'm getting my words confused.
See, this is the lossiness of language.
And as a result of doing successive trials of that hypothesis, you come into clearer coherence over time that asserts that whatever you thought is true perhaps is true. That's one set of, you know, sort of data about the world. And the question is,
can you replicate that elsewhere? That's the replication part of this. And so if you can replicate it
over and over again, for example, one plus one equals two, doesn't matter where you are or what
language that's in. I mean, not that you would scientifically prove, I suppose, a mathematical theorem,
but one plus one is two. Basically, in most cases, almost all that we've been able to reproduce
comes out the same. And so that's the replication aspect where science gets its foundation,
as opposed to human laws or human insights about the world that are subjective or simply not
subject to replication, and therefore they're whimsical, perhaps. And so I think what I'm
hearing, just to play it back again for the listener is the real challenge is that we have a lot of
science out there that asserts that the world is a certain way. There are some attempts at replication,
some of which has been successful, but in fact, much of which is not. And if there isn't
an enormous amount of effort put into replication, then the science from which we derive so many
assumptions about the way things work actually is built on faulty pretences.
Exactly.
Okay.
It's actually really upsetting.
It's like kind of possible, right?
Because this happens because the incentives, largely in academic publishing, are just
fundamentally misaligned.
They incentivize paper count, they incentivize citation count.
Can you start?
Can you say what those incentives are first and then you can talk about the outcomes?
Yeah.
So if you're a research scientist, say, normally in academia, and in order to progress in your
career, in order to get jobs, get promotions, become an esteemed scientist, you have a
Which is the most important thing to be esteemed.
Naturally.
It's like being an influencer on TikTok or something.
You want to be esteemed in the scientific world.
Of course you do.
Who doesn't?
Yeah, of course.
And the way, like, we kind of metricized this in science is...
That's a good word.
Thank you very much.
How many novel papers have you published?
How many citations do those papers have?
Are those papers published in, like, good journals, right?
And on the face of it, this sounds pretty sensible.
But actually, it's a terrible, terrible idea,
and we should immediately stop doing it and do something else,
because what happens is you're a scientist, you've got some data, you do some experiment,
you have some hypothesis.
Maybe you don't actually find anything that interesting.
Or maybe you find something interesting, but only if you do the analysis in like a very
specific way.
Or maybe you run your analysis like many, many times and pick the one that is like most
exciting and convincing.
And then you inflate the importance of your discovery and gloss over the inconvenient
details so that you can get that sexy publication that will get lots of citations.
And like who has the time to replicate other people's work, right?
It's boring.
Replications almost never get published.
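The "run your analysis many times and pick the most exciting one" problem described above can be illustrated with a small simulation. This is only a sketch under simplifying assumptions that are not from the conversation: every "analysis" compares two groups drawn from the same distribution (so there is no real effect to find), each analysis uses fresh random data, and a result counts as "significant" when a rough z-statistic exceeds 1.96. The function names are hypothetical.

```python
import random
import statistics

def looks_significant(rng, n=30):
    """Compare two groups drawn from the SAME distribution (no real effect)."""
    a = [rng.gauss(0, 1) for _ in range(n)]
    b = [rng.gauss(0, 1) for _ in range(n)]
    # Rough two-sample z-style statistic; |z| > 1.96 is about p < 0.05.
    se = (statistics.variance(a) / n + statistics.variance(b) / n) ** 0.5
    z = (statistics.mean(a) - statistics.mean(b)) / se
    return abs(z) > 1.96

def false_positive_rate(analyses_per_paper, papers=1000, seed=0):
    """Fraction of simulated 'papers' that report an effect that is not there."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(papers):
        # A paper reports a finding if ANY of its analyses looks significant.
        if any(looks_significant(rng) for _ in range(analyses_per_paper)):
            hits += 1
    return hits / papers

single = false_positive_rate(1)
cherry_picked = false_positive_rate(20)
print(f"one pre-planned analysis: {single:.0%} of papers report a false effect")
print(f"best of 20 analyses:      {cherry_picked:.0%} of papers report a false effect")
```

With one pre-planned analysis the false-positive rate stays near the nominal few percent; picking the best of twenty pushes it to a majority of papers, even though no one in the simulation fabricates anything.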
And especially if it's so idiosyncratic, right?
You have to like recreate the biases that led to the outcome, which therefore is unlikely
to produce the same results.
And so you sort of just wasted a bunch of time, whereas you could be exploring novelty.
You know, for like fans of like the PC revolution, like if you played a game called
Civilization, like the old version,
you know, it's sort of like the scientific, like, world as you're describing it,
as sort of like going out into like the black areas and just like finding new spaces.
But then you're always being raided by barbarian hordes.
You know, it's just sort of like you never get to build a civilization.
So I think this is kind of what you're talking about.
It's in this case being peer reviewers.
Correct.
Yes.
Yeah.
That's right.
Yeah.
So obviously, you know, this is a generalization.
Most scientists are not out there committing academic fraud.
I hope.
Like lots of scientists actually take...
How much of it is intentional, would you say,
where it's actually like fraudulent versus like incidental and, as you say, the incentives
encourage a set of behaviors that are about, you know, glowing up the research?
I think everybody who publishes has to.
That's a large indictment.
Well, if you want to work as a scientist.
No, but realistically, sure.
I mean, I think what I'm interested in is the scope and scale of the problem.
And what it sounds like you're saying is that all the incentives point in one direction,
which is sort of like in the social media world, like number goes up.
So if we destroy democracy in the process, that's fine because we got more followers and we got more engagement, right?
We got more grant funding for our research lab.
You know, our university has like really high.
New buildings named by other people, etc.
Okay.
It's completely understandable.
You know, like it's not really the scientist's fault at all.
And a lot of them are extremely, extremely concerned about this.
But it's very hard to kind of change the system
whilst also succeeding in it, right?
Yeah, so I just want to be very clear.
I'm a scientist by training, by background.
I was in academia for a long time.
I have a PhD.
I've been through the system.
Like, the vast majority of our employees here at Leap
are also scientists of one kind or another.
Like, we love scientists.
We're here for the scientists.
But they're working inside of this structure,
this incentive structure that, like,
is actively pushing against doing
really novel work, really exciting work.
The incentives are to play it safe, big up your results.
Yeah.
So also just because I think the diagnosis of this problem is critical to arrive at the solution,
which you're going to describe momentarily.
And I think it's important then to, I guess, ask the question about how science became
somewhat perverted.
And if it's because of the nexus of science and capitalism, where capitalism tends to
infect everything that it sort of touches and therefore absorbs the elements of the profit
motive, you know, in order to organize effort or labor. So, for example, if you can imagine,
and I didn't live back then, but I understand if there were like patrons, you could sort
of invest in the sciences. And the idea would actually be that the ideas would battle. And it was less
about, you know, blowing up some big theory, but instead of like having big ideas about the
world and then trying to find ways to discover if those ideas were valid and then developing
various tests.
And then the replication piece was actually the economic kind of like driver of participation
in science.
And that was obviously in contrast to like religion.
Am I off on this?
Correct.
Correct my history.
I mean, that sounds, that sounds broadly correct.
However, I would point out that science actually seems to work a hell of a lot better in
industry than academia,
because like your outcomes are directly tied to how successful your company is.
I see.
But so I'm trying to sort of like create the lineage of the incentive structure in academia
where like blowing up the outcomes of your results,
where you're only doing incremental kind of, you know, expansions of a thought or an idea
is like that feels how the incentive structures are misaligned.
So you're doing incremental work, but you're trying to like blow it up into something
that's much more significant,
and then you're moving very quickly through the process to get more money and grants to just keep the game going.
So it's like an infinite game, but it's not quite the way it works in academia, or I'm sorry, in industry where the outcomes of your effort will actually lead to products that get to market, and then you're actually competing in the real marketplace.
And so if your stuff doesn't work, then obviously you can't sell a product.
And so that's the sort of corrective aspect that exists in the direct capitalist market.
Absolutely.
It's important to note that these problems in academic publishing in general also infect industry.
Because everybody's drawing from this literature base, which is incredibly unreliable.
Okay.
And it forms this.
It's feeding itself.
It's a set of corrosive functions on information.
And large language models are going to make this so much.
Okay.
So, okay.
Again, to bring it back for dummies to understand here, essentially what we're saying
is we've had this LLM revolution.
Everyone's like, great, let's train it on the corpus of scientific literature.
And we're going to get novel insights.
And your hypothesis is that maybe that's not going to be successful.
And your solution to that is the discovery engine, correct?
So tell us about, yes, please tell us about the discovery engine.
Yeah. So it's kind of leaning into this idea that language is a really lossy abstraction over data. The logical thing to do is to go straight to the data. The problem is that humans are actually really, really bad at like looking at massive or even like small numerical data sets and finding patterns in them. We have some tools, we have some statistical tests, we have some analyses that we can run. But it's incredibly laborious. It's incredibly
like path dependent. It's full of confirmation bias, right? Because you can't, well, up until
recently, you can't systematically find all of the insight there is in a data set. You can't
find all of the patterns.
You use this phrase path dependency. And I think that's also a little bit
jargony. My understanding is that it sort of requires that you do a series of steps.
And in those steps, you actually cut off a bunch of other possibilities, even if those other possibilities
are valid. And so it's almost like going from a CPU, which is like sequential, into sort of like a
GPU, which is like relational. And so essentially you're creating path dependency means that you don't
get to find, you know, lateral or latent relationships that might be present because you've
gone down a certain path and going backwards is just too costly or just won't work. Is that?
Exactly that. Yeah. You end up exploring only like a tiny fraction
of all of the possible insight, discoveries, information that might be there in your data,
which is a problem.
And I guess our key insight at Leap and the thing that powers our technology was that machine
learning models, especially deep neural networks, are just extremely good at finding complex
patterns in data.
Honestly, been true for a while.
The issue has been, we are really bad at understanding neural networks.
So, like, maybe they learn all of these interesting novel patterns
that would be really important for us to learn about,
but we've got no way of getting them out.
And that's kind of where Leap comes in.
Our core research is really interpretability.
So we train big neural networks
or even smaller machine learning models
on completely arbitrary data sets,
and then we use interpretability
to extract what those models have learned from that data.
And often, you know, like it's a lot of stuff
that scientists already know, right?
Because they're domain experts.
But way more often than you would expect, we find stuff that's completely new.
And that's where our recent publications over the past few weeks.
Do you have some examples that could bring this to life?
Yeah.
In fact, Jugal loves to talk about actually, this was our first ever case study that we did.
Yeah, it's pretty exciting.
Yeah, yeah.
So we had spent months working in R&D trying to get this system to work end-to-end and debugging,
and it was such a struggle.
And we finally...
What about that?
What is real?
Bugs in software?
No.
No.
Share your struggles.
It feels real.
It feels real.
Any trouble for that.
But the magic happened when we were thinking, oh, we're going to have to go through a ton of data sets and work with a ton of scientists before we find anything that is worth knowing, that's a novel discovery we can publish.
The very first collaborator that we worked with, he was a plant biologist from a research institute in France. He was working on trying to figure out the right combination of genotype of the plant and nutrients of the plants and environmental conditions in order to make the plant root growth more efficient. And this is very important because in order to grow climate-resistant crops, you need to understand how to make these plants work at a different rate, to be flood-resistant, to be drought-resistant, etc. And this is incredibly important for food security.
So he had this dataset, and we were not very hopeful about this data set because it only had 700 rows, 700 samples of data.
These samples only had 20 features.
And then when we actually narrowed it down to the features he cared about, it only had seven.
So you're talking about like a tiny data set when you think about the size of data sets used, you know, in AI today.
So to set up what you're about to say, I think, tell me a little bit more about how this data
was collected, like over what time period?
And because like 700 rows, like, it sounds like, okay, that's a good amount of information,
but it's maybe not so much as you're saying, right?
You'd have 700,000 rows, 7,000 rows, much more.
So in this case, how long did it take him to assemble this?
And like what roughly was the process by which he gathered this?
That's a really good question.
I can say, I know all about it.
Yeah.
So Matt, our collaborator from the Institute of Plant Sciences, he does most of his experiments in a
screening lab.
So he grows, can't pronounce the name.
It's in our blog post.
It's one, you know, it's one like test species that are used because they grow really quickly.
I see.
So it's like a, like a, is it a fruit fly of plants?
For plant biologists, exactly.
So, yeah, so he grows plants for only like maybe 15 days.
And he takes lots and lots of measurements, both of the roots.
They're on these like really cool slides,
so you can like digitally measure the root structure and also all of the conditions.
So he will typically take a plant that he is growing.
He will take one measurement per day.
And obviously that measurement would also contain all of the information about like the mutation,
the genotype and the nutrient profile of the soil that the plant is growing in.
Got it.
Okay, great.
So like, and also just to like put a finer point on this in the world of like digital
simulations, the idea is how many times can you simulate something over a frequency? And the more
you can do, obviously, the more you can sort of like see different things happening. But in the
biological world, if you're dependent on, you know, a live organism doing the simulation, then you're a
little bit less independent from, well, you're more dependent, I guess, on the actual world of biology.
Okay, go ahead.
We want to stay as close to the real world as we can in our data, you know, we don't want to
And also on this point, right, if he's actually observing the real world and getting data from the real world, this also might help to address some of the issues that you were talking about before, where if you've got all this data that's in the LLMs, but it, you know, wasn't actually captured with a great amount of fidelity or authenticity, then that can also cause spoilage down the line.
Okay.
Not to interrupt one more time, but this might be useful.
as people are listening, their website is leap-labs.com,
leap dash labs dot com.
And you can see some of the papers and blog posts that we've been talking about.
So, Jugal, continue how this research went.
Yeah, absolutely.
So we put this data through the system and it flew through in a matter of hours.
And what came out was not only patterns that the
scientist knows about, that he knows to be true within his domain, which gave him a lot of
confidence in the system that it's working, but also a novel genotype and nutrient combination
that he was unaware of that maximizes this root growth feature that he really cared about
to maximize the efficiency. And this was after he had already, as a domain expert, spent
months scrolling through Excel trying to find patterns in this data. And our system, which is
completely agnostic, was able to find these patterns that he had...
What does it mean to scroll through Excel?
Like, literally he's like looking at numbers and trying to make correlations.
Like, that seems like a while that's like...
I imagine he's doing kind of your standard scientific analyses.
Okay.
But they kind of fall down if you're looking for like nonlinear patterns and not...
I imagine, like, if I were like a fly on a dartboard trying to figure out what surface I was on,
that feels kind of what you're describing, you know?
And then you like zoom out and you're like, oh, like, you know, here's the red square and
you know, this one, anyways, not really a dart player, but yeah. Okay. I see. Yeah, I think what was
the most exciting was when we got on a call with him after the delivery of these results, and he
immediately said, when can we work on another data set together? And we have something here. And then also,
he's already changing his experimental process because of what our system allows him to do. So
previously he would do like a very simple targeted experiment and only measure certain things because
he only has the capacity to go through the results in this very manual...
I see, so he can broaden the aperture effectively.
Yeah.
So he has the capacity to take in more data, but before he couldn't actually
analyze it. And so you're giving him sort of a super, like a super skeleton, or like a brain on top of his
brain to be able to understand. Okay, got it.
And so his research is ongoing, and he's
using what he got from the discovery engine to guide his research.
Amazing.
Is another analogy for the discovery engine that, like, as opposed to, okay, I have a
hypothesis test, yes or no.
Hypothesis test, yes or no.
Hypothesis test, yes or no.
Maybe I use an LLM to prompt it to generate hypotheses.
What you're saying is the discovery engine basically, it's a delivery engine for here's
18 new hypotheses you might want to try.
Okay.
But I think it's really important to note that, like, everything we find is empirically validated.
It's not the model, well, we do two things, right?
We provide patterns, discoveries, insights, whatever, that are empirically validated in the data.
So these are not the model extrapolating, saying, like, hey, why not try this?
This is, here is a pattern that I have found in your data, and here is all of the evidence for it.
Like citations for these discoveries, effectively leading back to the data, the source of the data.
Or validation, yeah.
Yeah.
So it's like, here is a subset of the data.
And if you filter by this pattern on this data, you will see exactly the pattern that the model has found.
So it's empirically validated kind of built in.
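The idea of a pattern that comes with its own evidence, in the form of a filter you can apply to the data yourself, can be sketched roughly as follows. This is a toy illustration only, not Leap Labs' method or data: the rows are synthetic, the column names (`genotype`, `nitrate`, `root_growth`) and the rule are invented, and the effect is planted in the data on purpose so there is something to find.

```python
import random
import statistics

rng = random.Random(0)

# Synthetic stand-in data: every column name and value here is invented.
rows = []
for _ in range(700):
    genotype = rng.choice(["wild_type", "mutant_a", "mutant_b"])
    nitrate = rng.uniform(0.0, 10.0)
    # Planted combinatorial effect: growth only jumps when BOTH conditions hold.
    boost = 3.0 if (genotype == "mutant_a" and nitrate > 7.0) else 0.0
    rows.append({
        "genotype": genotype,
        "nitrate": nitrate,
        "root_growth": 5.0 + boost + rng.gauss(0, 1),
    })

def rule(row):
    """A human-readable logical rule of the kind a report might list."""
    return row["genotype"] == "mutant_a" and row["nitrate"] > 7.0

# Filtering by the rule splits the data into supporting rows and the rest.
matching = [r["root_growth"] for r in rows if rule(r)]
others = [r["root_growth"] for r in rows if not rule(r)]

print(f"rows matching rule: {len(matching)} of {len(rows)}")
print(f"mean root growth, matching: {statistics.mean(matching):.2f}")
print(f"mean root growth, others:   {statistics.mean(others):.2f}")
```

The point of expressing a finding as a rule is that anyone can re-run the filter on the original rows and check the claim, rather than taking a model's word for it.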
Of course, we can also get the models to extrapolate from the data that we have.
So, you know, for example, with the plant biology stuff, to find combinations of variables that aren't actually present in the
data set, but we flag these as like more speculative.
Like the model thinks, you know, for maximizing the thing that you care about,
this region of the parameter space seems promising.
But a lot of the time, like because this data analysis, when you do it manually,
is so laborious, there's so much low hanging fruit just by finding these combinatorial
patterns automatically.
So one of the things that sounds interesting, challenging and, um,
I don't know, I suppose this is like where you guys are at in terms of like the business is thinking
about like context and focus. So I'm sort of imagining that what you guys are building from this
discovery engine is almost like, you know, Google Earth, but for reality. And so you can sort of say,
okay, if you want to like discover some new, you know, plant type that survives very well in a certain
region, then Google Earth knows where all the temperature zones are and sort of like points you
in a part of the world. And then it's like, okay, now you want to like zoom in. And the level of
zoom that you want to have will determine your ability to then maybe try a set of experiments or to
learn about that part of the world. Now, this is obviously another very gross bastardization slash
metaphor, but in terms of, you know, the known reality that people could like relate to, perhaps,
it's like the world is there. Reality exists. The question is, how do you understand it and how do
you bring together the right sensor data about it in order to make interpretation of reality?
And then how do you apply these mathematical models or machine learning models to see those
patterns that exist, perhaps through the world, from one end of the planet to the other.
So from an information perspective.
So my question is kind of that.
Like, it sounds like it would be great to be able to dump in all, like, all of the data.
Let's say if you just like got rid of all of the scientific knowledge that's ever been produced.
And you just started today with all of the sensors that exist in the world so that you have some ground truth.
And you just let the models run to say, where are you seeing patterns?
and then later on we develop language to describe what these patterns mean.
I mean, that would be on the one hand amazing.
It would take a long time.
It would take probably all the compute in the world and we'd have to like drain the sun to make that happen.
How do you then sort of apply this to the right size problems?
Right.
Like you've talked about this like plant biologist.
That sounds like very specific, very tight.
Now he's expanding because he doesn't have to worry so much about the data analysis.
These patterns will be discovered roughly as a result of him producing more data
and putting the data into the system, but at some point, you almost end up with too much noise.
So is noise ultimately a problem in terms of getting too much data, or is that not something that you're worried about?
I mean.
Okay. So there's like loads and loads of stuff to talk about in what you just said.
I know. I'm sorry.
No, no, no, that's fine. It's good. I end up sort of dropping these zip files and then we expand them and they're like all these like files and it's like going to all these different folders.
You like, right?
Yeah.
Yeah. I need to talk about that.
What can I say?
I think noise is a really interesting point.
We can talk about sources of noise in the journey from the real world to like understand.
And also I don't want to be like pejorative about noise.
Noise is beautiful, you know, so.
That's a lovely sentiment.
I'm not such a big fan of noise myself.
But I think the point about like data scale is also really interesting.
Maybe I'll say that first because like a couple of years ago
when we were like, we have this idea.
We're building this like really cool interpretability.
Neural networks, we think, probably know stuff that we're not aware of.
Maybe this could be a new scientific method.
Like we think we might have something here.
We were kind of envisaging something very similar to what you suggested.
Like passive data, sensors, robots, all of the data, we will find all of the patterns
and it will be amazing.
What has actually happened, kind of, as our case study
with Matt, our very first case study, has shown, is that there is actually so much low hanging fruit,
even in small data sets. Humans, God love them, are really, really bad at, like,
finding these patterns manually ourselves. Like, there are probably like trillions of dollars
on the table hanging around in R&D data sets on servers, just because we don't know how to
find them. I figure we're going to do that first. And then later, we will tile
the known universe with sensors and figure out...
Well, to that end, I mean, we're talking about biology, you know, you're thinking of like
medical discovery and stuff like that, but is this applicable to basically anything, like
materials science, like, what, if I'm listening, what's a left field thing that maybe I could
potentially use this for?
Oh, a left field thing.
Yeah.
I don't know.
So I was having a conversation with my friend the other day who's, like, really interested
in like Bryan Johnson and quantified self and health and longevity.
Don't die.
Don't die. I'm a big fan. I don't want to die.
And you can either. Yeah.
Yeah. So, so I mean, obviously I see everything in terms of data sets these days.
So I was like, hey, you know, give us your data. We'll run it through.
We'll find the patterns.
We are very much by design, domain agnostic.
To the neural network, it's all numbers.
It doesn't really matter. In terms of like go to market and actually like serving this technology
to scientists in a way that makes sense for them and like fits in with their worldview and their
processes, that's obviously a little bit different. But yeah, under the hood, I mean, you can train
neural networks on anything these days.
Go ahead, Brian. Yeah, you mentioned one, the first case study. How many folks at least
to date are you working with? So it's not just the one case study. How many other folks have you
been working with? Yeah, so how many publications have we got? We have
published four preprints. We are also working on a collaboration with Meta that should
be another publication soon. That's actually in materials, yeah. We've done a couple of
other case studies that didn't make it to publication, because it was just validation, like, we found a
load of known patterns, but nothing super interesting. But across multiple different areas,
People are like, this is useful.
So you're proving out that it's useful to folks in a lot of different areas.
Yeah, we've got plant biology, meteorology, advanced materials, immunology,
catalyst-y stuff.
I guess that's advanced materials as well.
Yeah, oh, Alzheimer's, all of that medical, clinical stuff.
That organism thing.
Oh, ocean proteomics.
I want to get this in here again because I'm imagining people listening and being like,
oh, I'd like to test this out.
So we're not wrapping yet because I want to hear your backgrounds.
And Chris has some more questions too.
But if I am intrigued by what we're talking about right now, where should I go to start
working with your model?
You should email us.
We are in the process of standing up a self-service dashboard, which is obviously very, very
exciting. But yeah, very, very much. Also, what's a good email for you guys?
hello@leap-labs.com. Yeah. Checking out our blog page on our website
will give you a good idea of like the variety of different scientific domains we've worked
in and what we've been able to find. Yeah. And you can follow me on Twitter for occasional rants
about how science is broken and how we must immediately fix it. Okay. Sorry, Brian. I'm going to jump in.
because I could, obviously, like spool out this conversation, you know,
indefinitely. I guess it would be valuable to get a sense for where you guys
are in terms of your startup journey. What are like the next steps? What's the roadmap?
You know, I'd love to continue to talk about like fixing the incentive structure in science.
But at the same time, you guys do live in the capitalist system. We did put an investment into you guys.
And so, you know, we'd love to know kind of where things sit in terms of the evolution
of the business.
Yeah.
Do you want the story from the start or just the fact that we're like...
Yeah, we're not.
Yeah.
Okay, cool.
So Jugal and I founded Leap two...
Two years.
Yeah, two and a half years.
I'm with...
Yeah, basically to continue some interpretability research that I'd been working on.
And initially, we were like, we're going to build an interpretability engine because
interpretability is really important.
Like, you can use it to detect bias.
You can use it to, like, predict failure modes on out-of-distribution data.
Oh, and like maybe you can use it for scientific discovery as well.
Sorry, just so I understand, when you talk about interpretability, what is the
format of the output?
Like, do I get a report that says here's how to interpret what you found, or is it
something else?
So in interpretability in general, there are many, many, many different methods.
A bunch of these are like proprietary stuff at Leap and they can output information in
all kinds of different formats.
We're really leaning into using violin plots and bar plots at the moment.
I'm sorry, what plots?
Violin plot.
I need to show you a picture.
If you are curious about violin plots, visit.
Violin, like the musical instrument?
Hmm?
Like the, how do you spell that?
Violin, as in...
Is it?
Okay, okay, got it.
All right.
You'll understand it if you look at them.
Okay.
Maybe the short answer,
because I know time is tight,
is that there are many different ways
to kind of express the patterns that we find
in a human readable format.
We do like some charts and plots of various kinds.
We also provide logical rules
that allow you to filter the data
to kind of find the samples that support this pattern.
But there are many different ways.
Like data visualization is incredibly interesting.
Totally.
Yeah.
Okay.
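As a rough illustration of the "logical rules" idea described above, here is a minimal Python sketch. The toy dataset, the column names, the thresholds, and the `rule` function are all hypothetical and are not taken from Leap Labs' system. It only shows how a human-readable rule can act as a filter that pulls out the samples supporting a pattern.

```python
# Hypothetical toy dataset: each sample is a dict of measurements.
samples = [
    {"id": 1, "temperature": 31.0, "humidity": 0.82, "crop_yield": 4.1},
    {"id": 2, "temperature": 24.5, "humidity": 0.40, "crop_yield": 2.0},
    {"id": 3, "temperature": 30.2, "humidity": 0.75, "crop_yield": 3.9},
    {"id": 4, "temperature": 29.8, "humidity": 0.35, "crop_yield": 2.2},
]


def rule(sample):
    """Hypothetical rule: temperature > 29 AND humidity > 0.7."""
    return sample["temperature"] > 29.0 and sample["humidity"] > 0.7


def mean_yield(rows):
    """Average of the (hypothetical) outcome column over a list of samples."""
    return sum(r["crop_yield"] for r in rows) / len(rows)


# Split the data into samples that satisfy the rule and samples that do not.
supporting = [s for s in samples if rule(s)]
others = [s for s in samples if not rule(s)]

print("Samples matching the rule:", [s["id"] for s in supporting])
print("Mean outcome (matching vs. rest):", mean_yield(supporting), mean_yield(others))
```

Comparing the outcome inside and outside the rule is one simple way to check whether a filtered subset really behaves differently from the rest of the data.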
So that's kind of like how it started.
Yeah.
Okay.
And so you went from interpretability into disco.
Yeah, because we decided that of all of the different use cases of interpretability,
scientific discovery was the most difficult, and we should probably do that.
I appreciate the ambition.
It's the most important thing, right?
Also that.
Scientific progress is the bottleneck on humanity flourishing.
Like, it's the biggest lever.
So we want to.
Sorry, just like it does occur to me also.
Like, in terms of, let's say, like the last 2,000 years of culture,
like science has a very specific place in it.
But truth, reality, authenticity are aspects that are becoming even more important when you can
synthesize and generate nearly anything.
And so to your point, the faster we can get to an ability of almost like turning raw data
into intuition about reality, then that will actually settle a lot of the polarizing topics
of our time because we can simply, as you say, look to the data and have an interpretability layer
on top that essentially says, look, here are the patterns that are there that you as humans,
with your, you know, grandiose ideas about the world, but your very limited perspective on
reality, should know about what's actually happening here behind the scenes.
Like, making scientific methodology better is the multiplier on basically everything that I care
about. So totally.
Seemed like a good, yeah, good path.
So yeah, so, sorry, in startup journey land,
we've done our interpretability research, we've raised
two seed rounds. After that seed round, we kind of decided to really focus more on the discovery
application of the interpretability research that we've done. Yeah. And so we started
prototyping this system. We knew it needed to be automated because like lots and lots of scientists
can't train machine learning models. And this is okay. Reasonably. They can focus on what they're good
at and you know, you can do the other part. Absolutely. But it's not going to fly if we make them
train models for everything, so we'll do that. And that's fine. So we built this prototype system,
and it worked, which was incredible. And then like very, very, but it was quite like manual and
messy and stuff. You know, it was a proof of concept. How long ago was this? About a year ago.
So we've gone from a super scrappy prototype to a fully automated end-to-end system.
And yeah, now we're doing fully automated discovery.
Just quickly, what's your tech stack?
Oh, God.
You need to talk to the CTO.
Okay.
I'm not going to, my background is as a research scientist, AI research.
So I use Python and PyTorch, and I used to use Matplotlib, but now I get ChatGPT to make my plots for me.
Nice, nice.
I'm wondering, like, you know, are you guys, like, you know, on like a raw Nvidia compute, or are you using, like, you know, something else?
Like, what is sort of like the cloud solution there?
We're on GCP.
Okay.
Yeah.
We have, like, a distributed fancy AutoML setup on there that, like, spins up the cloud and stuff.
And then the front end is not interesting for me to talk about.
Okay. That means I don't actually know because I'm not a software developer.
I see, okay. It's not interesting for you as in it's not interesting to you.
You're like, okay, it's incidental. There'll be tests and engineers and things like that. I see. I understand.
It's a black box as far as you're concerned. Well above my pay grade.
Understood. Okay. So where are you guys at in your startup journey now, then?
Yeah, so we are looking for our first industry pilots.
We're talking to some really good guys.
I'm very, very excited about that.
And we are just starting to raise our Series A.
Exciting.
And how far into that?
So you're basically like, what kind of investors are you seeking?
We want stupid.
Jugal has opinions.
We're looking for investors that are familiar with deep tech, right?
So like investors that were very early in DeepMind or very early in Anthropic, or like very long-term, big-vision folks that get it and want to get in early and want to get in on the ground floor and are familiar with developing really groundbreaking, world-changing technologies like that.
Yeah.
We're also fortunate in that we've been, I guess, building relationships with some funds that we really like for a little while now.
So, yeah, it's, it's feeling pretty good.
We're having conversations.
Oh, and I'm going to be in San Francisco from, we both will be in San Francisco.
July 26th.
But like three weeks.
Yeah.
Yeah.
Okay.
Coming up.
Yeah.
Great.
And where are you guys based typically?
London and San Francisco.
I could hear that somehow in the accent.
Yeah.
So as you can hear at this point,
I tried to come back into the conversation, and again, it didn't work out.
All I tried to say was, if you're interested in Leap Labs at all, look them up at leap-labs.com,
send them an email if you want to work with them.
They're taking all comers at the moment.
I will have an email in the show notes as well as the white paper and all that other good stuff
in the show notes as well.
No, I think this is great.
This is super helpful.
I'm really excited about where you guys are at.
You know, I was thinking also, and I feel like this is like one of the things that got me
excited about the investment when we first talked. I'm kind of a fan of Alan Watts, and he describes
this concept of the grid of words. And the grid of words essentially suggests that human language
could be understood as if it were on a graph paper. The words that we use to describe reality
are just the dots. And in fact, reality is made up of all the parts in between, all the
negative spaces. And so in a large way, what you guys are talking about is being able to map
and understand what those negative spaces are.
And so the more that we're able to actually blur out from the dots to see
the entire picture, the better off we'll be in terms of understanding reality.
And so that's ultimately what I think you guys are building and applying, you know,
machine learning and AI to do.
So that's why I'm personally excited about it.
And I'm super excited that you guys are here at this part of the journey.
And, you know, your thought leadership will continue to grow, especially once you come to San
Francisco and start to talk to people here.
Like, it'll be infection, or infectious rather.
So anything else you guys want to leave with?
Just, yeah, like if you're a scientist and you've got some interesting data, even if it's
just a few hundred samples.
Oh, that's exciting.
So with this self-serve thing, are people going to be able to go to some place and, like,
upload a file and just like the thing's going to happen?
Are you, like, charging for this?
Like, how does that work right now?
Right now, this is all, we're trying to figure it out at the moment, right?
Okay, great.
But my hope is that we'll be able to make the self-serve platform completely free for academics.
And then, like, do enterprise sales.
Sure, I got it.
Actually, yeah, so on that point, I think this will be important for both of those audiences, which is around, like, privacy, ownership, and IP.
Like, these things, you know, are whatever they are. But obviously they're probably quite important to those different groups.
And so how does that factor into what you guys are doing?
So we can do on-prem, basically.
Okay.
Like we have a secure cluster and stuff.
If you've got like your own compute and you want to run disco on that, that's, that's fine too.
We can support that.
Yeah, basically, we totally get it.
Like we're all scientists too.
We want to protect your data.
Yeah, and we're like, we don't keep the data, we don't aggregate it, we don't, we don't sell it, we don't do anything nefarious as well, we are here for the science and lightly a bit more now.
Especially if you've got data, you've got this discovery engine, it's going to look at it in this like, it's like sort of, have you ever like seen those like infrared cameras like looking at like flowers?
It's like you can like see entirely, have you never seen this before?
You got to like check this out.
It's amazing.
So it's the way that birds and I don't know if it's birds, maybe it's bees.
Anyways, that's a whole different conversation.
But there's like different ways of seeing in infrared that allow you to see the world in an entirely different way.
And so flowers actually become almost like these landing pads.
They're like these targets that are so clear and easy to see when you're seeing infrared.
But like when you see them as, you know, in human eyeballs, you don't see it that way.
So I feel like that's kind of what you guys are offering in terms of this different way to see through the data and to get these insights.
I should make all the world a rose garden.
Love it.
Love it.
All right, let's leave it there. Thanks so much, guys. This is exciting. Thanks, thanks, bye.
See you all in the Bay. See you soon. Cool.
