No Priors: Artificial Intelligence | Technology | Startups - AI-Powered Biological Software with Jakob Uszkoreit, CEO of Inceptive
Episode Date: August 24, 2023

"Biological Software" is the future of medicine. Jakob Uszkoreit, CEO and Co-founder of Inceptive, joins Sarah Guo and Elad Gil this week on No Priors to discuss how deep learning is expanding the horizons of RNA and mRNA therapeutics. Jakob co-authored the revolutionary paper Attention Is All You Need while at Google, and led early Google Translate and Google Assistant teams. Now at Inceptive, he's applying these same architectures and ideas to biological design, optimizing vaccine production, and magnitudes-more-efficient drug discovery. We also discuss Jakob's perspective on promising research directions, and his point of view that model architectures will actually get simpler from here, driven by hardware.

Show Links:
Inceptive - CEO & Founder - Jakob Uszkoreit | LinkedIn
Inceptive

Sign up for new podcasts every week. Email feedback to show@no-priors.com
Follow us on Twitter: @NoPriorsPod | @Saranormous | @EladGil | @kyosu

Show Notes:
(0:00:00) - Creating Biological Software
(0:06:54) - The Hardware Drivers of Large-Scale Transformers
(0:14:32) - Challenges in Optimizing Compute Allocation
(0:23:25) - Deep Learning in Biology and RNA
(0:32:49) - The Future of Drug Discovery
(0:41:41) - Collaboration and Innovation at Inceptive
Transcript
What would the world look like if we could create biological software that allows us to compile RNA?
That's the big question this week on the podcast.
Sarah and I are sitting down with Jakob Uszkoreit, co-founder and CEO of Inceptive.
Jakob spent more than a decade at Google, where he co-authored the Attention Is All You Need paper
and several other papers that set the foundation for today's AI revolution.
He also started and led the research teams that transformed Google Search, Google Translate, and Google Assistant. Now at Inceptive, he's building biological software with the aim to make
widely accessible medicines and biotechnologies. Jakob, welcome to No Priors. Thank you. Thank you for
having me. You worked at Google for more than a decade, working on many leading research teams.
You were really seminal in the original Transformer paper. And I think when I talked to the other
authors of the Transformer paper, people sort of in the know at Google, you're widely credited
with really coming up with the idea of focusing on attention, which was sort of the basis for
the Attention Is All You Need paper. Could you
talk a little bit more about how you came up with that and how the team started working on it
and sort of the origins of that pretty foundational breakthrough in terms of the transformer.
It's really not that simple, right?
It's also really important to keep in mind that, as always in deep learning,
you can't make something, in quotes, really work that is maybe pretty far on, say,
the theoretical or formal end without really going deep on the engineering implementation side.
And it just has to be efficient at the end of the day. In my mind,
the one and only thing we know really works if you want to push deep learning forward is to make
it faster and more effective and more efficient on a given piece of hardware. There's a lot of
evidence that the way we actually understand language, and that's something that then shapes
language in terms of its statistical properties, is actually somewhat hierarchical. And the best
piece of kind of just circumstantial anecdotal evidence for that is just looking at what the linguists
do, right? They draw these trees. And while I don't think that they're
really true, they're also definitely not always false.
And so they do capture some of the statistics that are inherent in language,
and probably language was actually evolved this way in order to exploit our cognitive
capacities really in a fairly optimal way.
And so you can safely assume that it is not necessary to go through the entirety of a sequential
signal beginning to end and maybe also end to beginning simultaneously in order to
understand it, but actually you can gain a lot of the understanding in air quotes by looking
at individual groups of, say, your signal, right? And ultimately, if you now are given a piece of
hardware that has the very key strength of doing lots and lots of simple computations in parallel,
as opposed to complicated structured computations sequentially, then really that's actually
a kind of statistical property you really want to exploit, right? You want to, in parallel, understand
pieces of an image first, and then maybe that's not possible in its entirety, but you can actually
get a lot of it. And then only once you've done some of that, you put these incomplete understandings
or representations together, and as you put them together more and more, that's when you
disambiguate the last remaining, or that's when you get rid of the last remaining ambiguity at
the end of the day. And when you think about what that process looks like, it's a tree, and when
you think about how you would actually run something that evaluates all possible trees,
then a reasonable approximation is that you repeat an operation where you look at all combinations of things,
that's this quadratic step, right, that ultimately is at the core of this attention step,
and then you effectively pull information in for a given representation of a given piece,
from the other representations of all the other pieces, and rinse and repeat.
And it seems intuitive and it also seems intuitively clear that that's a really good fit
for the kind of accelerators that we had at the time that we still have today.
And so that's really where that idea came from.
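To make the quadratic "all combinations" step described here concrete, this is a minimal numpy sketch of plain scaled dot-product attention; it is the standard textbook formulation, offered as an illustration rather than code from the paper or from Google.

```python
import numpy as np

def attention(X, Wq, Wk, Wv):
    """One round of "look at all combinations, then pull information in".
    X: (n, d), one row per piece of the signal (token, image patch, ...).
    Wq, Wk, Wv: (d, d) learned projection matrices."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    # The quadratic step: every piece is scored against every other piece.
    scores = Q @ K.T / np.sqrt(K.shape[-1])              # (n, n)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)        # softmax over the other pieces
    # Pull in the other pieces' representations, weighted by relevance;
    # "rinse and repeat" by stacking this operation into layers.
    return weights @ V                                     # (n, d)

rng = np.random.default_rng(0)
X = rng.normal(size=(5, 8))
Wq, Wk, Wv = (0.1 * rng.normal(size=(8, 8)) for _ in range(3))
print(attention(X, Wq, Wk, Wv).shape)  # (5, 8)
```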
And if you want to look at, say, the biggest differences, for example, between
the Transformer as it was described in the Attention Is All You Need paper and some of
its ancestors like the decomposable attention model, the big difference is just that the
Transformer was implemented by folks like Noam and others in a way that's such an excellent
fit for the accelerators that we had at the time.
So one question that I've kind of heard people bring up is a lot of the behavior that we've
seen in Transformers, to some extent, is most interesting at scale, right?
You get interesting emergent properties.
Yeah.
And there may be other architectures that have equally interesting or perhaps more interesting
properties at scale, but there's sort of two impediments.
Number one is people just aren't throwing a lot of money and compute at it.
And two is the underlying accelerator architecture actually fits so well
that it is dramatically less performant to do other architectures, and therefore we may never actually
test them.
Do you think that's a true statement?
I think that the big question is, does it matter?
It would be really interesting to evaluate, especially if we can make them simpler,
to evaluate combinations of different hardware
and then models or architectures
that fit like gloves to those.
And I feel at the moment,
given where GPUs came from,
they weren't built for this.
Why would it be that they are anywhere near optimal?
If at least they had been engineered for this purpose,
with lots of people basically banging their heads against walls
until they had it somewhat optimized;
but that's not how the basic architecture came to be.
And so you can talk a lot about, and reason a lot about, and I think that some of that is true,
the generality of basically really fast, scalable matrix multipliers and how that just does everything
in scientific computing really well, but there's still lots of bells and whistles, and there
are lots of specific tradeoffs, say, for example, things like memory bandwidth, and ultimately
inherent parallelism versus latency. I don't think GPUs are at the sweet spot when it comes
to large-scale deep learning with respect to exactly those trade-offs.
And so it may very well be that if we actually try these combinations,
we might actually even quickly find something that's better.
When you think about how we get progress from here,
usually people think of software as driving the hardware, right?
Do you think we get accelerators designed for the large-scale transformer architectures
we already have, or new hardware designs?
Like, is it chicken or egg a little bit here?
It's chicken and egg.
And if you look at the newest accelerator designs,
they are taking this into account to a significant extent, actually, increasingly.
So there are a couple of interesting examples.
We had a computer vision architecture that really was just an MLP called Mixer.
And while it wasn't significantly better,
it also wasn't significantly worse than the vision transformers.
And I think that already goes to show
it's not that difficult. And especially if you simplify on the way, it might really be a possibility.
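As a rough illustration of the Mixer idea mentioned here, this is a stripped-down numpy sketch of one Mixer-style block: one MLP mixes information across tokens (patches), another across channels. The layer norms and GELU nonlinearity of the published MLP-Mixer are omitted for brevity, so this is a simplification rather than the published architecture.

```python
import numpy as np

def mlp(x, W1, W2):
    # Two-layer MLP with ReLU (the published Mixer uses GELU).
    return np.maximum(x @ W1, 0.0) @ W2

def mixer_block(X, tok_W1, tok_W2, ch_W1, ch_W2):
    """X: (tokens, channels), e.g. image patches by feature channels."""
    X = X + mlp(X.T, tok_W1, tok_W2).T   # token-mixing: MLP applied across patches
    X = X + mlp(X, ch_W1, ch_W2)         # channel-mixing: MLP applied per patch
    return X

rng = np.random.default_rng(0)
tokens, channels, hidden = 16, 32, 64
X = rng.normal(size=(tokens, channels))
out = mixer_block(
    X,
    0.1 * rng.normal(size=(tokens, hidden)), 0.1 * rng.normal(size=(hidden, tokens)),
    0.1 * rng.normal(size=(channels, hidden)), 0.1 * rng.normal(size=(hidden, channels)),
)
print(out.shape)  # (16, 32)
```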
I will say one other thing, aside from efficiency, just really raw efficiency in terms of
the architecture's fit to the accelerator hardware. The other main contributor, I think,
to the success of this architecture was optimism and hope. So suddenly you were in a situation
where, for whatever reason, a bunch of things that people tried with this started to work.
and then more started to work
and that's not coincidence. It's really just
because ultimately the human
cycles invested to getting
all these things, all these diverse things to work
are ultimately fueled by
suspension of disbelief, a.k.a. hope, or whatever you want to call it,
and that really
I mean, the community became so energized so quickly and then just tried everything under the sun, because the fire was just a different one. The fire now was: oh look, we have this thing where it just works. Which is just not true. The reality is that when you try something else the first time, you really have to work hard for a long period of time, and then, lo and behold, sometimes it works. And if you do that many more times, then it will work many more times. And I think that's really what we're seeing.
Where do you think people should invest that sort of optimism going forward? Like, what are the big areas that people need to work on to increase the performance of these systems, or add memory, or do other things? If you were to sort of paint the roadmap going ahead in terms of making these really valuable, performant systems, what would you have people focus on?
I mean, I think there's one thing that still boggles my mind, just from first principles, in that it can't be optimal. And that is, if you think about it, the way you today
scale the compute that's invested in a given problem, right? Let's say the problem is what's the
response to a prompt in some large language model. Then ultimately, the way you scale that compute
depends on the prompt and how much, how long that is. The longer the prompt, the more compute you get.
And it depends on, and there's, of course, many different screws to tweak here, the length of the response.
There are many very hard problems where the response is incredibly short.
And you can, in many cases, actually formulate those problems very, very succinctly.
So you're not going to be using a lot of compute, even though the problem we know is really, really difficult.
Say, I don't know, prime factorization.
A problem like that is simply stated, with big potential impact.
And right now, there's no knob that you can easily tweak as a user,
but also really there's no knob that the architecture can tweak itself
when it comes to then basically deciding,
oh, this is hard, I actually need to use more compute for this.
And ironically, and this comes back to a question that many people ask,
I think around, does it make any sense to train on generated data?
Because information theory, foundational information theory, very clearly says,
nope, you're not going to get more information out of it.
You can do it all you want.
But there is an artifact, or maybe even an omission, in that formulation, in that flavor of information theory, which is that it doesn't take compute into account.
It doesn't take into account actually the energy expenditure necessary to generate that data.
So if you now think back to these problems, right, if you were to just let LLMs run, generate stuff and then train new LLMs, or even the same LLM, on that output, what you do is you amortize compute that was expended at some point in
time. And so now suddenly, right, you actually have models that, if you retrain them over and over
again, they're starting to spend more compute on the same problems, but it's amortized over
all of these iterations effectively of the system. And that seems clunky. That just seems so
clunky that ultimately it should be something where at inference time, at runtime, the model
effectively can decide or maybe even query, right? So there's this notion of anytime algorithms
where it might just depend on your resources.
If you have more time or more money,
then let it run longer.
But you don't want that to happen in cases
where the answer or the problem in question is simple.
You only want to do that in cases where it's actually hard.
And that right now doesn't work.
Because if you pose a very, very simple problem,
like 2 plus 2, to GPT-4 right now,
and you write that in a very long-winded way in a prompt,
and you ask GPT-4 to generate a very complicated answer,
then it will actually expend a ton of compute to add two to two, which makes no sense.
And so that, I mean, out of all the different problems that I currently see at a high level,
because it's not clear how exactly you would address it, that is maybe the one that boggles my mind most.
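A crude back-of-the-envelope sketch of this point: with today's decoding, inference compute is roughly proportional to model size times tokens processed, so the difficulty of the question never appears in the formula, only how verbosely it is asked and answered. The ~2 FLOPs-per-parameter-per-token rule and the parameter count below are rough assumptions for illustration only.

```python
def rough_inference_flops(n_params, prompt_tokens, completion_tokens):
    """Very rough rule of thumb: ~2 FLOPs per parameter per token processed.
    Ignores the attention term, KV caching, and hardware details."""
    return 2 * n_params * (prompt_tokens + completion_tokens)

N = 100e9  # assume a hypothetical 100B-parameter model
terse = rough_inference_flops(N, prompt_tokens=6, completion_tokens=2)         # a terse "2 + 2 = ?"
verbose = rough_inference_flops(N, prompt_tokens=2000, completion_tokens=800)  # a long-winded "2 + 2"
print(f"{verbose / terse:.0f}x more compute for the same trivial question")    # ~350x
```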
Yeah. Are there other big research areas that you're excited about right now or areas where you see
enormous progress being made? So in terms of foundations, I think different flavors
of elasticity are
really interesting. So
you could actually claim
that a lot of these questions
boil down to the question
that I just, to basically this problem that I
just described, right, that
compute is in a certain sense very crudely
allocated. But you can look at different
incarnations of this problem. So another one would
be, why don't we have models
that in an elegant way manage
to consume, say,
visual sensor output
of different resolutions,
different sampling rates, different durations.
Right now, it's actually quite tricky to have,
other than maybe recurrent architectures,
a model that takes videos of different lengths,
different image resolutions,
or ultimately different densities,
if you wish, in different sizes,
and really elegantly adjusts compute
to what you really want to know about this,
or how difficult it really has to be
to generate the representations that you need
in order to do whatever you want to do.
And here, again, an example that makes this,
I think pretty clear is you can take a video,
you can scale it up, you can frame-interpolate with trivial algorithms,
and then run it again.
And if the problem you're trying to solve conditioned on that video is the same,
then I wouldn't want more compute to be used.
But right now, that's what's going to happen.
You're going to use a ton more compute.
And so effectively, these types of, in a certain sense,
kind of elasticity or flexibility of these models,
or, I believe, our lack of techniques addressing those,
ultimately is incredibly wasteful.
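To put rough numbers on the video example, here is a small sketch under common assumptions (16-pixel patches, a vanilla vision transformer that tokenizes every frame, quadratic self-attention): upscaling and frame-interpolating the same video multiplies the compute even though the question being asked about it hasn't changed.

```python
def video_tokens(frames, height, width, patch=16):
    """Patch tokens a vanilla vision transformer would see for a video."""
    return frames * (height // patch) * (width // patch)

def relative_attention_cost(tokens):
    return tokens ** 2  # self-attention scales quadratically in token count

original = video_tokens(frames=64, height=224, width=224)
# Same content, trivially 2x frame-interpolated and 2x upscaled:
inflated = video_tokens(frames=128, height=448, width=448)
print(inflated / original)                                                     # 8x the tokens
print(relative_attention_cost(inflated) / relative_attention_cost(original))   # 64x the attention cost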
I've seen increasing attention around like two different concepts in these general directions.
One is, I think it was some people at Meta that did depth adaptive transformers, right?
So just adjusting the amount of computation for each input and like a prediction on that, right?
And then I don't know how much more work has gone in that direction.
And then I think a number of people are more excited about doing test time search,
especially for problems like code generation where you can evaluate it with compilation or something, to sort of get a loop of success in the model itself?
I think test-time search is super effective. I do
think it's clunky because
it's not something that you can easily end to end
optimize. So basically this is
also what I'm what I was trying to get at
a little bit maybe with saying
some of these efficiency improvements that aren't
really there yet, I believe,
would dramatically affect training time
and if you look at kind of how test time
actually affects training, it's just clunky.
And I don't think we'll be able to optimize it as well, although as an engineering hack,
in a certain sense, I don't know, "hack" could sound negative.
That's not what I mean.
I think it's an awesome hack.
As an engineering hack around this problem, it's really, really effective.
It basically comes back to this whole idea of amortizing compute in a certain sense,
with the stuff you already have lying around and memorized,
even though it was the humans that actually put it there in many cases.
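For concreteness, this is a minimal sketch of the test-time-search pattern discussed above, applied to code generation: sample candidate programs and keep the first that passes an external check such as a compiler or unit tests. `sample_program` and `run_tests` are hypothetical stand-ins for a model call and a test harness; the point is that the verification loop lives outside the model, which is exactly what makes it hard to optimize end to end.

```python
def test_time_search(sample_program, run_tests, budget=32):
    """Spend extra inference-time compute by brute force: draw `budget` samples
    and return the first candidate that passes, else the best-scoring one.
    `sample_program() -> str` and `run_tests(code) -> (passed, score)` are
    hypothetical stand-ins for a model call and a compile/test harness."""
    best_score, best_candidate = float("-inf"), None
    for _ in range(budget):
        candidate = sample_program()          # one decode from the model
        passed, score = run_tests(candidate)  # external, non-differentiable signal
        if passed:
            return candidate
        if score > best_score:
            best_score, best_candidate = score, candidate
    return best_candidate
```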
in terms of adaptive, adaptive time transformers,
et cetera, we tried this Universal Transformer thing
actually a long time ago.
It just hasn't caught on,
and that's because it just doesn't work, right?
At this point, it doesn't work well enough.
It's not like it doesn't work at all,
but if it worked really well,
then because of the fact that compute right now
is this incredibly scarce resource,
we would see it everywhere.
And I think what that tells us is,
and I don't think here it's really just for a lack of trial,
probably there's too little experimentation,
but at least those known or proposed methods here,
they just don't work well enough yet.
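For reference, the flavor of mechanism being alluded to (Adaptive Computation Time and the Universal Transformer's per-position halting) looks roughly like the sketch below. This is a much-simplified illustration with stand-in functions, not the published algorithm, which also handles remainders and weighted state averaging.

```python
import numpy as np

def adaptive_depth(x, layer, halt_prob, max_steps=12, threshold=0.99):
    """ACT-flavored sketch: keep applying `layer` to each token until its
    accumulated halting probability crosses `threshold`, so easy tokens stop
    early and hard tokens receive more compute. `layer` (state update) and
    `halt_prob` (per-token halting score in [0, 1]) are stand-ins for learned
    functions."""
    halted = np.zeros(x.shape[0])       # accumulated halting probability per token
    active = halted < threshold
    for _ in range(max_steps):
        if not active.any():
            break
        x[active] = layer(x[active])    # spend another layer of compute on still-active tokens only
        halted[active] += halt_prob(x[active])
        active = halted < threshold
    return x

# Toy usage: a fixed "layer" and a constant halting score per step.
rng = np.random.default_rng(0)
tokens = rng.normal(size=(4, 8))
out = adaptive_depth(tokens, layer=lambda h: np.tanh(h), halt_prob=lambda h: np.full(len(h), 0.3))
```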
So one thing that you've been working on
for the last few years is Inceptive,
which is really starting to focus on
how can you apply machine learning
and different aspects of software to biology.
Could you share a little bit about the company,
how you got interested in bio,
and what you view is some of the interesting problems there?
Yeah, so basically, I've always been interested in bio
and know nothing about it.
And that's a conundrum because it's difficult to learn a lot about biology when you're not in school and I didn't want to go back to school.
But at the same time, it always felt like something where there's a lot of headroom in terms of efficiency, and also where, at least if what you're interested in is really solving acute problems, there's maybe a dire need for alternative approaches.
Alternatives to, basically, biology: the science that is trying to develop a complete conceptual
understanding of how life works. I don't have very high hopes for humanity to develop that
complete conceptual understanding to the level that we would need in order to do all the
interventions we want to do. We don't really have great tools in our toolbox, or we didn't
have them until somewhat recently as alternatives to understanding how it works and then basically
based on that understanding, fixing it if it needs fixing. And I think now we have an alternative
that's an extremely good match, and that's deep learning at scale.
We're really, we can potentially, to a pretty large extent, if not entirely,
whatever this even means, work around the following two problems.
Number one is we don't know all the stuff that's going on in life, right?
So we still just don't even have a complete inventory, let alone really understand all the
mechanisms.
And number two, we ultimately, even for the stuff that we do know, so far haven't really,
in many cases, been able
to come up with
sufficiently predictive theories
to really make that understanding useful.
A concrete example, here is protein folding, right?
Or basically, even if you just act
as if there are no chaperones,
there is no other stuff in
this environment in which folding or
whatever you want to call it, in which that process
in which the earliest kinetics
during translation happen,
even if you make that massively
simplifying assumption,
the theory just
wasn't practical, and it seems like deep learning is at least potentially a really good answer
to both of those aspects, because you can basically treat everything in quotes as a black
box, and as long as you are able to observe that black box in terms of whatever input output
pass enough, and at sufficient scale, you might go somewhere with that.
So Inceptive is pretty stealthy.
Is there anything you can share in terms of how you're applying deep learning or other
techniques to biology in the context of the company?
Yep. My daughter was born, my first child, and just that entire process gave me a really
fundamentally different appreciation for the fragility of life and a really wonderful one,
but also a pretty fundamentally different one. And so here we are, we have this new tool,
namely AlphaFold 2, that solves one of these fundamental problems in structural biology.
We have instances of a macromolecule family that's basically about to save the world, and I basically
want to fix life because I now have this wonderful daughter. It became clear that
using the exact tools we had been working on at Google before and applying those to this
neglected stepchild, namely RNA, or more specifically at first mRNA, could have massive impact
on the world. And ultimately, what we're trying to do is to design better RNA and at first
mRNA molecules for a pretty broad variety of different medicines. Infectious disease vaccines,
are, I guess, maybe the obvious first example given the COVID vaccines. But if you look at the
pipelines of Moderna and BioNTech and all those companies, the at least potential applicability
of RNA, more specifically, is nearly limitless. There are already hundreds of programs
underway in different stages of development. That number is expected to climb hitting high triple
digits before the end of the decade. And now we're talking about a modality that might end up
before the end of the decade being the second or third biggest modality in terms of revenue
and potentially also in terms of impact. And if you now take that in terms of just trajectory
and look at how suboptimal in a certain sense the mRNA vaccines were when you compare it to
what's possible using RNA, just looking around in nature; looking at how
severe the side effects were for what fraction, ultimately, of patients that received the vaccines,
how few people comparatively really had access to any of those vaccines when they really were
necessary and needed. And it seems like currently, if we look around in our toolkit,
the only tool we have to potentially change that quickly is deep learning. So at Inceptive,
we think of this now as something that you could call biological software, where mRNA and RNA in
general is maybe the equivalent to bytecode that then forms the substrate, forms like
the actual stuff that the software is made of. And what you do is you learn models that allow you
to translate biological programs, programs that might look like some bit of Python code that
specify what you want a certain medicine to do inside yourself, inside your cells, and translate those
programs, compile them into descriptions of RNA molecules that then hopefully actually do what
you wrote, what you programmed them to do. And ultimately right now, if you look at mRNA vaccines,
our programming language is just a print statement, right, just print this protein. But you can
easily imagine that with self-amplifying RNA as one example, and with riboswitches, so-called
riboswitches, basically RNAs that change dramatically in structure or self-destruct in the presence
of, say, a given small molecule or so, you can effectively have conditionals, you can have
recursion, and as a computer scientist, you squint, and you're like, oh, wow, okay, this is basically
Turing complete, you have some I/O, and you kind of have all sorts of tools now at your disposal
to really build very, very complex, ultimately, medicines that then might also be produced,
manufactured, and distributed in a way that is much more scalable than anything that we've
been able to do so far. Protein-based biologics oftentimes don't make it to the market because
it's just not possible to manufacture them at scale.
If we wanted to medicate everybody in the world with all the protein-based biologics
that they should actually receive,
the real estate on the planet wouldn't be enough to make all the stuff.
But right now, if you look at RNA manufacturing and distribution infrastructure,
we're going to have 6 to 8 billion doses two years from now,
manufacturable and distributable across the globe.
And that number is going to go up really, really quickly.
At Inceptive right now in our lab, we can actually print pretty much
any given RNA.
And that's just something you can't do with small molecules.
You can't easily do with protein, certainly not at scale.
And that's not something that only matters when you have a product in your hand.
If you want to treat this as a machine learning problem,
you need to generate training data.
It doesn't already exist.
And so you also really want to have scalable synthesis and manufacturing,
which is unprecedented as a consolidation.
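Purely as an illustration of the "biological software" framing, and not a description of Inceptive's actual stack, one could caricature the idea like this: today's mRNA vaccine is roughly a single print statement, riboswitch-like elements would add conditionals, and a learned model plays the role of the compiler that emits candidate RNA sequences for the wet lab to synthesize and test. Every name and interface below is hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Express:
    """The 'print statement': express this protein."""
    protein: str

@dataclass
class When:
    """A riboswitch-flavored conditional: only act in a given context,
    e.g. in the presence of a particular small molecule or cell state."""
    condition: str
    body: Express

# A toy "biological program" (all names hypothetical):
program = When(condition="in_muscle_cell", body=Express(protein="spike_antigen_v3"))

def compile_to_mrna(program, generative_model):
    """Hypothetical learned 'compiler': a generative model proposes candidate
    RNA sequences intended to realize the program; candidates then have to be
    printed and measured in the wet lab. Generation rather than enumeration,
    because the space of possible sequences is astronomically large."""
    return generative_model.generate(spec=program)
```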
So your view is that you can actually search for the program
that codes for, let's say, the COVID spike protein at a certain amount with different
stability characteristics, with different immune reaction characteristics that doesn't need
cold chain logistics, that condition of whatever cell type, I'm saying in the future,
not inceptive today, but that's the goal of all of the 10 to 630 variants.
That's right. And it's not certain, I mean, ultimately it's not going to be a search, right?
Just like today, the output of an LLM isn't coming out of a proper search procedure.
It has to be a generation procedure exactly in the same way and for the same reason as you basically see it in large language models or image generation models.
But yeah, that's exactly the goal.
Because screening is just not going to cut it at 10 to the 60th.
And that's really just one antigen that we're coding for there when we actually want to code for many and update those for any given.
For any given, yeah.
Exactly. When you do personalized cancer vaccines, it is going to be many antigens for each patient
over time, right? And there's just no hope of basically tackling this with screening approaches
at all. Yeah, I'm excited to just get to the right answer without having to understand or
discover every single mechanic and do the mass expensive screens we have today.
I mean, that's really the big question. Are we here maybe at a crossroads where the discovery and
understanding is actually a hindrance? The hope to discover and really get how this works
might actually be holding us back. And there is a pretty direct analogy to language understanding.
Computational linguistics and linguistics in general tried this for a while to develop
a sufficiently accurate and complete theory of language to make this really actionable.
Yeah, when you talked about how transformer model works, for example, I actually was thinking about
genomic sequencing where you used to do the sequential sequencing contig by contig and you'd have
these big chunks of chromosomes that you'd sequence through sequentially, and then eventually
you moved into an era where you just broke it up into tons and tons of tiny little
sequences that were randomly generated, and then you'd reassemble it with the machine, right?
And that felt like a very interesting parallel or analog to what you were talking about
from a language perspective. It's effectively the same thing.
It is exactly. And the parallels are so striking, and they don't end there. So, yeah,
it's really, really interesting to see. And the invariant that I feel just holds true
across the board is that these formalisms that we make up in order to communicate our
conceptual understanding or intuitive understanding, and conceptualizing explicitly, is great for
education. It's also great for many other types of, maybe, reasoning about them. It might
actually, because of our limited cognitive capabilities, really not be the right tool to actually
really predict what's going to happen with a given intervention. Yeah. And I think the other
point that I think really resonated in terms of what you mentioned was just if you look
at drugs, especially traditionally, we actually didn't understand how most drugs worked until
very recently. And so aspirin, we had no idea how it worked when it was taken out of the bark
of a yew tree or whatever in the 1800s. And it was fine. Like, people were fine taking these things
that had minimal side effects. There are very popular drugs on the market, like metformin, that bind
to multiple targets. We still aren't sure exactly how they work. And so a lot of the emphasis right now
from a regulatory pathway for drugs is, oh, you need a mechanism of function, or you need a
proven pathway. And all these things create hurdles that don't necessarily help with drug
efficacy. And some of them might actually also be, in a certain sense, kind of, I should say.
It's a waste of time and money. If the thing works, it works. Yes, it's a waste of time and money,
and it might not even be true. And we have no way of telling. Because in the end, the ground
truth is, right, does it work and does it actually do more good than harm? And it's empirical.
And yeah, maybe there's really just, maybe that should be the focus. Yeah. And everything else should be
treated as something that we should at least do after we get the first thing.
In that historical framing of we don't actually understand many of the things that have been
most important in medicine or if we've discovered their mechanisms after the fact, you know,
the end-to-end like black box like deep learning pipeline approach seems a little more rational,
a little less heretical, which I think upon first blush it certainly is controversial.
Yeah, I mean, the part that one can look at as blasphemous is that now suddenly you don't know the theory anymore that you're testing, right?
And you might never, because it's not clear to us today, as far as I can tell, that if there is a theory in that black box today, that we could get it out.
There are people trying, and I think it's worth trying.
I'm not super optimistic about that.
I think it'll work for some cases, right, where it's simple enough that we can get it.
I think there are many cases where it just isn't, like, say, climate
and weather forecasting. I just don't think we're going to get it. We're going to get it
in the sense that we understand, I think, the Schrödinger equation and how that
could be used, however intractably, in theory, to just solve all these things. But that's
not practical. And to develop a theory that is both predictive and practical here might just
not be something we can put in our heads. Yeah. This is kind of interesting because I actually
feel like this, again, is the basis of a lot of traditional drug discovery from way back when,
as well as just the basis for how you think about genetic screens, right?
You'd basically do functional screens, so you'd mutagenize a bunch of organisms.
You'd look for output, and then you'd say, okay, I've identified genes that are part of this
pathway or output, and I can map in some ways that they're interacting with each other,
but before molecular biology, we actually didn't understand anything from a function perspective.
We just understood sequencing and output, right?
And so it feels like deep learning is really just a throwback to other forms of biology
that have been incredibly fruitful, but just with a new sort of technology modality to interrogate
these systems. Exactly. So how do you think about human augmentation in the context of all this
stuff? You know, how bullish are you on human augmentation and what forms do you think it'll take
in the near term? I'm very bullish on human augmentation in the very long term, but it's one that
I don't see intuitively. I think looking at our brains, even just physically, they seem to be
very focused, and this is not surprising, on IO. And why would there, somewhere in there, be some
kind of computational capacity that, if we just boosted our IO by a few orders of magnitude,
could still cope? Why would evolution put that there? I don't know why. And so, yes, you could
argue, you know, maybe to do long-term planning tasks and so on and so forth, but sure,
let's bound it at a lifetime. So, right, it's just not so clear whether there would have been any
evolutionary pressures to really make our capacity there much bigger than, say, some multiplier,
basically, times our IO capacity.
If you look at the number of tokens that you use
to train in LLM, and then you look at the number of tokens
or words that are used to train a kid, right, a child,
a human baby or a human toddler.
I mean, a human toddler is probably exposed to what?
Hundreds of thousands, maybe millions of words
before they can speak, like fluently.
But I think that's because we confuse fine-tuning and pre-training.
Pretraining is all of evolution.
Sure.
and then basically you arrive at this thing that is maybe doing something that's, in a certain sense, a completely irrelevant task at first, but it has all the capacity in there to then, with a comparatively small amount of data, maybe it's something in between, be fine-tuned towards something that we would regard as oh-so-advanced cognitively.
The compute has been amortized
over the last several millennia, or tens of millennia, of humans,
and we come pre-wired for language
and so it only takes a million tokens at the end.
Exactly. And now the thing is that you can say, okay, great. So we come pre-wired. Let's look at our wires and try to find language. That might not, it might not be that simple, right? Because, of course, it's this co-evolution and it's all fuzzy. And so how much we're pre-wired for it, how much language is, in a certain sense, also wired for us. It might be, it might be the case that it's maybe even impossible, right, to actually read out what it's pre-wired for from just looking at the wires.
Yeah. You can see circumstances where people are literally born without a hemisphere of their brain, or there's other sort of mass scale deficiencies brain-wise, and then things just rewire to effectively compensate. And so you have parts of the brain taking over other functionality that they're normally not designed for, which is also fascinating because it seems like certain parts are extremely specialized visual cortex, et cetera, and then other parts are basically almost general-purpose machines that can be reallocated.
I completely agree with what you're saying. I feel general-purpose machines is a really tricky term because, right, I mean, could they, could the brain after a massive trauma
rewire to do something very different? Fair.
Unclear, right? So it could be that it's actually still specific, but it is in a certain
sense, general, namely preparing for a certain flavor of redundancy. And this is also why
I find AGI as a term particularly problematic, because I don't know what the general means.
I think they're referring to General Tso's chicken as part of, no, I'm just, sorry, really dumb joke.
I finally get it.
I'm sorry.
Finally all makes sense.
What's the theory of data generation at Inceptive?
I feel like I understand the mission you describe,
and then you need to go do wet lab experiments with observation
to understand all the properties of these sequences,
and you have to figure out how to do that efficiently, right?
Still a young company with all your pedigree and resources.
Yes.
I would love any intuition on that.
Yeah.
So let me try to get across how we think about this.
So number one, we look at ourselves actually as one anti-disciplinary team.
So it's not quite anti-disciplinary, although there is a correlation maybe with a lack of discipline
or disregard for fundamental discipline or disciplines and being anti-disciplinary.
But we think we're really in the sense pioneers of a new discipline.
It doesn't have a name yet, but it draws a lot from deep learning and draws a lot from biology.
We think ultimately designing the experiments or assays that we're using to generate the data,
that we need to then train the models in a certain sense is at the core of this discipline,
if you wish, because the experiments or the assays that we're running,
they use the models that we're training on the data that their predecessors actually produced.
And so really, if you squint, then in a certain sense, I guess there was always this dream of,
and I think it's a pipe dream, of having the cycle between experimentation,
and then you put that into something in silico, something running on computers,
and then that informs the experiments,
and then you kind of iterate that cycle.
I think that's just, it would be beautiful and simple and nice.
I don't think it's really that easy.
So what you see at Inceptive is actually that there is not that one cycle,
although maybe now somewhere hazily there actually is that cycle too,
but by design, actually, there are tons of little cycles.
So, right, you start an assay,
and the first thing you do is actually you query a neural network,
and then you do some stuff, and then you get certain readouts,
and those, you then together with some other stuff
feed into yet another model. And then that
actually gives you parameters for some instrument.
And then you run that instrument on
the stuff that you've created. And so
it's really just this kind of giant mess
where the boundary
actually is increasingly blurry.
And so we actually think
that our work happens on the beach, because
that's where the wet and the dry meet in harmony.
Ah, huh.
And so initially, folks join Inceptive
and they usually, most
of them, they come from, say, either,
in quotes, side, right? They've spent most of their careers working on deep learning or
robotics, or on biology. But ultimately, it doesn't take them that long to start speaking
some weird kind of creole of all of these languages and also think in these ways. And what
then happens is magic. It's really amazing because then you suddenly find solutions to
problems that, say, the biologists they were two years ago just wouldn't even think about.
And they work together with folks
they would have otherwise maybe never even met.
And the results sometimes don't work at all,
but sometimes they really are magic.
That's a really inspiring note to end on.
Thanks, Jakob.
Thank you.
Find us on Twitter at @NoPriorsPod.
Subscribe to our YouTube channel.
If you want to see our faces,
follow the show on Apple Podcasts, Spotify, or wherever you listen.
That way you get a new episode every week.
And sign up for emails or find transcripts for every episode
at no-priors.com.