This Week in Startups - TWiST 500 interviews with Cortical Labs, Turing, AND Mercor | E2159
Episode Date: August 1, 2025

Today’s show:
Alex is back with three more awesome interviews with founders on the bleeding edge of innovative tech.
Dr. Hon Weng Chong walks us through the basics of biological computing and Cortical Labs’ first-ever commercial computer running on living human cells.
Turing founder Jonathan Siddharth unpacks the secrets of LLM benchmarking, and explains why even our most advanced tests need to get much, much harder right away.
Finally, Mercor founder Brendan Foody on how AI is about to reinvent the hiring process, and marrying the effectiveness of recruiters with the ease of online job boards.
It’s three (count ’em, three) can’t-miss TWiST interviews guaranteed to make you smarter.

Timestamps:
(0:00) OpenAI’s GPT-5: When is it coming out? Is it going to be TOO smart?
(08:06) Cortical Labs’ Hon Weng Chong on the electric connection between neuroscience and machine learning
(10:20) Northwest Registered Agent. Form your entire business identity in just 10 clicks and 10 minutes. Get more privacy, more options, and more done. Visit https://www.northwestregisteredagent.com/twist today!
(11:27) Show Continues…
(15:17) The extreme difficulty of going from the lab to a shippable product
(20:00) .TECH: Say it without saying it. Head to www.get.tech/twist or your favorite registrar to get a clean, sharp .tech domain today.
(21:05) Show Continues…
(28:15) Why data is a factor of time
(29:52) AWS Activate - AWS Activate helps startups bring their ideas to life. Apply to AWS Activate today to learn more. Visit aws.amazon.com/startups/credits
(31:16) Show Continues…
(42:44) Turing CEO Jonathan Siddharth explains why it’s so important to keep benchmarking our LLMs
(47:24) What it means when a model “saturates” a test, and why benchmarks need to get HARDER
(50:22) What happens when the LLMs can answer all of our smartest questions?
(53:44) AI Agents train in gyms? Wait, really?
(01:01:33) Coding teaches models how to think, and more training mysteries we don’t understand
(01:03:11) Brendan Foody from Mercor explains the “matching problem” that makes hiring such a pain
(01:07:02) How Mercor combines a job board’s distribution with the value of a recruitment agency
(01:10:49) Brendan recalls building his first AI interviewer in his college dorm
(01:20:16) Mercor has the opposite of a retention problem and crazy growth

Subscribe to the TWiST500 newsletter: https://ticker.thisweekinstartups.com
Check out the TWIST500: https://www.twist500.com
Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp

Follow Lon:
X: https://x.com/lons
Follow Alex:
X: https://x.com/alex
LinkedIn: https://www.linkedin.com/in/alexwilhelm
Follow Jason:
X: https://twitter.com/Jason
LinkedIn: https://www.linkedin.com/in/jasoncalacanis

Thank you to our partners:
(10:20) Northwest Registered Agent. Form your entire business identity in just 10 clicks and 10 minutes. Get more privacy, more options, and more done. Visit https://www.northwestregisteredagent.com/twist today!
(20:00) .TECH: Say it without saying it. Head to www.get.tech/twist or your favorite registrar to get a clean, sharp .tech domain today.
(29:52) AWS Activate - AWS Activate helps startups bring their ideas to life. Apply to AWS Activate today to learn more. Visit aws.amazon.com/startups/credits

Great TWIST interviews: Will Guidara, Eoghan McCabe, Steve Huffman, Brian Chesky, Bob Moesta, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland
Check out Jason’s suite of newsletters: https://substack.com/@calacanis

Follow TWiST:
Twitter: https://twitter.com/TWiStartups
YouTube: https://www.youtube.com/thisweekin
Instagram: https://www.instagram.com/thisweekinstartups
TikTok: https://www.tiktok.com/@thisweekinstartups
Substack: https://twistartups.substack.com
Subscribe to the Founder University Podcast: https://www.youtube.com/@founderuniversity1916
Transcript
It's really important to benchmark LLMs to figure out what they are good at
and where the gaps in the models are, so that you can generate data that could help the LLMs get better
at those specific tasks.
One big challenge, Alex, that I see today in the overall evaluation and benchmarking
market is that a lot of the evaluations are somewhat academic and somewhat synthetic and don't
connect to real-world applications or real-world use.
Ideally, you want AGI to progress in a way where the models get better at useful tasks.
This Week in Startups is brought to you by Northwest Registered Agent.
Starting your business should be simple.
With Northwest Registered Agent, you can form your entire business identity in just 10 clicks and 10 minutes.
From LLCs to trademarks, domains to custom websites, they've got you covered.
Get more privacy, more options, and more done. Visit northwestregisteredagent.com/twist today.
.TECH. Say it without saying it.
Head to get.tech/twist or your favorite registrar to get a clean, sharp .tech domain today.
And AWS Activate. AWS Activate helps startups bring their ideas to life. As you build and
scale your business, Activate credits grow with you to support your changing needs.
Apply to AWS Activate today and receive up to $100,000 in credits.
Visit aws.amazon.com/startups/credits.
Hey, welcome to This Week in Startups.
You may notice I am not Jason Calacanis.
He is on the road this week.
We're going to do a quick show without him.
I'm your host, Lon Harris.
Joining me, as always, Alex Wilhelm.
Hello, hello.
We have three incredible, amazing TWiST 500 interviews coming up right now that Alex did all week.
Don't move from that seat.
But first, I do think we've got to talk about GPT5 a little bit, Alex.
Yes.
Yes, GPT5.
So this has gotten a lot of play.
But Sam Altman, one of the founders of OpenAI and kind of what you might call the face of
AI from the software side, Jensen being the hardware face, if you will, went on a podcast
with a man named Theo Von, and they were having a pretty candid conversation about what they're building.
And Sam says, you know, I was playing with GPT5, and I gave it a challenge, and it did it very quickly,
and he talks about how it sat him back
and made him think about what they're building
and the quote that really went viral was
he was like, you know, there are moments in time
in science when people sit back and go
what have we done?
And you know, on one hand, Lon,
I have three things here.
First of all, Sam Altman loves to hype things up
and then get mad at everyone when they get a little overhyped.
He's a master at PR.
So part one.
Two, if you go back in time to GPT3,
even maybe even
There were a lot of worries that as this technology advanced and got better, it would be misused.
And we're seeing that now when people are using voice cloning to trick elders out of their money and
AI is being used in more and more nefarious circumstances.
But we also at the time had a lot of people thinking that there was going to be a lot of improvement
and things were going to get much better more quickly.
So when I think about his GPT5 comments, they don't actually feel that different to me than
what we've seen with GPT4, GPT3, GPT2.
So, to me, this probably does feel as impressive to him as the times he felt blown away before.
So to me, I don't think he's being cynical.
I don't think he's overhyping here.
I think we're just seeing improvement.
And I don't think we know exactly where it's all going.
I mean, I think that sounds reasonable enough.
And look, I'm sure he's playing around with GPT5 and is genuinely impressed at what it can do.
Like, I think there's two levels here.
Like, even like, I was very cynical about AI early on.
Even I have had experiences like that where I'm impressed by what it can do and it sort of raises
the bar on what I thought was possible.
I'm not discounting that experience at all.
I'm sure he does have that experience.
But I also think this is a weird and unfortunate narrative that we've sort of fallen in.
And Jason has talked about this many times that like real science and real technology kind of
chases sci-fi.
And I feel like sci-fi introduced this idea that AI is scary and that AI
is going to be thinking for itself and all of those things, like from Skynet and Terminator to
Scarlett Johansson and her evolving beyond Joaquin Phoenix and wanting to go off on her own. Like,
I think that's always... spoilers, so spoilers for Her, sorry everybody. I feel like that's always
the fictional narrative around AI and I do feel like that has an impact on real world AI CEOs
and they're trying to be part of that narrative and play into that narrative because it's a helpful
narrative for them. We're creating iconic, legendary history-shaping technology that's akin to the
atomic bomb or the discovery of electricity. And I mean, like, it plays into a friendly narrative if you're
the CEO of an AI company. That's absolutely true. No, you're not wrong. I just think that AI
has put up enough in terms of real results, real progress, real products, real services, that to me,
we're just talking about how correct they are
versus how incorrect they are
when they make those large pronouncements.
But I do think the context here is that OpenAI
is expected to release their open weights model,
their kind of open source-ish project sometime soon,
and then GPT5 is anticipated,
we think sometime in August,
which means that we're going to get past the hype
and more into the testing.
So, you know, will GPT5 be as good as it's expected to be?
Well, we're going to have to find out pretty soon,
and that's encouraging.
Pretty soon, in fact,
the media is sort of saying,
you know, early August or sometime in August, the Sharps at Polymarket, of course.
Yeah, yeah, we were looking at this.
Yeah, not content to wait and see what happens.
There's already, of course, wagering going on.
There's already a market going on for this.
25% are betting that it's out by August 5th, which is early next week. That would mean we are right on the verge of GPT5 being here.
Yeah.
One in four.
Yeah, one in four.
And then by August 15th, people think it's a two-thirds chance that it's out.
So my read of this, Lon, is that basically mid-August is kind of the over-under, and that
later than that will be a surprise, and earlier than that will be a surprise.
But we're only a couple weeks away.
That's what's exciting to me.
That's the cool thing.
Remember how exciting it was when xAI dropped Grok 4 and they talked about Grok 4 Heavy and
then the Grok 4 coding model?
Then we got the Kimi K2 family of models from Moonshot.
Then we got, from Z.AI, I forget, the GLM 4.5 series.
The point is it's been a flurry of models, and Alibaba's Qwen models are improving.
Yeah, we're in the middle of that meme, the "you are here" one with the world's greatest
AI model. Like, we're right in the middle of another cycle, yeah.
But the question is, will this be just another iterative improvement?
And if so, is OpenAI going to end up with a lot of egg on its face?
Or is it as big of a jump as maybe GPT 3 to 4 was or larger?
And if that's the case, it does change the entire fabric and landscape that the technology
world plays on. But, Lon, let's talk to some founders. I love it. Let's do it.
All right. So we have three interviews for you today. First, Cortical, then we're going to talk to
Turing, and then we're going to talk to Mercor. Each one of these was an absolute treat.
Cortical is probably the most out there in that we're talking about kind of like hybrid
biologic and digital computers. It's, uh, it's an interview I chased down for a long time.
It's super fun. And then Turing and Mercor, a bit more traditional in the software space,
but incredibly interesting founders, incredibly interesting companies,
and especially I think in the case of Mercor,
a vision for a company that could become incredibly instrumental
to how labor functions in the future.
So these are a treat.
I had a blast.
Please enjoy them and say hello to three of our founders from the TWiST 500.
Hey, welcome back to TWiST.
This is Alex.
Today we have another TWiST 500 interview with a company
that has been my white whale for several months,
but I'm so glad to have them on the show.
We're going to talk about Cortical.
Now, here at TWiST, we love chips.
We love talking about what Groq is building to handle AI inference.
We love that Etched is building ASICs just for LLMs.
We've talked to Extropic about their thermodynamic computing paradigm.
But all that's entirely silicon-based, more or less.
Cortical wants to bring the world of biology to the world of computing in a way that when
I first heard about it, I thought was science fiction, but they've taken it out of the lab
and they're commercializing it.
So please join me in welcoming to the show
Cortical Labs CEO and co-founder, Hon Weng Chong.
Hon, how are you doing?
Hi, good, thanks.
Thanks for having me on the show, Alex.
I'm so excited.
Okay, so you guys are fusing biology and I would say digital computing in a way that's
really interesting.
But I think to help everyone understand how you got to where you are
today, we should go back to the earlier days of the company when you guys were experimenting
with your technology and you used it not to play Grand Theft Auto, as everyone wants to do
with generative AI, but instead to play Pong.
So, Hon, can you tell me about why you chose that and then how the experiment went to prove out what Cortical Labs wants to build?
Yeah.
So, you know, if you take a few steps back to, I guess, 2017-2018, there was, I guess, the first AI boom that happened.
That was, I would call it the convolutional neural network reinforcement learning boom.
I had just exited from my last business, and as part of that, it was a MedTech business that gathered a lot of data.
I was looking at using machine learning to do automated diagnostics and really got deep into the rabbit hole of machine learning and artificial intelligence.
And I came across a paper written by Sir Demis Hassabis from DeepMind, who wrote that the machine learning and
AI community had to go back to its roots in neuroscience because that's where most of the initial
discoveries were made.
Founders, if you've got a product, maybe you've got some customers or even just a little bit
of traction, guess what?
You've got yourself a startup.
And it's time to make things legit.
You want to be official.
Tighten it up.
Investors don't want to wire money to a Gmail account with a PO box.
No, they want to know you're a serious person and that your company is incorporated.
In 10 clicks and 10 minutes, you can form your LLC, get an actual domain name, launch your official website, claim your business email, and even start fast-tracking your trademark application.
That's right.
You need Northwest Registered Agent, the service that's going to do all of that for you.
With NWRA's privacy by default option, they're going to use their address for all your public documents, not yours.
So you're not going to get a ton of spam or junk mail.
And you can also get a legit address and working phone number with their virtual office setup.
So get more than just an LLC.
Get your entire business identity.
Go to northwestregisteredagent.com/twist and show the world you're in business.
So I did exactly that.
I went to the neuroscience department of the University of Melbourne, which is my alma mater,
and spoke to the researchers there and asked what's exciting in your world?
What are you working on these days?
You know, applying machine learning and AI to the neuroscience
world. And one thing that really kind of stood out to me was they said, oh, we have this
device called the multi-electrode array. It's not a new device; it's been around, I guess,
since the early 2000s. You can actually etch electrodes into petri dishes, and you can
grow neurons on top of them. And because neurons communicate electrically, and because chips run
electricity, you now have this common language between the biological system and a compute
system.
Electricity is the shared language between both neurons and chips.
Absolutely.
I mean, if you think about it, that's how Neuralink works, right?
I mean, the joke we have at Cortical Labs is that we are the inverted Neuralink.
Yes, you've taken the brain out and put it into the computer versus putting the computer
into the brain.
Correct.
Exactly.
I had read the paper from Demis.
I was inspired by it.
And the first thing that Demis tried to do at DeepMind was he tried to get the machine to play
Pong.
So I said, why don't we do the same thing?
Yeah, yeah, yeah.
Exactly.
So quite a lot of the first few attempts
weren't really successful.
But, you know, fortunately, these neurons are very malleable,
and they're very dynamic and they self-organize.
And we realized that, you know,
if we were trying to tweak the system while they were learning,
you would end up with these very weird, so-called attractor dynamics
where, you know, how to describe it,
it's like when you're trying to find somebody in a shopping mall,
but the other person's also looking for you
and you're looking for them.
And so you almost never meet
because they're always moving around in that same pattern.
That's what we were doing.
So what we ended up doing was we said,
we're not going to change some parameters,
we're just going to leave it fixed.
But we are going to, you know,
flash them with different, like, signals and so forth.
Maybe hopefully that way they wouldn't actually be continuously,
like, we wouldn't be wandering around
trying to find each other.
So, Hon, this is actually when
I fell in love with what Cortical was doing because, according to at least NPR's reporting of this,
you guys provided kind of a positive and negative response electrically to the neurons that were interacting with the chip.
And you gave a burst of, I guess, unfortunate white noise to the device if it was the wrong action and then a positive organized burst of electrical activity if it got it right.
So essentially, you very politely shocked it if it did bad Pong and you rewarded it if it did good Pong.
Kind of. I mean, I wouldn't say it's really a reward or a shock kind of thing,
because at the end of the day, we're still giving them an electrical signal. It's just the structure of it.
And this was actually driven by us coming across a theory developed by a neuroscientist
based at UCL in London by the name of Karl Friston, which is actually really interesting,
really fascinating, called the free energy principle.
What the free energy principle kind of posits is that the brain, or all biologically intelligent systems, are actually generative models.
What is now believed to be happening in our brains is that we're not reactive machines.
We're actually predictive machines.
And so we're actually generating hypotheses about the world that is external to the brain in the brain as simulations.
and then we're using the senses that we receive
and the ability to affect the world
as mini experiments to prove and disprove the hypotheses
that are generated by the brain.
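To make the structured-versus-unstructured feedback idea concrete, here is a toy sketch. It is only an illustration of the description above: the function name, the channel count, the step count, and the 50% firing probability are all hypothetical, and nothing here reflects Cortical Labs' actual stimulation code.

```python
import random


def feedback_stimulus(hit, n_channels=8, n_steps=5, rng=None):
    """Build a toy stimulation schedule: one set of active channels per time step.

    hit=True  -> the same channels fire on every step (structured, predictable).
    hit=False -> a random subset fires on each step (unstructured noise).
    All parameters are hypothetical placeholders.
    """
    rng = rng or random.Random()
    if hit:
        pattern = frozenset(range(n_channels))
        return [pattern for _ in range(n_steps)]
    return [
        frozenset(ch for ch in range(n_channels) if rng.random() < 0.5)
        for _ in range(n_steps)
    ]


good = feedback_stimulus(hit=True)
bad = feedback_stimulus(hit=False, rng=random.Random(0))
print(len(set(good)))  # number of distinct patterns in the structured schedule
print(len(set(bad)))   # number of distinct patterns in the noisy schedule
```

In both cases the cells receive electrical input; the only difference is how predictable that input is, which is the distinction being drawn above.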
Okay, so you guys did the Pong experiment
and showed that you could, in fact,
create a hybrid biologic and digital computer
that could play Pong reasonably well.
It wasn't going to set world records for Pong playing.
Now, take me from there to the CL1,
your first commercial device that kind of brings this technology out into the world.
How hard was it to go from successful experiment to commercialization and creating an actual
shippable hardware product?
Oh, really hard.
I would say this is probably one of the hardest things I've ever had to do because there's
some element of doing something really cool, something like really novel and so forth.
But more importantly is that if you're doing it as a commercial offering, it has to work a lot,
like at least most of the time rather than some random fraction of the time.
So it has to work very reliably.
You know, you do a lot of things that are very boring that no one really cares about,
but it's important and vital for the product and the machine to work.
So for instance, life support plumbing, you can't publish that.
That's kind of boring. Everyone knows what you need to do.
But to actually go through the weeds of like developing, prototyping, and testing is,
is quite difficult.
So it did take a lot of work,
but what was actually the motivation for it
was that we had a lot of our colleagues
reach out to us
after we had published the work
and they were asking us,
where can they buy this machine?
Not this machine,
but the machine that we had used.
What software did we have to write?
Whether we could help them if there were any questions.
You know, we saw this recurring pattern.
I said, look, you know,
people shouldn't be continuously reinventing the wheel, right?
Like a round wheel works really well,
and we know how to make it,
maybe you should focus on building the rest of the car
and we can build the wheel.
And then you just use our wheel.
If you think about the explosion in AI, right?
It had a very fertile ground
because the groundwork had been laid by decades of gamers
buying GPUs, keeping the likes of AMD and Nvidia alive
until the AI systems came online, right?
To the venture capitalists: you're welcome
for all my purchases to play games.
Exactly. So I think this is something that we saw that and we're like, well, you know, somebody's going to take that leap and try to support the research community. And we decided that that was going to be our thing going forward. Mind you, we still do a lot of research work, you know.
Of course. We have quite a few publications that came out. We have some very interesting stuff that's happening in the lab at the moment. But we think that this is more important, that no one company can do this
by itself. And we need to actually expand this out to everyone who's interested and try to
reduce the barrier to entry for people to get into the space.
Okay, I want to talk about commercial applications in a second. But first, I want to give
people a quick look at this. So if you're on the video version of the podcast, you'll see what
I'm about to show you. First of all, Hon, what is this here? I believe this is from the Pong era.
It's a petri dish with what appears to be a small dish in the middle. So what is this?
Yeah, so this is what I was talking about, the multi-electrode array.
It's got a well, so that round thing, that cylinder there, is used to contain what we call the cell culture media.
So the cell culture media is the liquid that bathes the neurons to keep them alive.
It's very similar to, I guess, the cerebrospinal fluid, you know, that bathes our own brain.
As these neurons are processing, they're consuming glucose.
But at the same time, they're also producing lactic acid, just like running.
You end up with the cramps.
So, you know, one thing we try to do here is prevent it from getting too acidic.
And we use the CO2 as a buffer for that.
Hon, I'm just imagining now, like, you know, the dog didn't eat my homework.
Instead, my computer got the cramps, so I couldn't do it anymore.
Yes.
I want to connect this to this image right here.
This, I believe, shows a closer up of the neurons actually spread across a similar electric interface.
Yeah, yeah. So if we remember that image of the multi-electrode array with that cylinder thing, in the center of the cylinder there is actually a CCD or a CMOS sensor.
This is actually the same sensor that you have in your cameras.
And each of those square grids is actually a sensor that captures the electrical activity.
It's also somewhat photo receptive. So that's why we have to keep them in a double.
We all understand the importance of a crisp, memorable, easy-to-spell domain name.
One of those names you can say over the phone and people know how to type it in without asking
you the spelling. But let's get real. The good ones are either taken or there's some poacher
who's holding it and waiting for some huge payday and they don't reply to you. Even if you want
to pay for a premium domain, you don't want to use up all your runway on a domain name. That's just
the truth for a startup. You want to put that valuable cash back into your
startup's operations. So you should consider this: a .tech domain. You can get a clean, crisp,
super memorable name for your website and company and signal out loud to your customers and investors.
We're a tech company that's instant branding for you. That's why over 500,000 founders have
collectively raised over $5 billion in investment, building their companies on dot-tech. So skip the hassle,
head to www.get.tech/twist.
Or go to your favorite registrar and grab your dot tech domain today.
And so you'll see things like, you know, the large long piece that's going across,
that's an axon, and all the stringy bits are the dendrites.
So if you think about that, that's just one segment, and that's 50 microns.
That's actually the width of a human hair, 50 microns.
And so, yeah, that's how small it is.
and the level of connectivity,
the level of self-organization as well
is actually pretty phenomenal.
It's so, this is, this image just,
I've actually stared at this for probably 20 minutes
just straight because it's such a,
it's literally to me like 1980s science fiction,
but just actually real and now turning into a commercial product.
Like this is so cool.
And I just want to show everyone,
the culmination of all this work that you guys have done
at Cortical Labs is this bad boy.
This is the CL1.
If you're watching, sorry, if you're listening to the audio version, imagine a very long toaster with a clear plastic top and lots of tubes.
And I believe this bad boy can keep neurons alive for six full months, Hon, going back to your point about the plumbing for life support, yeah?
We'd say six months because it's just easier to explain it that way.
But actually, you know, we as a lab and other people have kept them alive for, you know, months, even years kind of thing.
When you're talking about trying to do large-scale experiments, you need a lot of samples.
You need a lot of units to get statistical significance and power through a large n.
We wanted that scale.
We wanted that longitudinal ability to study the neurons, but we also didn't want to have to spend a lot of effort doing it.
So a bit of engineering and a bit of plumbing went into building this system that encapsulates the neurons in a sealed environment, which is really important because these neurons don't have an immune system.
And so, you know, if you had COVID and you coughed into it, they would actually just also get COVID.
So your computer could literally catch the cold.
That's really funny.
Yes, pretty much.
You know, don't eat bread next to it because you'll get the yeast and mold that goes into it and they'll kind of kill the cells.
But we also built in our own neural interfacing system.
You don't really see it, but the base of the CL1 has a compute unit.
So we have an FPGA and, what do we call it, general-purpose compute units in there as well.
That's where we digitize the signal.
We then put it through our very tight processing loops.
And we're developing APIs and SDKs for this.
so that, you know, as a programmer,
you don't have to go deep into getting these things
to stimulate the neurons with high precision
by writing, you know, a lot of code.
You can just use our API,
and we've developed a DSL, a domain-specific language,
to express how you want to stimulate these neurons
at individual channels in low-latency loops.
That brings me to kind of,
I think the question on everyone's mind
who's listening to us right now, which is, okay, this is awesome. You've made a digital
biologic brain. What's it good for? And I think that this is actually where my knowledge
really kind of runs short because I'm not sure what compute loads, what experiment types,
what commercial applications, the CL1 and its successors, are going to be best at. And so
when we think about the market that Cortical Labs is going after, how big is it?
I think the two most interesting things that we've learned along the way about the system
here is that they do two things really well. Well, actually three things, but that third thing is
attached to the second one. Firstly, they use significantly less energy than your traditional
compute. They're not a von Neumann-based architecture. The energy is derived from the glucose
in the actual cell culture media. So doing a back-of-the-envelope calculation, we came across
some really phenomenal numbers. So for our original Pong experiment, we grew about 800,000 to a million
neurons. And we did a very rough estimate where we just said, what's the glucose content in the
media? How often are you changing the media? So therefore, if you changed the media,
you know, X number of times a week, and there's this much of glucose and they're combusting
all of it, assuming, because they never combust 100%, you know, how much energy were they using?
And it turned out it was actually 10 to the minus 4 watts. So that's 0.0001 watts.
It's probably even less than that because that's a massive overestimation.
But still, even with massive overestimation, it's effectively zero.
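The back-of-the-envelope estimate described above can be sketched in a few lines. The glucose concentration, media volume, and number of media changes per week below are hypothetical placeholders, not figures from Cortical Labs. The only physical constant is the approximate heat of combustion of glucose (about 2,800 kJ/mol), and the sketch assumes all of the glucose is burned, which overestimates the real figure in the same way the spoken estimate does.

```python
GLUCOSE_J_PER_MOL = 2.8e6  # approximate heat of combustion of glucose (~2,800 kJ/mol)
SECONDS_PER_WEEK = 7 * 24 * 3600


def average_power_watts(glucose_mol_per_l, media_volume_l, changes_per_week):
    """Upper-bound average power, assuming every media change is fully combusted."""
    moles_per_week = glucose_mol_per_l * media_volume_l * changes_per_week
    joules_per_week = moles_per_week * GLUCOSE_J_PER_MOL
    return joules_per_week / SECONDS_PER_WEEK


# Hypothetical inputs: 25 mM glucose, 1 mL of media, changed 3 times a week.
power = average_power_watts(
    glucose_mol_per_l=0.025, media_volume_l=0.001, changes_per_week=3
)
print(f"{power:.1e} W")
```

With these placeholder inputs the result lands on the order of 10 to the minus 4 watts, the same order of magnitude quoted in the conversation; different assumed volumes or feeding schedules would shift it.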
And glucose, last time I checked is sugar water.
So it's nice and cheap.
Exactly.
I mean, if you think about it, you and I, as a walking reference, use only 20 watts in our brains.
The guys at Extropic were telling me this.
They're like, you know, we have this amazing probabilistic computer that runs on 20 watts
and it's literally inside your head.
And they're working on thermodynamic computing, which is different.
But I've been thinking about that ever since they brought it up.
Okay, so clearly this is not going to be used right now to train the next major LLM,
but the people who are purchasing these CL1s that will come out sometime next year,
what do they want to do with it?
I think the thing you understand here as well is that LLMs could not exist 20 years ago, right?
Even if we had the compute available, we could not make it work because LLMs required large datasets.
I mean, they're called large for a reason, right?
We've got into this point of, what is it, 40 years, what are we not, 20, 25, let's say 40 years of the public internet being this repository of free language training data.
There are lots of other domains that are not language-based that do not have large data sets that are publicly available.
And this is the second point that we've discovered along the way, is that if you do a head-to-head comparison, and there's a paper
coming out. Actually, this was in NeurIPS about two years ago, where we put it in the RL Workshop.
It's one of the posters.
We took reinforcement learning agents.
So we did an experiment where we took three of them, DQN, which is your classic AlphaGo.
It's kind of, you know, I guess the gold standard, but it's kind of old now.
There are better algorithms out there that are so-called sample efficient.
And we put those against the biological system and gave them the same game.
But we said, here's the one thing.
We're going to constrain the reinforcement learning
agents so that they get the same amount of data that the biological system gets, right?
Because if you think about reinforcement learning systems, nobody really talks about this.
But the way they actually learn in a simulation is that they spawn millions of parallel
processes and they speed up the time of the game by 200, 300 fold.
And so if you read the AlphaStar paper or something like that, DeepMind kind of said that,
you know, if you were to train a human to play with the amount of data that AlphaStar was playing,
it would take 400 years of continuous gameplay to get to their level of performance.
And so what we did was we said, we're going to sample-constrain it because, if you think about
the real world, you can't speed up time. Time ticks at the same rate that you and I have.
And if data is a factor of time, right, say bit rate and so forth, then if you,
And if you wanted to get, you know, say 10 minutes of data, you would have to spend 10 minutes
collecting that data.
Knowing that, you know, these systems use far less data when we did the comparison with
the reinforcement agent and, you know, data being a threat of time, we realize these things are
actually potentially very good for problems that don't have a data set, problems that are real
time specific.
And that is actually the vast majority of the domains that have not actually been touched by language.
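As a rough illustration of the "data is a factor of time" point, here is a minimal Python sketch. The function name and the example numbers (a 300x speed-up, 1,000 parallel copies) are assumptions chosen for illustration; they are not figures from the workshop paper or from AlphaStar.

```python
def experience_seconds(wall_clock_seconds: float,
                       speedup: float = 1.0,
                       parallel_envs: int = 1) -> float:
    """Seconds of experience collected in a given amount of wall-clock time."""
    return wall_clock_seconds * speedup * parallel_envs


ten_minutes = 10 * 60

# Real world (or a biological system): time cannot be sped up or parallelized.
real_world = experience_seconds(ten_minutes)

# Simulation: hypothetical 300x game speed across 1,000 parallel processes.
simulated = experience_seconds(ten_minutes, speedup=300, parallel_envs=1000)

# How much more data the simulated agent sees in the same 10 minutes.
ratio = simulated / real_world
print(real_world, simulated, ratio)
```

Constraining the agents to the same sample budget amounts to forcing `speedup=1` and `parallel_envs=1` for every learner in the comparison.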
This is when I get really excited.
And I don't want to make you project the future too much because that's not really fair.
But to me, with neurons that we can interact with through software, it seems like we've managed to take kind of the best of both worlds, the programmability of computers, as we understand them, and also the probabilistic low power consumption and kind of generalized intelligence of brains and brought them together, Hon.
So to me, in time, it feels like we should be able to use both, you know, sure, our mass GPU clusters that everyone's building, but also we should have probably large tanks of biologic, you know, neurons that are also helping us do a lot of, a lot of work.
Is that too kind of science fictiony, or am I on the right path?
We're all familiar with AWS, Amazon Web Services.
That's the cloud platform that powers so many of your favorite brands.
But do you know about AWS Activate? That's their program for startups, where they
provide up to $100,000 in AWS credits for all startups. Whether you're backed by an investor or
you're bootstrapping. Money is time to keep innovating. Time to delight customers and time for you
to keep gaining traction. You need runway and AWS Activate is going to help you with that runway.
We hear this story from so many of our Founder University companies. They're finding product market
fit, the word's getting out about their product, just a little boost. Finding some savings here or there
to get a little extra time can be the difference between getting traction and bringing in revenue
or, hey, let's call it what it is, running out of money, okay, and shutting down. AWS knows this. That's why
they've created the ultimate toolkit for early stage startups looking to boost growth. With
AWS Activate, you're going to get up to $100,000 in AWS credits, hands-on support, and
training plus exclusive discounts with some of our favorite companies and tools.
So start getting the support you need at every stage of your startup journey.
To learn more, visit aws.amazon.com/startups/credits.
That's right: aws.amazon.com/startups/credits.
I think you're on the right path.
I mean, the thing about it is that we're good at some things that machines aren't.
And machines are good at things that we aren't, right?
So this is called Moravec's paradox, particularly in robotics.
And I think this is the motivating factor for why Elon set up things like Neuralink, and the BCI people,
which is that as AI gets smarter and better, we also need to skill up, right?
So if you think about it, the machines are getting better at what we reserve as traditionally human or biologically centered tasks.
It's not beyond the realms of possibility that we could also go the other way, assuming that BCIs really become a thing,
where not only do we utilize our brains for the low-power, stochastic computation,
the probabilistic computation,
but we can also just, you know, have, I don't know, an iPhone chip in there that can just
give us the square root of a number at the speed of thought.
Right.
So I can essentially allow compute to handle the math that I don't want to do myself, but all
the learning I can use my brain for.
So once again, you guys are kind of Neuralink in reverse.
So looking ahead just a couple of years, not super long term, the CL1 gets into the
market next year, to customers.
You guys are also going to have an API to let people access the technology remotely if they
don't have the lab setup necessary to run this thing.
What's the next generation?
What are you guys going to build after the CL1?
What is the CL2?
Well, I think the CL2 is, we've been thinking about it,
and there were a lot of features that didn't make it into the CL1
that we'll probably start thinking about putting into the CL2.
I mean, the things, your standard things, right,
make it like smaller, make it, you know, probably cheaper to obtain,
easier to program, you know, maybe more modular units.
There are quite a few things that we've been thinking about.
The other question is, does it make it into the CL2 or does it get pushed out to the CL3?
That's a discussion for the team.
But again, I think it's still a little bit premature.
What we really want to do is get the CL1 out to our partners, our research collaborators,
and really get their feedback.
What works?
So that way we can sort of triage and prioritize what we want to do for the CL2.
No point in putting in features that no one else wants.
I mean, we might want it, but we have the ability to just spin up whatever, CL3, 4, whatever
internally. But, you know, if there are features that are missing that we have on our list
that are kind of lower down our sort of priorities, but would, you know, tremendously help the community
get going faster or do more work for less, I think that's something that we want to be prioritizing.
You talked about using mice neurons at the top of the show.
And anyone who knows anything about biology and experiments knows that a lot of mice and rats die.
So that is kind of par for the course for how we currently handle ethics.
Are mice neurons and human neurons radically different?
Are they relatively fungible?
Silly question, but I'm not sure about the answer.
Before I get to an ethics question, I thought we'd start there.
So this is the really interesting thing.
We used to think that there was no difference between a human and a mouse neuron.
If you read any of the papers before, I think, 2020,
everyone said, yeah, there's no difference.
All mammalian neurons are the same.
And it turns out that I think on average,
human dendrites are actually longer than mouse dendrites.
But they have the same number of ion channels.
So these are the gates that allow the potassium and sodium
to go back and forth between the cell and the external environment.
and that's what generates the action potential.
But with them spaced out more,
you actually have the ability to hold more so-called electrical states.
Well, that's the theory that's been pushed by some of the researchers at Harvard, MIT.
And so, yeah, it turns out they are different,
and that probably contributes to the differential in the overall performance,
and we've seen that as well.
The human neurons are better at playing and processing information.
But for now, for now, the mice neurons are sufficient.
for the state of technology
and the progress you guys want to make.
So we're not... there's eventually going to be a news story entitled
"Startup takes human neurons, puts them into computer:
are we creating a new form of consciousness?"
Oh, oh, panic.
But it sounds like...
Well, we actually do use human neurons now.
Alongside mouse neurons,
human neurons is the other really big area of research
because that's where a lot of the biomedical stuff happens
where we're looking for new drugs,
you know, understanding disease models and so forth.
We stand on the shoulders of giants.
We use a lot of the techniques that have been developed by, what do you call it, the synthetic
biology, the biological engineering field to grow stem cells obtained from adult cells.
So if we went back about 10 years, in Japan there was a researcher by the name of Yamanaka.
Professor Yamanaka won the Nobel Prize, I think, a few years back for the discovery of induced
pluripotent stem cells.
And what he discovered was that if you took anybody's cells, your cells, my cells, you know,
skin or blood, anything with a nucleus, and you exposed them to four compounds,
you can actually reverse the clock of these cells, where they go from a blood cell
or a skin cell back into a naive stem cell that can then be turned into anything.
We utilize that functionality to, yeah, essentially grow human neurons from stem cells.
So this way we don't kill any animals.
It's continuously renewable as long as you provide the right conditions.
And yeah, that is how we do it now at Cortical Labs,
because we don't really particularly like killing the mice either.
And so we think that this is actually an easy approach.
It's absolutely brutal.
No, I'm totally in favor of that.
I think there are going to come, in time, some questions
that appear ethically interesting with what you guys are building,
but I'm also of the opinion that they're not actually going to be ethically serious.
I think they're going to be mostly good for, not to make fun of the media, my industry,
but great for a headline, and less dicey in practice.
So I'm very bullish on the company.
Just before I let you go, how has demand been for the CL1?
I know you guys have a form up.
I know you guys are working on deliveries.
Has demand exceeded expectations?
Is it about what you thought?
Just where's the business side of things going?
Yeah, demand has actually far exceeded what we had expected.
Suffice to say, we are now struggling to try to figure out our logistics and supply chain to balance supply and demand.
We've had a lot of sign-ups, particularly for the Cortical Cloud, I think over 3,000 sign-ups.
We have no ability to service more than 20 or 30 people on the cloud system.
So you have 100x more demand than you can serve.
Yeah, and so they're on the waiting list and so forth.
And then, you know, for the hardware, we have had a lot of labs and research groups reach out to us.
We are, you know, very excited about this, but now we're looking at the challenge of, oh, God, now we actually have to ship this thing.
And we're going to have to ship lots of them.
And, you know, if they break in the field, we're kind of screwed.
So, you know, we better make sure they don't break out there.
So there's a lot of like, we're doing our best at the moment to finalize all of the software, finalize the hardware, you know, getting our partners ready.
It's also very, very expensive, capex-wise, right?
Because we have to, we don't charge our customers until we ship them, but we still have to build the units first.
And so, you know, trying to figure out where the next source of funding is going to come from is going to be a bit of a challenge.
Hon, want to just charge them half up front and resolve some of your cash flow issues?
Well, we could, but you know, it's...
Do us.
Yeah.
We could do that.
But, you know, I think for us, we want to make sure that, you know, we...
I don't want to do a Kickstarter kind of thing, right?
And at the start of a product like this.
I don't think that's Kickstarter.
But I respect where you're at.
I guess then the correct question to close with is this.
Sometimes people say that venture capital is a little bit conservative.
I think the VCs that bet on Cortical were clearly being relatively adventurous, which is good.
So as you guys do need more capital, is the venture capital ecosystem in Australia, Europe, and the West at large,
is it interested in putting more money into the business?
Or are you guys still a little bit outside of what you might call the venture norms?
Yeah, I think we're outside of venture norms.
I don't know what it is, but the venture industry likes to gravitate to thematics.
So in this case, it's been, what is it?
We went from transformer LLMs to agents and whatever the next year brings.
It's very hard, I think, unless you fall into specific thematics, to attract that.
So, you know, I think we have some of the best investors in the world backing us
because they're just the ones who are willing to take bets,
which is essentially what the industry initially started out with, right?
Blackbird Ventures and Horizons Ventures.
Correct, yeah.
And, you know, we've had a new round that was just put together
to get us a little bit more fuel to deliver our products, by people like 3C, AGI,
they've just come into the game.
And then, you know, we've had players like In-Q-Tel as well who have backed us.
It's really about, I think, trying to figure out who are the mavericks in the space,
who are the ones who have the imagination, right, and the sci-fi background.
Because at the end of the day, if you have 200, like maybe, not 200, maybe like, say,
even 10 foundational model companies, but they're all pretty much... I couldn't really tell you
the difference between Claude versus Gemini versus ChatGPT
versus, I don't know, whatever else, like Qwen or DeepSeek.
They are mostly the same.
And so if they're all mostly the same and I can jump between one another,
I don't really see any, you know, significant stickiness or moat to it.
So anyway, that's a problem of the VCs to resolve.
But, you know, rather than all jumping in into one space,
I think we should try to spread it out.
So for us, you know, we're very focused on delivering
the product and, you know, we welcome anyone who's interested who's listening to your podcast,
who understands the fundamental nature of building hardware first before you can get the software,
you know, to look into the space because the thing, as we already talked about, the AI space
already had that groundwork put in by the gamers, right? We didn't need to invest in the hardware
because the gamers just bought it already. So I think that's something that we have to bear in mind
with any new compute space. There is always going to be a
hardware component, and we shouldn't shy away from having to invest and build into that space.
Well, I do look forward to the eventual future when we have Canva in Sydney and we have
Cortical in Melbourne, and we'll have a good old-fashioned competition about who's going to be
the next future leader of Australian technology. Hon, thank you so much for coming on.
Thank you for answering all of my questions. And when you do have so much extra production capacity,
I'll give you my address to ship my CL1 and you just tell me where to send the check, okay?
All right. Thanks, Alex.
Thanks, Hon.
When Meta invested in Scale AI, absorbing its CEO and co-founder in the process,
startups that competed with Scale saw an absolutely huge opportunity.
Now, mostly people focused on the data labeling side of what Scale had done historically,
as the place that startups might actually capture the most market share.
But Scale also offered AI evaluation tools to folks who build LLMs.
So startups that offer AI evaluation tools may also be in line to benefit from Scale aligning with Meta,
a company that competes with other firms to build the next great AI model.
Why work with Scale if Meta owns about half of it?
One startup CEO, Turing's Jonathan Siddharth, told Reuters after Scale's partial exit to the social giant
that leading AI labs now realize that neutrality is no longer optional amongst service providers.
It's, quote, essential.
So to help us understand the LLM evaluation market, just how big it is and how data comes into play in 2025 to build those next models,
please welcome to the show.
It's Turing's CEO, Jonathan Siddharth.
Jonathan, hey, welcome to the show.
Thank you, Alex, for having me.
It's great to be here.
I'm very impressed.
Also, we're both in our home offices today,
and I think this just goes to show that remote work,
not entirely dead,
even though everyone seems to claim that it is.
I'm glad that you're here.
So, starting for folks who are less aware of what LLM evaluation is,
Jonathan, can you just give us the working definition
from your side of the fence?
Yeah.
So it's really important to benchmark LLMs,
to figure out what they are good at and where the gaps in the models are,
so that you can generate data that could help the LLMs get better at those specific tasks.
One big challenge, Alex, that I see today in the overall evaluation and benchmarking market
is that a lot of the evaluations are somewhat academic and somewhat synthetic
and don't connect to real-world applications or real-world use.
Ideally, you want AGI to progress in a way where the models get better at useful tasks.
So when we evaluate models, the three dimensions to look at are complexity.
Like, are you evaluating them on really hard tasks?
Real world use.
Are you evaluating them on something that a human would actually care about?
And third is diversity.
You want like a wide breadth of test cases that you're evaluating the models on.
It's such an exciting space.
And I think of evaluation as step zero of data generation.
That's why we do both evaluation and data generation.
Okay, so let's talk about the benchmarks because there's been some commentary.
I think Apple had a paper that came out and they said that, you know, one of the problems we have
with a lot of the benchmarks that everyone likes to shout out their new LLM and put them up against
and have the charts is that there's data contamination issues, there's overfitting.
So based on what you just said about trying to solve real-world problems and also the fact that
we know that these benchmarks are getting a little bit, I don't know, dicey to use,
is the standard way that companies announce how their LLMs perform an effective form of evaluation,
or is that mostly window dressing to get more social media hits because you're one point higher
on one test or another?
So I'd say the labs care about the benchmarks because it's good for bragging rights.
It's good for recruiting.
Like who has the best model for coding?
Who has the best model for STEM, et cetera?
So benchmarks serve some useful purpose.
And you can argue that in today's $100 million sort of talent wars,
or maybe it's closer to $300 or $400 million talent wars,
those bragging rights help because the best researchers ideally want to join,
like, the winning team, like who's already kind of close to being number one.
But the labs care about two things.
Public benchmarks and private evals.
And I would argue that private evals are actually more important
because there you're evaluating how well is your model doing
relative to the competition on the prompt distribution
that you care about, meaning if you're Google
or if you're Meta or if you're OpenAI or Anthropic,
the type of queries you might get inside in a coding context
might look different from what you might get on the phone
in like a general chat assistant context
or what a human might put when you're searching through like a desktop.
So you want to optimize for your own query distribution that your users are testing your model on.
So that's why these private evals are helpful.
So you need public benchmarks and private evals.
The challenge with public benchmarks, Alex, is if you look at coding, for example, there is this good benchmark called SWE-bench, which was created by this lab at Princeton.
Last year, we went from 2% to 50% on SWE-bench.
This year, the best models are already north of 60%.
We'll probably saturate SWE-bench this year, and then where do you go?
Right?
Like, we haven't, clearly we haven't automated all of software engineering yet.
The benchmarks are getting saturated.
Yeah, can you double-click on what you mean by saturated in that context?
I think it's an important point.
Yes.
So if a model does 90% plus on a benchmark, you've kind of aced the test to some degree.
And researchers call that saturating a test, because now the test no longer tells you
whether your models are improving or not, right?
So the obvious answer is you have to create a harder benchmark where the models would have even more headroom to climb, right?
Even more room to improve, which is why at Turing, we're actually creating a benchmark for coding that's even harder than SWE-bench.
And it gives us deep satisfaction because we managed to have all the models start at zero.
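One way to picture saturation and headroom is the short Python sketch below. The 90% threshold is the figure mentioned in the conversation; the function names and the sample scores are hypothetical and not taken from any real leaderboard.

```python
def headroom(score: float) -> float:
    """Fraction of the benchmark a model has not solved yet."""
    return 1.0 - score


def is_saturated(score: float, threshold: float = 0.90) -> bool:
    """A benchmark stops being informative once top scores pass the threshold."""
    return score >= threshold


# Hypothetical best-model scores on two benchmarks.
best_scores = {
    "older coding benchmark": 0.92,   # nearly aced, little left to measure
    "harder coding benchmark": 0.00,  # every model starts at zero
}

for name, score in best_scores.items():
    print(f"{name}: saturated={is_saturated(score)}, headroom={headroom(score):.0%}")
```

A benchmark where every model starts at zero has the full 100% of headroom, so any improvement in the models shows up as movement in the score.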
Oh, okay. So everyone's failing 100% of this new test. Ah, okay, let's talk about private evals because this is a core thing of what
Turing offers to its customers. So I'm curious, let's say that I'm OpenAI. I have a new model.
I want you guys to take a look at it. You find a couple of places where it's not quite up to snuff,
not quite where we expected it to be. So then do you turn around and say, hey, guys, here are the
places where there are gaps and then help them solve those issues? Or do you guys just point out,
here's the spot where you know, you might want to do some more work?
So we do both. We evaluate the model and we generate data to help the models improve.
But let me take a step back, Alex, and let's look at this landscape, which is super interesting
right now.
So what's happening is, as these models have gotten smarter and smarter,
the data that's needed to advance the models has become increasingly harder to generate,
right?
The models are advancing in depth.
They're improving in coding, STEM, reasoning, et cetera.
They're advancing in breadth in multimodality, multilinguality, multi-industry, et cetera.
And the models are becoming agentic,
meaning the models can now execute complex,
multi-step workflows in a real-world business context.
Now, with the models advancing like this,
what's needed in a platform is you don't just need Iron Man,
you need the Avengers.
Frontier models need frontier data.
Frontier data needs a team of frontier humans, right?
So we have like PhDs from physics, chemistry, math, biology,
expert Olympiad level coders,
people who are literally at the pinnacle of their fields,
and sometimes you have to have them working together to break the model.
For example, in physics,
if an expert is evaluating the model in physics to test some theory in physics,
a PhD in physics might ask the question to test a theory.
A software engineer might need to build a simulation to test that theory,
and then a data scientist might have to analyze the results,
of that simulation.
So you need to daisy chain really smart humans together to generate data.
Actually, Jonathan, can I talk to you about that?
Because what you're describing to me sounds like very interesting tests,
having people with different domain expertise,
work together to find places where the model might have a weakness
or might not be able to answer something.
And then you say that they're generating data.
Is the data they're generating simply where the LLM in question misses
or doesn't meet expectations?
Because I thought this was more like,
here's new information for the LLM to be trained on
versus looking at a model that's already been trained.
And then does that make sense?
I feel like I may have had this backwards.
So that's a great question, Alex.
So the way these models are trained
is there's a step called pre-training
where you basically feed the model gobs and gobs of data.
The war in pre-training is mostly over.
Like almost all the models are trained
on the same subset of the internet.
Now the battlefield is post-training, and there's a new field called mid-training,
which I can get into later.
But with post-training, what you do is you have these human experts create question-answer pairs
where a software engineer might ask a question like, hey, how do you create an app that connects
dog walkers to dogs?
And can you write it in both Swift and Kotlin for Android, right?
And now the system has to generate the app.
So that's an example of a supervised fine-tuning data set where you gave it a prompt and a completion.
And the model learns how to autocomplete, how to respond to that prompt, right?
So you need experts in different fields.
Now, the reason this has gotten harder is two years ago, a very low-skilled contractor was capable of creating tokens that could have advanced the model.
Now, because the floor has gone up,
you need PhDs from Stanford, Berkeley, MIT to figure out where the model falls short.
You first have to break the model. You have to ask a question that literally stumps the model.
And then the humans create good pairs of questions and answers that you then feed
into the model for fine-tuning and then the model learns.
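The prompt-and-completion pairs described here can be sketched as a small data structure. This is a minimal Python illustration, assuming a JSON Lines layout; the field names (`prompt`, `completion`, `domain`) and the sample text are hypothetical and do not describe Turing's actual data format.

```python
import json
from dataclasses import dataclass, asdict
from typing import List


@dataclass
class SFTExample:
    """One supervised fine-tuning record: a prompt and the expert-written answer."""
    prompt: str
    completion: str
    domain: str  # hypothetical tag for the expert's field


def to_jsonl(examples: List[SFTExample]) -> str:
    """Serialize examples one JSON object per line, a common training-data layout."""
    return "\n".join(json.dumps(asdict(example)) for example in examples)


examples = [
    SFTExample(
        prompt="How do you create an app that connects dog walkers to dogs?",
        completion="<expert-written app code and explanation goes here>",
        domain="software engineering",
    ),
]

print(to_jsonl(examples))
```

The completion is left as a placeholder on purpose: the hard part described in the conversation is having an expert write an answer to a question that stumps the model.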
Ah, it's the question and answer pairs that generates the data for the model. Okay, now that makes
good sense to me. So it sounds, though, like what you've done is back to your Avengers Iron Man
point, just got together like a super set of nerds. And basically you're applying the smartest humans
to find the flaws in the most advanced LLMs.
What happens, Jonathan, when we don't have PhDs that can ask questions that the model can't answer?
Because it seems to me that we've raised the bar, but we're raising it towards a ceiling.
Yeah, yeah.
I mean, I've oversimplified it a little bit.
So I would say, like, so one example I gave is you give question-answer pairs.
There's another type of data that you generate for reinforcement learning where
the humans are generating questions and verifiers.
They're not generating the solution.
They're generating a way to verify
whether the solution is correct or not,
which you can do with certain fields
like coding, math, hard sciences.
You can do that, right?
Write a program to sort numbers in Python.
You can write test cases to verify
whether the program that you wrote is correct or not.
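The sorting example can be sketched as a verifier in Python. This is a minimal illustration under stated assumptions: the function name `verify_sort`, the binary 1.0/0.0 reward, and the choice of random test cases are hypothetical, not a description of any lab's actual setup.

```python
import random
from typing import Callable, List


def verify_sort(candidate: Callable[[List[int]], List[int]],
                trials: int = 100, seed: int = 0) -> float:
    """Return reward 1.0 if the candidate sorts every test case, else 0.0.

    The verifier never shows the candidate how to sort; it only checks results.
    """
    rng = random.Random(seed)
    cases = [[], [1], [2, 1], [3, 3, 1]]
    cases += [[rng.randint(-50, 50) for _ in range(rng.randint(0, 20))]
              for _ in range(trials)]
    for case in cases:
        try:
            result = candidate(list(case))  # pass a copy so the case stays intact
        except Exception:
            return 0.0  # a crash counts as a failed attempt
        if result != sorted(case):
            return 0.0
    return 1.0


def good_sort(xs: List[int]) -> List[int]:
    return sorted(xs)


def bad_sort(xs: List[int]) -> List[int]:
    return xs  # returns the input unchanged


print(verify_sort(good_sort), verify_sort(bad_sort))
```

Only the final answer is checked, which is why this style of data works for fields like coding and math where correctness can be tested automatically.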
But you're not telling them every single step along the way.
You're just saying,
I can verify that what you've done,
the work you've done does generate an acceptable answer.
Correct, correct.
And the version of this for enterprise is you create an RL gym, a reinforcement learning gym.
I think this is one of the coolest things in computer science.
Like, just like a gym where humans go to train, this is like a gym where an agent trains.
And in these gyms, you create basically clones of different websites, like a DoorDash or an Uber
or a NetSuite or a Salesforce.
you create this virtual environment with prompts or workflows and ways to verify whether the task is
completed.
Now, the task would be, hey, salesperson, why don't you research everything you need to about this
prospect and then update Salesforce in this way?
And as long as when you create this virtual environment, you create these prompts and you create
these verifiers.
Now an agent tries out different combinations.
The agent is basically trying to use the tools in an appropriate way to actually complete that task.
And it's a cool way to learn whether the agent is learning through trial and error how to execute a complex, multi-step workflow.
You're not teaching the agent.
Go to LinkedIn first, go to ZoomInfo first, and then check out Salesforce for prior conversations.
You're not teaching it that, but it's learning.
It's learning just based on feedback on, okay, if I accomplished this, I get a reward, and then it learns how to operate in a way that maximizes future rewards.
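A toy version of such a gym can be sketched in Python. Everything here is an assumption made for illustration: the tool names, the verifier rule (the CRM update only counts once all research tools have been used), and the use of tabular Q-learning, which is swapped in for brevity since the conversation does not say which algorithm is used.

```python
import random
from collections import defaultdict

RESEARCH_TOOLS = ("linkedin", "zoominfo", "crm_history")  # hypothetical tool names
ACTIONS = RESEARCH_TOOLS + ("update_crm",)
MAX_STEPS = 6


def step(state, action):
    """Apply one tool call. Returns (next_state, reward, done)."""
    if action == "update_crm":
        # Verifier: the task is complete only if every research tool was used first.
        reward = 1.0 if state == frozenset(RESEARCH_TOOLS) else 0.0
        return state, reward, True
    return state | {action}, 0.0, False


def train(episodes=5000, gamma=0.9, seed=0):
    """Tabular Q-learning with purely random exploration (trial and error)."""
    rng = random.Random(seed)
    q = defaultdict(float)  # (state, action) -> estimated future reward
    for _ in range(episodes):
        state = frozenset()
        for _ in range(MAX_STEPS):
            action = rng.choice(ACTIONS)
            next_state, reward, done = step(state, action)
            best_next = 0.0 if done else max(q[(next_state, a)] for a in ACTIONS)
            # The environment is deterministic, so a learning rate of 1 suffices.
            q[(state, action)] = reward + gamma * best_next
            state = next_state
            if done:
                break
    return q


def greedy_rollout(q):
    """Run the learned policy without exploration and report what it did."""
    state, trace = frozenset(), []
    for _ in range(MAX_STEPS):
        action = max(ACTIONS, key=lambda a: q[(state, a)])
        trace.append(action)
        state, reward, done = step(state, action)
        if done:
            return trace, reward
    return trace, 0.0


q_table = train()
trace, reward = greedy_rollout(q_table)
print(trace, reward)
```

The agent is never told the order of steps. It only sees the verifier's reward at the end, yet the learned policy ends up doing the research before updating the CRM.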
Really cool. To your question of what happens if we run out of human intelligence, right?
Fortunately, I think we are still a significant distance away from that. We've run out of
internet data, but there's lots of intelligence trapped in the minds of humans that is yet
to be transferred from human minds to machine minds. I would say, Alex, even the way, like, I love
your show, I love what you and Jason have done with this show. But when you're thinking about how to
interview somebody who's on the show, what type of questions to ask, what type of prep to do,
I guarantee that knowledge is not distilled into the models from OpenAI or Anthropic or
Meta or Google. The only way that's going to happen is if we hire somebody like an Alex or
a Jason to work on Turing to use our tools. And we didn't get into this, but we have tools
that make it easy to keep the quality of the data high. And we have AIs that will assist you
in generating this data so that we can make this scale.
Pun not intended.
No, I was going to let that one slide.
We don't allow more than two dad jokes per episode.
So we have to save them if that makes sense.
So here's my question.
I understand that Scale being subsumed by Meta has been great for the Turing business.
You told me before the show you guys are at nine figures of revenue and profitable and growing.
And honestly, hell yeah.
I love that.
But if I'm OpenAI, why wouldn't I try to replicate
what Turing has built inside of my domain?
Because if there's one thing that large AI model companies have today,
it's a lot of access to capital.
And so as we've seen from Meta trying to buy literally every human who's touched an AI before,
there's a lot of money to play with here.
So are you guys just so good that OpenAI doesn't want to try to replicate what you've
built?
Or is there something else that I'm missing in that it's good to have a third party versus
an internal group be kind of telling you where you might want to work on things?
Yeah, I mean, that's a great question, Alex.
Firstly, to advance towards ASI, you need three things to move in parallel.
Actually, you need four things.
You need research and algorithms.
You need compute.
You need data.
And you need the application layer.
Those are the four things, right?
Research, compute, data, and applications.
Now, the labs are really good at research, right?
And kudos to OpenAI, Anthropic, Meta,
Google, Apple, all of these companies for advancing the frontier forward. So they're spending their
energy there. We could flip this question to also ask, why don't they build their own compute?
And in some areas, the labs are investing in it. I mean, we've heard of Google having TPUs and
other companies trying to do that. But Nvidia clearly has the edge today. And there's some good
companies like Groq, Cerebras, etc. that are also building custom chips. So compute is also
advancing. And the third pillar is the data pillar. Right. Now,
generating this data is exceedingly complex.
Again, three dimensions.
It has to be complex.
It has to be realistic, meaning it has to mirror how a real human would interact with the model.
And third, it has to be diverse.
It's very hard for a lab to get all of this in-house.
I'll give you an example.
Like today at Turing, this month alone, we are hiring PhDs across the board in physics,
chemistry, math, biology, at the level of granularity of somebody who's an expert in dark matter,
somebody who's an expert in black holes, somebody who's an expert in molecular biology, right?
Like these very niche fields. And you ideally need these humans part time, because the way
these humans are good at the job of training the models is because they are also good at their day job,
which keeps their skills sharp. And once you've ingested that knowledge,
You might want to move to the next frontier and the next frontier and the next frontier.
So you need a platform or a partner that can scale up very quickly to elite talent,
manage the talent to make sure the talent is generating data part-time.
The data is high quality.
We have to build a ton of tools.
Like we have this platform called Allen, where AI is used upstream of the human to minimize the work for the human.
AI works alongside the human, and AI does quality control.
It may sound a little dystopian.
It's like, you know, AI overseeing humans to improve AI?
I'm not afraid of our AI future, personally.
So my p(doom), I guess, is very low.
But I want to go back to your black holes and dark matter point
because I'm a bit of a science fiction nerd,
and I'm also a bit of a space nerd.
So those are topics that are near and dear to my heart.
But if I spend a lot of time helping a particular LLM
better understand why we think about dark matter,
why we came up with the idea, you know,
galaxies and spinning and not having enough matter,
blah, blah, blah, blah, blah.
Will that work to make the LLM smarter in that particular domain have spillover effects to
other areas?
Does it make it more intelligent writ large or more intelligent only in that specific domain?
I'm not sure the answer here, so I figured I'd ask.
Yeah, phenomenal question.
What we know with relatively high confidence is that when the models get better at
coding and math, they seem to get better in a wide variety of other tasks that have nothing to do
with coding and math. Something about coding seems to teach the models how to think in a more
structured way, how to communicate with less ambiguity, how to think step by step. So coding and
math, there have been some experiments in how they have out of domain performance outside of just
pure code generation. And coding and math is also interesting because sometimes when you ask
complex questions, Alex, the sub-steps involve being able to compute stuff or calculate stuff
and pass the results forward. For example, Alex, if you had a question like, hey, what are some
interesting themes in investing in AI? The answer to that might involve a model that knows how to
write code to query PitchBook, write some Python code to analyze the data and say, you know,
AI in healthcare seems to be spiking. AI in retail seems to be going down. That requires the model
knowing how to write some Python code to analyze the data. It knows Matplotlib to plot the results
and show you the results. So coding and math have broader applicability. But it's also true
that this is one of the mysteries of these language models. The models seem to learn some
representations about the world that seem to carry over to other areas. I can totally imagine
you understanding human psychology. If the models were really good in human
psychology, it could help the model write better website code that designs the website to be more
persuasive, maybe like more conversion optimized, maybe makes it better for generating copy.
So it may be, who knows, maybe there is something to learn from studying the universe that
helps people in their day to day. It could be like some indirect thing, but we don't know for sure.
So Jonathan, here's to a next great couple of years as the world gets bigger and faster and smarter.
And just before I let you go, where can people find Turing on the internet?
And is there a particular job that you're having a hard time hiring for that you wanted to shout out to the world?
Great. Thank you, Alex.
If you're working on AGI research or in human data, come talk to us. We are hiring across the board.
And if you're building a frontier foundation model and you need data to make the model smarter at coding, reasoning, STEM, et cetera, come talk to us.
We're at Turing.com and you can email me at Jonathan at Turing.com.
Perfect.
All right, Jonathan, thank you so much.
We'll have you back on in another six months
when AI is twice as smart.
Until then, this is Twist.
Bye.
All right, so here on Twist,
we have talked ad nauseam about AI and the job market,
mostly about how AI might impact the job market:
change who does what job, change what jobs are done.
You get the idea.
People have a lot of concerns that AI might take all the jobs.
We'll have to see what happens.
Today, though, I want to talk to a company
that wants to use AI to help people not just get jobs,
but get the right jobs.
And also to have companies find the right people.
Because as it turns out,
and you know this if you've done any hiring in your career, finding the right people pretty much
is terrible, even with the modern tools we have today. So to help explain how this is going to work,
please welcome to the show Mercor's CEO and co-founder, Brendan Foody. Brendan, hey, how you doing?
I'm doing great. It's awesome to be here and I appreciate you having me, Alex.
It's my pleasure, man. I love doing these. Talking to people who are building what's actually
next for the economy literally never fails to give me a jolt like a good espresso. So thank you.
Now, one reason why I wanted to talk to your company is because you have
this amazingly huge vision. You guys wrote that you founded the company because, and I quote,
the labor market is the largest, most inefficient market in the world, which is a pretty big claim.
So before we dive into how Mercor is going to approach this and how you're going to get
into the market, tell me why you think that and what brought you to that as the problem
you wanted to solve.
Yeah, it really comes down to this matching problem where, as you were
describing earlier, the reason that it's painful is that when a candidate is applying to a job,
they can only apply to a couple dozen jobs.
And when a company is considering candidates in the market, they can only consider a fraction
of a percent of the people available that are looking for work because they need to solve
this matching problem manually.
They need to manually review resumes, manually conduct interviews, and manually decide who to hire.
But when you're able to solve this matching problem at the cost of software, it makes way
for this global unified labor market that every
candidate applies to and every company hires from.
And so that's the end vision of the company and the North Star that we work backwards from.
Okay.
So essentially, when I go out to look for jobs, I might look at my local area.
I might look at jobs at companies that I've heard of.
But no matter how far I cast my net from my perspective, I'm still missing out on most of the gigs.
And therefore, most of the companies aren't hearing from as broad a talent pool as they might.
Exactly.
Yeah.
Okay.
So in the world in which we're dealing with kind of the demise of remote
work, isn't the labor market necessarily constrained by geography?
Well, not precisely.
I don't think we're dealing with the demise of remote work.
I just think that in many ways, remote work might actually become even more prevalent.
And part of this is that if you automate 90% of what it means to do remote knowledge
work, that means that the bottleneck to productivity is the other 10%.
And so for all the domains like software engineering, where demand is extremely elastic,
we'll just produce, you know, 10 or 100 times more.
And so I think that humans and the role that we play in knowledge work, both remotely as well as in person,
is going to become amplified with these huge increases in productivity that are starting to happen.
So just to pin that down for folks, you think that in time remote work is not going to go away?
And therefore, there will be kind of a global talent pool.
And therefore, if you want to access the best people, you're not
only going to have to have a much larger pool, but you're also going to look more broadly at the
corporate level.
This is certainly the case. And most importantly, I think it'll be centralized,
right? Instead of all of these decentralized, fragmented job searches and people, companies that are
looking to hire, I think it'll be one place that everyone goes, that every company hires from,
that can facilitate all of these job matches that people love and companies are finding a lot of value.
Okay. And I think you want that one place, that central hub, to be Mercor, your company.
Yeah. And if you succeed, this means that recruiters, as we know them today, are going to have a hard time.
Probably generic, non-AI-enabled job boards like Indeed and similar are all going to get hit.
So that's the scale of what you guys are shooting for, essentially a revolution of how people find and secure jobs around the world.
Exactly. That's fantastic.
Well, so one way of thinking about it is there's sort of these two ends of the spectrum. On one end of the spectrum, there's companies like LinkedIn or Indeed; they're job boards. And they have this very broad distribution, but they only aggregate this very thin layer of the person's resume, right? And so they capture about a thousandth of the value chain associated with facilitating a hire. On the other end of the spectrum, there's the services companies, the recruiting agencies, the staffing agencies that do all the manual work and heavy lifting
to get their 30% of first year or whatever their fee structure is.
But no one has been able to combine the distribution of LinkedIn and Indeed with the value
capture and value add of a recruiting and staffing firm because previously it wasn't possible
to automate services, right?
But now that's all changing.
Now it's becoming possible to automate everything that recruiters and staffing agencies
would otherwise do in this unified way
that has the same kind of distribution scale as a consumer platform with hundreds of millions of people.
All right.
I wanted to start here because I want people to know where you're going to.
But I'm now going to go the other direction and talk about what you're doing now because you guys have this really interesting, quote,
public secret plan, if you will, of helping AI companies find the right talent that they need as essentially your wedge into the market.
And to help explain this to folks, you guys want to learn
how to place candidates very effectively in this one area where there's a short
feedback loop, if you will, so you can learn pretty quickly and then apply that more broadly
over time. But I think you've even been pigeonholed by some in the press as, like, helping people hire
for AI, when that does seem to be kind of a small fraction of the company's vision.
So what I want to know is how does it work today? I know you guys are working on one
subset of the market, but walk me through from the candidate and the company side,
how Mercor actually does do what you're describing.
So a candidate will come to us looking for opportunities for work.
They'll see a lot of the jobs available, and they can apply to one of those or to our talent pool more broadly.
They'll upload their resume and they'll take an interview with the AI on our platform to then get evaluated to see what they're a good fit for, what they're interested in, what kinds of matches we might be able to create for them. And we're facilitating huge volumes, tens of thousands of these matches, with no human process
and involvement. On the other side of the marketplace, there's companies that will give us these
requests of saying they need 100 people in a particular domain, maybe a subset of software
engineering or investment bankers or whatever the professional domain is. And we will go out to
the supply side of our marketplace, find all the people that are a good fit for them, and facilitate
all of those contract opportunities.
Tell me more about the AI interview. And
I'll just say that I'm an AI bull.
I think the stuff is really cool and is going to work well.
On the other hand, I do have some friends that are in less prestigious jobs who have had to deal with some AI onboarding, some AI screenings.
And they've been pretty negative about it.
So I'm curious what you guys have built and how well it works and kind of what candidates think of that part of the process.
Yeah.
So we built the first AI interviewer in March of 2023 when I was in my college dorm room,
and initially I remember it would hallucinate like nothing else because it wasn't even GPT-4 at that point, right?
Oh, God.
It had a 10-second latency.
And it's been extraordinary to just see this tailwind of model improvement, you know, lift all boats and make these applications possible.
And so a good heuristic for it is emulating all the processes that a human would otherwise do.
Similar to how a human would review resumes and conduct interviews, we automate all of the preparation
for the interview, conducting the interview, and evaluation of the interview with that similar
format and heuristic.
And while I think some candidates obviously prefer to talk to a human because they think about
it more as a selling process from the company rather than strictly a buying process,
the overwhelming sentiment is that there's millions of people that apply to jobs and just
get completely ghosted and don't even get the opportunity to interview,
right? It'll be like less than 1% of people even get to talk with someone or show more than their
resume for the most competitive jobs. And so globalizing that ability and not just like
internationally, but also all across the U.S., for people to, you know, talk with the candidates
and actually consider everyone for the opportunity has been really impactful.
Does the AI interviewer you guys have built, in its current iteration,
work with multiple languages?
Because I presume that if you're only doing English, you're still constraining yourself
to a small portion of the overall possible job takers.
Yeah, we started out predominantly with English.
Now we're starting to do others, working with a lot of our customers to try to make sure
the models improve their multilingual capabilities so that the interviewer is well set
up with that.
And I presume that India is a market of choice for you guys.
But are there any other hotspots around the world where you're seeing a lot of talent
saying, hey, we really want to be part of this new marketplace that you're building.
Yeah, well, so actually the majority, the vast majority of our hires come from the U.S.
now, over 60%.
India is the second largest geography that we hire from, but also seeing lots of Eastern Europe,
lots of South America.
One thing to note is that the average pay rate in our marketplace is over $90 an hour.
And so it's very different from most labor marketplaces in that way.
It's, you know, a totally different league from what you would find
with the crowdsourcing platforms like Scale and Surge that sort of built these legacy labor
marketplaces with more in the range of $10 to $30 an hour pay rates.
You guys are currently targeting a much more educated and rarefied employee pool, again,
as your starting point, as your wedge.
So I want to talk about how good your system is at finding and placing people because it
sounds good in theory, but in practice, of course, that's what matters.
So what's the right metric that you guys track in terms of finding the right person for the right job and then having both sides of that equation be happy?
Is it repeat business? Is it successful completion of a contract? What's the KPI there?
Yeah, I think about it as a two-sided retention problem. And there's sort of leading indicators on each of those retention problems.
So the supply side retention or applicant side retention is, of course, if they come back looking for work.
And so we have models that predict what is the probability that they're going to be interested in particular job opportunity that we present them with based on all their prior experience so that we can ensure, you know, we're retaining candidates very well.
And then on the demand side, it's how well are those candidates performing, where we collect all of the performance reviews of who's doing well for what reasons and use that as our eval set, as our benchmarks internally, on how do we predict, based on the interviews, the kinds of people that are going to perform well, that are going to translate to
a lot of value for our customers.
And that's why our demand-side retention is so large.
So now we're working with six out of the Mag 7.
We have 1,605% net revenue retention on an annual basis.
And so it's sort of nuts.
1605%.
So it's 1,600%.
So 16X net revenue retention.
For folks out there who don't know why that's funny, mature software businesses will often
turn in a net revenue retention number of 115% or plus 15.
So the number that he put out there is humorously large and implies that customers are
buying not just a little bit more of the services, but 16 times as much over a one-year basis.
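For anyone who wants to check that arithmetic, here is a minimal sketch of how a net revenue retention figure is usually computed: revenue today from a fixed cohort of customers, divided by what that same cohort paid a year earlier. The dollar amounts and function names below are hypothetical and chosen only to reproduce the two percentages mentioned in the conversation; they are not Mercor's actual figures.

```python
def net_revenue_retention(start_revenue: float, end_revenue: float) -> float:
    """Return NRR as a percentage for one fixed cohort of customers.

    start_revenue: what the cohort paid a year ago.
    end_revenue:   what that same cohort pays now (no new customers counted).
    """
    if start_revenue <= 0:
        raise ValueError("start_revenue must be positive")
    return 100.0 * end_revenue / start_revenue


def spend_multiple(nrr_percent: float) -> float:
    """Convert an NRR percentage into a 'times as much' multiple."""
    return nrr_percent / 100.0


# Hypothetical cohort that paid $1,000,000 a year ago.
typical = net_revenue_retention(1_000_000, 1_150_000)   # mature software business
quoted = net_revenue_retention(1_000_000, 16_050_000)   # figure quoted on the show

print(f"typical NRR: {typical:.0f}%  ({spend_multiple(typical):.2f}x)")
print(f"quoted NRR:  {quoted:.0f}%  ({spend_multiple(quoted):.2f}x)")
```

Read this way, 115% means the same customers spend 1.15 times what they did a year ago, while 1,605% means they spend roughly 16 times as much.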
Yeah, the two-sided retention problem is working well.
Yeah.
No, two-sided retention solution.
I don't think problem is the right word.
Okay.
So to me, AI companies that need to hire experts is a very, it's a market that has a lot
of reason to do well, to invest in new ideas, to try to find the right people, because they're in a
massive race to build the best stuff and therefore capture the market. How well do you think
the market or model is going to translate when you go wider and you're dealing with people
that might be less individually impressive, resume-wise, and are more average folks applying
for more average jobs? Because to me, difficult job, difficult person might actually be
easier to match than finding the right person from a more general pool for a more general gig?
Yeah. Well, we do hire a lot of people from general backgrounds as well. Like, it's really,
really broad across literally every industry in the economy. But maybe stepping back a little bit,
the background of the company is initially we were automating the processes of hiring people for
our friends. And then Scale AI came to us and they used our platform to hire over a thousand people.
And what we realized was that there was this huge transition in the market away from the crowdsourcing
paradigm of low and medium-skilled talent towards this medium and high-skilled sourcing and vetting
problem, not only with higher caliber people, but also people that work directly with the AI
researchers to help them interpret evals and push the frontier of model capabilities.
And that is one of the core reasons that the performance data we collect and all the things
we learn and how we facilitate better matches is actually much more similar to a general work
environment than most people realize on face.
When you say, Brendan, that you're collecting the evals, my impression of how this worked
was I would come to you, resume, interview, job, and I would go work for someone else for some
period of time.
But if you're collecting the evals to ingest back into your process, does that mean that
when I land a contract or a gig from Mercor that I'm actually working for you guys?
So technically, the work product that our contractors are producing ends up going to the customers.
What we own is just the performance review on like, did this person do a good job on the project?
So the end customer tells you how the employee did, and that's part of the overall arrangement.
So the feedback does make it even if they're working for someone else.
Yeah.
Yeah.
And one interesting thing is we facilitate the entire payment stack.
So we learn from who's getting bonuses, who's getting raises, who's
getting dismissed for what reasons, all as part of the data flywheel.
Okay, so people love to talk
about the usefulness of data and how the more you have, the smarter you can be. So how much better
have your systems become after ingesting increasing amounts of performance reviews, bonus information,
contract retention and so forth, all this stuff we've talked about. How quickly does that filter back in
and actually generate a material and measurable improvement in your ability to place candidates?
Yeah, pretty much on a weekly cadence. I mean,
where it's been most impactful, and where our focus has been, is predicting people at the high
end.
And the reason is there's this dynamic on a project where if we're providing 100 people to work
with a given customer, the top 10% of people are going to drive the majority of the value.
It's similar to like if you have a team of 100 software engineers, probably the 10 core people
are these like this notion of 10x software engineers that are driving.
Are you telling me that power laws exist?
Power laws exist, right, in many knowledge work verticals.
But the implications of that from our standpoint are profound,
because if we're able to build interviews and assessments
and all this technology that can predict these power law outcomes,
the amount of value that we drive for our customers
and performance of those people and the quality is extraordinary.
Okay, so let's play devil's advocate here
because I'm always on one hand a capitalist technologist,
and on the other hand, I'm a person who has friends
who have to make rent.
So if you can help people find the 10x engineers and the 10x engineers only, like let's say you can just say, look, here are the top five people and here's 45 others you can hire if you want.
Aren't we going to end up with a labor market in which companies are able to hire the best and get more out of them and then they're going to need fewer total people?
I'm just worried about folks who are like B-students.
Yeah.
Well, so I sympathize with the concern.
But one thing I've come to appreciate is that people sometimes over-index
on the dimension of how exceptional someone is and under-index on how relevant their
experience is. And that if you match people with the right thing that they're extraordinary at,
you can unlock these phenomenal outcomes. All right. Now, I want to talk about results because you guys
raised, I think it was a Series B earlier this year, quite a large round, quite a large valuation.
And I think TechCrunch said your revenue was somewhere in the realm of 75 million ARR. You guys have
also talked about, you know, 50% a month at one point in time. So one, since your last funding
round, has growth stayed as hot as it had before? And then when do you think will be the right
time to use the wedge to expand your remit and try to take on a wider range of jobs and also
longer tenure jobs?
Yeah. So I'll answer the first part. The business has grown by an order of
magnitude since we received our term sheet for the Series B. So the growth has been incredible.
Happy investors, very happy investors. While profitable the entire time.
And so we have more cash in the bank than we've ever raised, which is unique for an AI company.
And then to your second point of timing the expansion to new markets, the context for why we focus on the AI labs is we realize that there's more of a comparative advantage when we focus on hiring someone for five weeks versus five years.
Like when we're hiring someone for five years, you want to get dinner with them, build trust and a relationship.
Five weeks, you want these fast, efficient AI interviews, an automated process.
And so we're leaning into that a lot now in the wake of the Scale news with them no longer in the market.
There's so much market pull that we're focused on capturing that.
But what a lot of people don't realize is that throughout the duration of the business, we've still been doing lots of contract hiring outside of the market for AI labs and lots of full-time hiring for ourselves and our friends.
We still have the customers from 2023 before we even started working with AI labs that are continuing to grow with us.
And so we're starting to also ramp up investments in a lot of those other kinds of hiring work.
So it sounds a little bit like I asked a question that was more relevant maybe a year ago.
But instead, you guys are expanding your work with the AI labs and doing other things.
So it sounds like the wedge is wedging and you guys are already growing your remit.
Okay.
Now, before you go, two things.
One, where can people find the company on the internet?
And then two, is there a particular role that you're looking to hire for and want to shout it out into the broader world to see if the right
candidate's waiting for you, which is ironic, I know, given the conversation.
Absolutely. People can go find us at Mercor.com, M-E-R-C-O-R dot com, and we're hiring for a huge
volume of roles, particularly lots of software engineers. So for any software engineers that are
looking to join our team full-time, we're super eager to talk to you. For anyone looking to
join our marketplace, whether it's part-time or full-time as a contractor, we pay exceptionally
well. We have phenomenal satisfaction on the marketplace and retention as we talked about. And so
we would love the opportunity to work with you. Do you use your own software as the place you
source some of your own internal candidates? Of course, absolutely. So you do like the taste of
dog food. All right. Well, Brendan, thank you so much. We'll have you back on in, I don't
know, probably six months, when you do eventually raise more money, then I can ask you about that. But in the
meantime, thank you. And we'll see you soon. Yeah, for sure. See ya.
