This Week in Startups - TWiST 500 interviews with Cortical Labs, Turing, AND Mercor | E2159

Episode Date: August 1, 2025

Today’s show:

Alex is back with three more awesome interviews with founders on the bleeding edge of innovative tech.

Dr. Hon Weng Chong walks us through the basics of biological computing and Cortical Labs’ first-ever commercial computer running on living human cells.

Turing founder Jonathan Siddharth unpacks the secrets of LLM benchmarking, and explains why even our most advanced tests need to get much, much harder right away.

Finally, Mercor founder Brendan Foody on how AI is about to reinvent the hiring process, marrying the effectiveness of recruiters with the ease of online job boards.

It’s three (count ’em, three) can’t-miss TWiST interviews guaranteed to make you smarter.

Timestamps:
(0:00) OpenAI’s GPT-5: When is it coming out? Is it going to be TOO smart?
(08:06) Cortical Labs’ Hon Weng Chong on the electric connection between neuroscience and machine learning
(10:20) Northwest Registered Agent. Form your entire business identity in just 10 clicks and 10 minutes. Get more privacy, more options, and more done. Visit https://www.northwestregisteredagent.com/twist today!
(11:27) Show Continues…
(15:17) The extreme difficulty of going from the lab to a shippable product
(20:00) .TECH: Say it without saying it. Head to www.get.tech/twist or your favorite registrar to get a clean, sharp .tech domain today.
(21:05) Show Continues…
(28:15) Why data is a factor of time
(29:52) AWS Activate - AWS Activate helps startups bring their ideas to life. Apply to AWS Activate today to learn more. Visit aws.amazon.com/startups/credits
(31:16) Show Continues…
(42:44) Turing CEO Jonathan Siddharth explains why it’s so important to keep benchmarking our LLMs
(47:24) What it means when a model “saturates” a test, and why benchmarks need to get HARDER
(50:22) What happens when the LLMs can answer all of our smartest questions?
(53:44) AI Agents train in gyms? Wait, really?
(01:01:33) Coding teaches models how to think, and more training mysteries we don’t understand
(01:03:11) Brendan Foody from Mercor explains the “matching problem” that makes hiring such a pain
(01:07:02) How Mercor combines a job board’s distribution with the value of a recruitment agency
(01:10:49) Brendan recalls building his first AI interviewer in his college dorm
(01:20:16) Mercor has the opposite of a retention problem and crazy growth

Subscribe to the TWiST500 newsletter: https://ticker.thisweekinstartups.com
Check out the TWIST500: https://www.twist500.com
Subscribe to This Week in Startups on Apple: https://rb.gy/v19fcp

Follow Lon:
X: https://x.com/lons

Follow Alex:
X: https://x.com/alex
LinkedIn: https://www.linkedin.com/in/alexwilhelm

Follow Jason:
X: https://twitter.com/Jason
LinkedIn: https://www.linkedin.com/in/jasoncalacanis

Great TWIST interviews: Will Guidara, Eoghan McCabe, Steve Huffman, Brian Chesky, Bob Moesta, Aaron Levie, Sophia Amoruso, Reid Hoffman, Frank Slootman, Billy McFarland

Check out Jason’s suite of newsletters: https://substack.com/@calacanis

Follow TWiST:
Twitter: https://twitter.com/TWiStartups
YouTube: https://www.youtube.com/thisweekin
Instagram: https://www.instagram.com/thisweekinstartups
TikTok: https://www.tiktok.com/@thisweekinstartups
Substack: https://twistartups.substack.com

Subscribe to the Founder University Podcast: https://www.youtube.com/@founderuniversity1916

Transcript
Starting point is 00:00:00 It's really important to benchmark LLMs to figure out what they are good at and where the gaps in the models are, so that you can generate data that could help the LLMs get better at those specific tasks. One big challenge, Alex, that I see today in the overall evaluation and benchmarking market is that a lot of the evaluations are somewhat academic and somewhat synthetic and don't connect to real-world applications or real-world use. Ideally, you want AGI to progress in a way where the models get better at useful tasks. This Week in Startups is brought to you by Northwest Registered Agent.
Starting point is 00:00:43 Starting your business should be simple. With Northwest Registered Agent, you can form your entire business identity in just 10 clicks and 10 minutes. From LLCs to trademarks, domains to custom websites, they've got you covered. Get more privacy, more options, and more done. Visit northwestregisteredagent.com slash twist today. .TECH. Say it without saying it. Head to get.tech slash twist or your favorite registrar to get a clean, sharp .tech domain today. And AWS Activate. AWS Activate helps startups bring their ideas to life. As you build and scale your business, Activate credits grow with you to support your changing needs.
Starting point is 00:01:30 Apply to AWS Activate today and receive up to $100,000 in credits. Visit aws.amazon.com slash startups slash credits. Hey, welcome to This Week in Startups. You may notice I am not Jason Calacanis. He is on the road this week. We're going to do a quick show without him. I'm your host, Lon Harris.
Starting point is 00:01:51 Joining me, as always, Alex Wilhelm. Hello, hello. We have three incredible, amazing Twist 500 interviews coming up right now that Alex did all week. Don't move from that seat. But first, I do think we've got to talk about GPT-5 a little bit. Yes. Yes, GPT-5. So this has gotten a lot of play.
Starting point is 00:02:09 But Sam Altman, one of the founders of OpenAI and kind of what you might call the face of AI from the software side, Jensen being the hardware face, if you will, went on a podcast with a man named Theo Von, and they were having a pretty candid conversation about what they're building. And Sam says, you know, I was playing with GPT-5, and I gave it a challenge, and it did it very quickly. And he talks about how it sat him back and made him think about what they're building. And the quote that really went viral was, he was like, you know, there are moments in time
Starting point is 00:02:36 in science when people sit back and go, what have we done? And you know, on one hand, Lon, I have three things here. First of all, Sam Altman loves to hype things up and then get mad at everyone when they get a little overhyped. He's a master at PR. So that's part one.
Starting point is 00:02:54 Two, if you go back in time to GPT-3, maybe even further, there were a lot of worries that as this technology advanced and got better, it would be misused. And we're seeing that now, when people are using voice cloning to trick elders out of their money and AI is being used in more and more nefarious circumstances. But we also at the time had a lot of people thinking that there was going to be a lot of improvement and things were going to get much better more quickly. So when I think about his GPT-5 comments, they don't actually feel that different to me than
Starting point is 00:03:23 what we've seen with GPT-4, GPT-3, GPT-2. So to me, this probably does feel as impressive to him as when he felt blown away before. So to me, I don't think he's being cynical. I don't think he's overhyping here. I think we're just seeing improvement. And I don't think we know exactly where it's all going. I mean, I think that sounds reasonable enough. And look, I'm sure he's playing around with GPT-5 and is genuinely impressed at what it can do.
Starting point is 00:03:52 Like, I think there's two levels here. Like, even like, I was very cynical about AI early on. Even I have had experiences like that where I'm impressed by what it can do and it sort of raises the bar on what I thought was possible. I'm not discounting that experience at all. I'm sure he does have that experience. But I also think this is a weird and unfortunate narrative that we've sort of fallen into. And Jason has talked about this many times, that like real science and real technology kind of
Starting point is 00:04:19 chases sci-fi. And I feel like sci-fi introduced this idea that AI is scary and that AI is going to be thinking for itself and all of those things, like from Skynet and Terminator to Scarlett Johansson and her evolving beyond Joaquin Phoenix and wanting to go off on her own. Like, I think that's always, spoilers, so spoilers for Her, sorry everybody. I feel like that's always the fictional narrative around AI, and I do feel like that has an impact on real-world AI CEOs, and they're trying to be part of that narrative and play into that narrative because it's a helpful narrative for them. We're creating iconic, legendary history-shaping technology that's akin to the
Starting point is 00:05:01 atomic bomb or the discovery of electricity. And I mean, like, it plays into a friendly narrative if you're the CEO of an AI company. That's absolutely true. No, you're not wrong. I just think that AI has put up enough in terms of real results, real progress, real products, real services, that to me, we're just talking about how correct they are versus how incorrect they are when they make those large pronouncements. But I do think the context here is that OpenAI is expected to release their open weights model,
Starting point is 00:05:31 their kind of open source-ish project sometime soon, and then GPT-5 is anticipated, we think sometime in August, which means that we're going to get past the hype and more into the testing. So, you know, will GPT-5 be as good as it's expected to be? Well, we're going to have to find out pretty soon, and that's encouraging.
Starting point is 00:05:49 Pretty soon, in fact, the media is sort of saying, you know, early August or sometime in August, the sharps at Polymarket, of course. Yeah, yeah, we were looking at this. Yeah, not content to wait and see what happens. There's already, of course, wagering going on. There's already a market going on for this. 25% are betting that it's out by August 5th, which is early next week, so that would mean we are right on the verge of GPT-5 being here.
Starting point is 00:06:17 Yeah. One in four. Yeah, one in four. And then by August 15th, people think it's a two-thirds chance that it's out. So my read of this, Lon, is that basically middle August is kind of the over-under, and that anything after that will be a surprise and anything earlier than that will be a surprise. But we're only a couple weeks away. That's what's exciting to me.
Starting point is 00:06:35 That's the cool thing. Remember how exciting it was when xAI dropped Grok 4 and they talked about Grok 4 Heavy and then the Grok 4 coding? Then we got the Kimi K2 family of models from Moonshot. Then we got the Z.ai, I forget, the GLM-4.5 series. The point is it's been a flurry of models, and Alibaba's Qwen models are improving. Yeah, we're in the middle of that meme, the "you are here" one with the world's greatest AI model. Like, we're right in the middle of another cycle, yeah.
Starting point is 00:07:03 But the question is, will this be just another iterative improvement? And if so, OpenAI is going to have a lot of egg on its face. Or is it as big of a jump as maybe GPT-3 to 4 was, or larger? And if that's the case, it does change the entire fabric and landscape that the technology world plays on. But, Lon, let's talk to some founders. I love it. Let's do it. All right. So we have three interviews for you today. First, Cortical, then we're going to talk to Turing, and then we're going to talk to Mercor. Each one of these was an absolute treat. Cortical is probably the most out there in that we're talking about kind of like hybrid
Starting point is 00:07:37 biologic and digital computers. It's an interview I chased down for a long time. It's super fun. And then Turing and Mercor, a bit more traditional in the software space, but incredibly interesting founders, incredibly interesting companies, and especially I think in the case of Mercor, a vision for a company that could become incredibly instrumental to how labor functions in the future. So these are a treat. I had a blast.
Starting point is 00:08:02 Please enjoy them and say hello to three of our founders from the Twist 500. Hey, welcome back to Twist. This is Alex. Today we have another Twist 500 interview with a company that has been my white whale for several months, but I'm so glad to have them on the show. We're going to talk about Cortical. Now, here at Twist, we love chips.
Starting point is 00:08:21 We love talking about what Groq is building to handle AI inference. We loved that Etched is building ASICs just for LLMs. We've talked to Extropic and their thermodynamic computing paradigm. But all that's entirely silicon-based, more or less. Cortical wants to bring the world of biology to the world of computing in a way that when I first heard about it, I thought was science fiction, but they've taken it out of the lab and they're commercializing it. So please join me in welcoming to the show.
Starting point is 00:08:47 It's Cortical Labs CEO and co-founder, Hon Weng Chong. Hon, how you doing? Hi, good, thanks. Thanks for having me on the show, Alex. I'm so excited. Okay, so you guys are fusing biology and I would say digital computing in a way that's really interesting. But I think to help everyone understand how you got to where you are
Starting point is 00:09:05 today, we should go back to the earlier days of the company when you guys were experimenting with your technology and you used it not to play Grand Theft Auto, as everyone wants to do with generative AI, but instead to play Pong. So, Hon, can you tell me about why you chose that and then how the experiment went to prove out what Cortical Labs wants to build? Yeah. So, you know, if you take a few steps back to, I guess, 2017-2018, there was, I guess, the first AI boom that happened. That was, I would call it, the convolutional neural network reinforcement learning boom. I had just exited from my last business, and as part of that, it was a MedTech business that gathered a lot of data.
Starting point is 00:09:52 I was looking at using machine learning to do automated diagnostics and really got deep into the rabbit hole of machine learning and artificial intelligence. And I came across a paper written by Sir Demis Hassabis from DeepMind, who wrote that the machine learning and AI community had to go back to its roots in neuroscience because that's where most of the initial discoveries were made. Founders, if you've got a product, maybe you've got some customers or even just a little bit of traction, guess what? You've got yourself a startup. And it's time to make things legit.
Starting point is 00:10:34 You want to be official. Tighten it up. Investors don't want to wire money to a Gmail account with a PO box. No, they want to know you're a serious person and that your company is incorporated. In 10 clicks and 10 minutes, you can form your LLC, get an actual domain name, launch your official website, claim your business email, and even start fast-tracking your trademark application. That's right. You need Northwest Registered Agent, the service that's going to do all of that for you. With NWRA's privacy by default option, they're going to use their address for all your public documents, not yours.
Starting point is 00:11:09 So you're not going to get a ton of spam or junk mail. And you can also get a legit address and working phone number with their virtual office setup. So get more than just an LLC. Get your entire business identity. Go to northwestregisteredagent.com slash twist and show the world you're in business. So I did exactly that. I went to the neuroscience department of the University of Melbourne, which is my alma mater, and spoke to the researchers there and asked, what's exciting in your world?
Starting point is 00:11:37 What are you working on these days? You know, applying machine learning and AI to the neuroscience world. And one thing that really kind of stood out to me was they said, oh, we have this device called the multi-electrode array. It's not a new device; it's been around, I guess, since the early 2000s, where you can actually etch in electrodes, in petri dishes, where you can grow neurons on top of them. And because neurons communicate electrically, and because chips run on electricity, you now have this common language between the biological system and a compute system.
Starting point is 00:12:11 Electricity is the shared language between both neurons and chips. Absolutely. I mean, if you think about it, that's how Neuralink works, right? I mean, the joke we have at Cortical Labs is that we are the inverted Neuralink. Yes, you've taken the brain out and put it into the computer versus putting the computer into the brain. Correct. Exactly.
Starting point is 00:12:31 I had read the paper from Demis. I was inspired by it. And the first thing that Demis tried to do at DeepMind was he tried to get the machine to play Pong. So I said, why don't we do the same thing? Yeah, yeah, yeah. Exactly. So quite a lot of the first few attempts
Starting point is 00:12:47 weren't really successful. But, you know, fortunately, these neurons are very malleable, and they're very dynamic and they self-organize. And we realized that, you know, if we were trying to tweak the system while they were learning, you would end up with these very weird, so-called attractor dynamics where, you know, it's like, how to describe it, when you're trying to find somebody in a shopping mall,
Starting point is 00:13:12 but the other person's also looking for you and you're looking for them. And so you almost never meet because they're always moving around in that same pattern. That's what we were doing. So what we ended up doing was we said, we're not going to change the parameters, we're just going to leave them fixed.
Starting point is 00:13:25 But we are going to, you know, flash them with different, like, signals and so forth. Maybe hopefully that way we wouldn't be continuously wandering around trying to find each other. So, Hon, this is actually when I fell in love with what Cortical was doing, because according to, at least, NPR's reporting of this, you guys provided kind of a positive and negative response electrically to the neurons that were interacting with the chip.
Starting point is 00:13:47 And you gave a burst of, I guess, unfortunate white noise to the device if it was the wrong action and then a positive organized burst of electrical activity if it got it right. So essentially, you very politely shocked it if it did bad Pong and you rewarded it if it did good Pong. Kind of. I mean, I would say it's not really a reward or a shock kind of thing, because at the end of the day, we're still giving them an electrical signal. It's just the structure of it. And this was actually driven by us coming across a theory developed by a neuroscientist based at UCL in London, by the name of Karl Friston, which is actually really interesting, really fascinating, called the Free Energy Principle. What the Free Energy Principle kind of posits is that the brain, or all biologically intelligent systems, are actually generative models.
Starting point is 00:14:47 What is now believed to be happening in our brains is that we're not reactive machines. We're actually predictive machines. And so we're actually generating hypotheses about the world that is external to the brain, in the brain, as simulations. And then we're using the senses that we receive and the ability to affect the world as mini experiments to prove and disprove the hypotheses that are generated by the brain. Okay, so you guys did the Pong experiment
Starting point is 00:15:20 and showed that you could, in fact, create a hybrid biologic and digital computer that could play Pong reasonably well. It wasn't going to set world records for Pong playing. Now, take me from there to the CL1, your first commercial device that kind of brings this technology out into the world. How hard was it to go from successful experiment to commercialization and creating an actual shippable hardware product?
Starting point is 00:15:47 Oh, really hard. I would say this is probably one of the hardest things I've ever had to do because there's some element of doing something really cool, something like really novel and so forth. But more importantly is that if you're doing it as a commercial offering, it has to work a lot, like at least most of the time rather than some random time. So it has to work very reliably. You know, you do a lot of things that are very boring that no one really cares about, but it's important and vital for the product and the machine to work.
Starting point is 00:16:19 So for instance, life support plumbing, you can't publish that. That's kind of boring. Everyone knows what you need to do. But to actually go through the weeds of, like, developing, prototyping, and testing is quite difficult. So it did take a lot of work, but what was actually the motivation for it was that we had a lot of our colleagues reach out to us
Starting point is 00:16:39 after we had published the work and they were asking us, where can they buy this machine? Not this machine, but the machine that we had used. What software did we have to write? Whether we could help them if there were any questions. You know, we saw this recurring pattern.
Starting point is 00:16:55 I said, look, you know, people shouldn't be continuously reinventing the wheel, right? Like a round wheel works really well, and we know how to make it, maybe you should focus on building the rest of the car and we can build the wheel. And then you just use our wheel. If you think about the explosion in AI, right?
Starting point is 00:17:15 It had a very fertile ground because the groundwork had been laid by decades of gamers buying GPUs, keeping the likes of AMD and Nvidia alive until the AI systems came online, right? The venture capitalist, you're welcome for all my purchases to play games. Exactly. So I think this is something that we saw that and we're like, well, you know, somebody's going to take that leap and try to support the research community. And we decided that that was going to be our thing going forward. Mind you, we still do a lot of research work, you know. Of course. We have quite a few publications that came out. We have some very interesting stuff that's happening in the lab at the moment. But we think that this is more important that no one company can do this.
Starting point is 00:18:01 by ourselves. And we need to actually expand this out to everyone who's interested and try to reduce the barrier of entry for people to get into the space. Okay, I want to talk about commercial applications in a second. But first, I want to give people a quick look at this. So if you're on the video version of the podcast, you'll see what I'm about to show you. First of all, Hon, what is this here? I believe this is from the Pong era. It's a petri dish with what appears to be a small dish in the middle. So what is this? Yeah, so this is what I was talking about, the multi-electrode array. It's got a well, so that round thing, that cylinder there, is used to contain what we call the cell culture media.
Starting point is 00:18:42 So the cell culture media is that liquid that bathes the neurons to keep them alive. It's very similar to, I guess, the cerebrospinal fluid, you know, that bathes our own brain. As these neurons are processing, they're consuming glucose. But at the same time, they're also producing lactic acid, just like running. You end up with the cramps. So, you know, one thing we try to do here is prevent it from getting too acidic. And we use the CO2 as a buffer for that. Hon, I'm just imagining now, like, you know, the dog didn't eat my homework.
Starting point is 00:19:12 Instead, my computer got the cramps, so I couldn't do it anymore. Yes. I want to connect this to this image right here. This, I believe, shows a closer-up view of the neurons actually spread across a similar electric interface. Yeah, yeah. So if we remember that image of the multi-electrode array with that cylinder thing, in the center of the cylinder, there is actually a CCD or a CMOS sensor. This is actually the same sensor that you have in your cameras. And each of those square grids is actually a sensor that captures the electrical activity. It's also somewhat photoreceptive. So that's why we have to keep them in a double.
Starting point is 00:19:58 We all understand the importance of a crisp, memorable, easy-to-spell domain name. One of those names you can say over the phone and people know how to type it in without asking you the spelling. But let's get real. The good ones are either taken or there's some poacher who's holding it and waiting for some huge payday and they don't reply to you. Even if you want to pay for a premium domain, you don't want to use up all your runway on a domain name. That's just the truth for a startup. You want to put that valuable cash back into your startup's operations. So you should consider this, a .tech domain. You can get a clean, crisp, super memorable name for your website and company and signal out loud to your customers and investors:
Starting point is 00:20:43 We're a tech company. That's instant branding for you. That's why over 500,000 founders have collectively raised over $5 billion in investment, building their companies on .tech. So skip the hassle, head to www.get.tech slash twist. Or go to your favorite registrar and grab your .tech domain today. And so you'll see things like, you know, the large long piece that's going across, that's an axon, and all the stringy bits are the dendrites. So if you think about that, that's just one segment, and that's 50 microns. That's actually the width of a human hair, 50 microns.
Starting point is 00:21:22 And so, yeah, that's how small it is. and the level of connectivity, the level of self-organization as well is actually pretty phenomenal. It's so, this is, this image just, I've actually stared at this for probably 20 minutes just straight because it's such a, it's literally to me like 1980s science fiction,
Starting point is 00:21:42 but just actually real and now turning into a commercial product. Like this is so cool. And I just want to show everyone, the culmination of all this work that you guys have done at Cortical Labs is this bad boy. This is the CL1. If you're watching, sorry, if you're listening to the audio version, imagine a very long toaster with a clear plastic top and lots of tubes. And I believe this bad boy can keep neurons alive for six full months, Hon, going back to your point about the plumbing for life support, yeah?
Starting point is 00:22:10 We'd say six months because it's just easier to explain it that way. But actually, you know, we as a lab and other people have kept them alive for, you know, months, even years kind of thing. When you're talking about trying to do large-scale experiments, you need a lot of samples. You need a lot of units to get statistical significance and power through a large n. We wanted that scale. We wanted that longitudinal ability to study the neurons, but we also didn't want to have to spend a lot of effort doing it. So a bit of engineering went into a bit of plumbing to build this system that encapsulates the neurons in a sealed environment, which is really important because these neurons don't have an immune system. And so, you know, if you had COVID and you coughed into it, they would actually just also get COVID.
Starting point is 00:23:02 So your computer could literally catch the cold. That's really funny. Yes, pretty much. You know, don't eat bread next to it because you'll get the yeast and mold that goes into it and they'll kind of kill the cells. But we also built in our own neural interfacing system. You don't really see it, but the base of the CL1 has a compute unit. So we have an FPGA and, what do we call it, general-purpose compute units in there as well. That's where we digitize the signal.
Starting point is 00:23:35 We then put it through our very tight processing loops. And we're developing APIs and SDKs for this, so that, you know, as a programmer, you don't have to go deep into getting these things to stimulate the neurons with high precision, but, you know, a lot of code. You can just use our API, and we've developed a DSL, a domain-specific language,
Starting point is 00:24:02 to express how do you want to stimulate these neurons at individual channels at low-latency loops? That brings me to kind of, I think the question on everyone's mind who's listening to us right now, which is, okay, this is awesome. You've made a digital biologic brain. What's it good for? And I think that this is actually where my knowledge really kind of runs short because I'm not sure what compute loads, what experiment types, what commercial applications, the CL1 and its successors, are going to be best at. And so
Starting point is 00:24:33 when we think about the market that Cortical Labs is going after, how big is it? I think the two most interesting things that we've learned along the way about the system here is that they do two things really well. Well, actually three things, but that third thing is attached to the second one. Firstly, they use significantly less energy than your traditional compute. They're not a von Neumann-based architecture. The energy is derived from the glucose in the actual cell culture media. So doing a back-of-the-envelope calculation, we came across some really phenomenal numbers. So for our original Pong experiment, we grew about 800,000 to a million neurons. And we did a very rough estimate where we just said, what's the glucose content in the
Starting point is 00:25:18 media? How often are you changing the media? So therefore, if you changed the media, you know, X number of times a week, and there's this much glucose and they're combusting all of it, assuming, because they never combust 100%, you know, how much energy were they using? And it turned out it was actually 10 to the minus 4 watts of energy. So that's 0.0001 watt of energy. It's probably even less than that because that's a massive overestimation. But still, even with massive overestimation, it's effectively zero. And glucose, last time I checked, is sugar water. So it's nice and cheap.
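As a rough illustration of the back-of-the-envelope estimate described above, the sketch below turns a weekly glucose budget into an average power figure. The media volume, glucose concentration, and change frequency are illustrative assumptions, not numbers from the interview; only the method (assume all supplied glucose is combusted, then divide by elapsed time) follows what is described. The heat of combustion and molar mass of glucose are standard textbook values, and the function name is hypothetical.

```python
# Upper-bound power estimate: assume every gram of glucose in the media is fully combusted.
GLUCOSE_COMBUSTION_KJ_PER_MOL = 2805.0  # standard heat of combustion of glucose
GLUCOSE_MOLAR_MASS_G_PER_MOL = 180.16
SECONDS_PER_WEEK = 7 * 24 * 60 * 60


def average_power_watts(media_volume_l, glucose_g_per_l, changes_per_week):
    """Average power if all glucose supplied in a week were burned completely."""
    grams_per_week = media_volume_l * glucose_g_per_l * changes_per_week
    joules_per_gram = GLUCOSE_COMBUSTION_KJ_PER_MOL * 1000.0 / GLUCOSE_MOLAR_MASS_G_PER_MOL
    return grams_per_week * joules_per_gram / SECONDS_PER_WEEK


# Hypothetical inputs (assumptions for illustration only):
# 1 mL of media, 4.5 g/L of glucose, media changed 3 times a week.
power = average_power_watts(media_volume_l=0.001, glucose_g_per_l=4.5, changes_per_week=3)
print(f"{power:.1e} W")
```

With these assumed inputs the result lands in the 10 to the minus 4 watt range, the same order of magnitude quoted in the conversation, and it is an overestimate for the same reason given there: the cells never combust all of the glucose.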
Starting point is 00:25:50 Exactly. I mean, if you think about it, you and I as walking reference, you use only 20 watts in our brain. The guys at Xtropic were telling me this. They're like, you know, we have this amazing probabilistic computer that runs on 20 watts and it's literally inside your head. And they're working on thermodynamic committee, which is different. But I've been thinking about that ever since they brought it up. Okay, so clearly this is not going to be used right now to train the next major LLM,
Starting point is 00:26:14 but the people who are purchasing these CL1s that will come out sometime next year, what do they want to do with it? I think the thing you understand here as well is that LLMs could not exist 20 years ago, right? Even if we had the compute available, we could not make it work because LLMs required large datasets. I mean, they're called large for a reason, right? We've got into this point of, what is it, 40 years, what are we not, 20, 25, let's say 40 years of the public internet being this repository of free language training data. There are lots of other domains that are not language-based that do not have large data sets that are publicly available. And this is the second point that we've discovered along the way is that if you do a hit-to-hit comparison, and there's a paper.
Starting point is 00:27:06 coming out. Actually, this was in NeurIPS about two years ago where we put it in the RL workshop. It's one of the posters. We took reinforcement learning agents. So we did an experiment where we took three of them, DQN, which is your classic AlphaGo. It's kind of, you know, I guess the gold standard, but it's kind of old now. There are better algorithms out there that are so-called sample efficient. And we put them against the biological system and we gave them the same game. But we said, here's the one thing.
Starting point is 00:27:33 We're going to constrain the reinforcement learning agents so that they get the same amount of data that the biological system gets, right? Because if you think about reinforcement learning systems, nobody really talks about this. But the way they actually learn in a simulation is that they spawn millions of parallel processes and they speed up the time of the game by 200, 300 fold. And so if you read the AlphaStar paper or something like that, DeepMind kind of said that, you know, if you were to train a human to play with the amount of data that AlphaStar was playing, it would take 400 years of continuous gameplay to get to their level of performance.
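The gap between simulated and real-time data collection described above comes down to simple arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the speed-up and parallelism figures in the example are assumptions rather than numbers from any specific paper.

```python
# Sketch of why wall-clock time caps the data available to a real-time system.
# The speed-up and parallelism figures in the example are illustrative assumptions.

def experience_seconds(wall_clock_seconds, speedup=1.0, parallel_envs=1):
    """Seconds of gameplay experience gathered in a given wall-clock window."""
    return wall_clock_seconds * speedup * parallel_envs


def wall_clock_needed(target_experience_seconds, speedup=1.0, parallel_envs=1):
    """Wall-clock seconds required to gather a target amount of experience."""
    return target_experience_seconds / (speedup * parallel_envs)


if __name__ == "__main__":
    ten_minutes = 10 * 60
    # A real-time system (biological or physical) gets exactly what the clock allows.
    real_time = experience_seconds(ten_minutes)
    # A hypothetical simulator running 300x faster across 1,000 parallel copies.
    simulated = experience_seconds(ten_minutes, speedup=300, parallel_envs=1000)
    print(real_time, simulated)
```

A sample-constrained comparison, in this framing, means giving the software agent a speed-up of 1 and a single environment, so both systems see the same amount of experience.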
Starting point is 00:28:15 And so what we did was we said, we're going to sample-constrain it because if you think about this, in the real world, you can't speed up time. Time ticks at the same rate for you and me. And if data is a factor of time, right, say bit rate and so forth, then if you wanted to get, you know, say 10 minutes of data, you would have to spend 10 minutes collecting that data. Knowing that, you know, these systems use far less data when we did the comparison with the reinforcement agents and, you know, data being a factor of time, we realized these things are actually potentially very good for problems that don't have a dataset, problems that are real
Starting point is 00:28:56 time specific. And that is actually the vast majority of the domains that have not actually been touched by language. This is when I get really excited. And I don't want to make you project the future too much because that's not really fair. But to me, with neurons that we can interact with through software, it seems like we've managed to take kind of the best of both worlds, the programmability of computers, as we understand them, and also the probabilistic low power consumption and kind of generalized intelligence of brains and brought them together, Hon. So to me, in time, it feels like we should be able to use both, you know, sure, our massive GPU clusters that everyone's building, but also we should have probably large tanks of biological, you know, neurons that are also helping us do a lot of, a lot of work. Is that too kind of science fictiony, or am I on the right path? We're all familiar with AWS, Amazon Web Services.
Starting point is 00:29:59 That's the cloud platform that powers so many of your favorite brands. But do you know about AWS Activate? That's their program for startups, where they provide up to $100,000 in AWS credits for all startups, whether you're backed by an investor or you're bootstrapping. Money is time to keep innovating, time to delight customers and time for you to keep gaining traction. You need runway and AWS Activate is going to help you with that runway. We hear this story from so many of our Founder University companies. They're finding product market fit, the word's getting out about their product, just a little boost. Finding some savings here or there to get a little extra time can be the difference between getting traction and bringing in revenue
Starting point is 00:30:39 or, hey, let's call it what it is, running out of money, okay, and shutting down. AWS knows this. That's why they've created the ultimate toolkit for early stage startups looking to boost growth. With AWS Activate, you're going to get up to $100,000 in AWS credits, hands-on support, and training plus exclusive discounts with some of our favorite companies and tools. So start getting the support you need at every stage of your startup journey. To learn more, visit aws.amazon.com slash startups slash credits. That's right.
Starting point is 00:31:11 aws.amazon.com slash startups slash credits. I think you're on the right path. I mean, the thing about it is that we're good at some things that machines aren't. And machines are good at things that we aren't, right? So this is called Moravec's paradox, particularly in robotics. And I think this is the motivating factor for why Elon set up things like Neuralink and the BCI people, which is that as AI gets smarter and better, we also need to skill up, right?
Starting point is 00:31:37 They're getting, so if you think about it, the machines are getting better at what we reserve as traditionally human or biologically centered tasks. It's not beyond the realms of possibility that we could also go the other way, assuming that BCIs really become a thing, where not only do we utilize our brains for the low-power, stochastic computation, the probabilistic computation, but we can also just, you know, have a, I don't know, an iPhone chip in there that can just give us the square root of a number at the speed of thought. Right. So I can essentially allow compute to handle the math that I don't want to do myself, but all
Starting point is 00:32:08 the learning I can use my brain for. So once again, you guys are kind of Neuralink in reverse. So look ahead just a couple of years, not super long term, but the CL1 gets into the market next year, to customers. You guys are also going to have an API to let people access the technology remotely if they don't have the lab setup necessary to run this thing. What's the next generation? What are you guys going to build after the CL1?
Starting point is 00:32:31 What is the CL2? Well, I think the CL2 is, we've been thinking about it, and there were a lot of features that didn't make it into the CL1 that we'll probably start thinking about putting into CL2. I mean, the things, your standard things, right, make it like smaller, make it, you know, probably cheaper to obtain, easier to program, you know, maybe more modular units. There are quite a few things that we've been thinking about.
Starting point is 00:32:57 The other question is, does it make it into the CL2 or does it get pushed out to the CL3 is a discussion for the team? But again, I think it's still a little bit premature. What we really want to do is get the CL1 out to our partners, our research collaborators, and really get their feedback. What works? So that way we can sort of triage and prioritize what we want to do for the CL2. No point in putting in features that no one else
Starting point is 00:33:24 wants. I mean, we might want it, but we have the ability to just spin up whatever, CL3, 4, whatever internally. But, you know, if there are features that are missing that we have on our list that's kind of lower down our sort of priority, but it would, you know, tremendously help the community get going faster or do more work for less. I think that's something that we want to be prioritizing. You talked about using mice neurons at the top of the show. And anyone who knows anything about biology and experiments knows that a lot of mice and rats die. So that is kind of par for the course for how we currently handle ethics. Are mice neurons and human neurons radically different?
Starting point is 00:34:08 Are they relatively fungible? Silly question, but I'm not sure about the answer. Before I get to an ethics question, I thought we'd start there. So this is the really interesting thing. We used to think that there was no difference between a human and a mouse neuron. If you read any of the papers before, I think, 2020, everyone said, yeah, there's no difference. All mammalian neurons are the same.
Starting point is 00:34:29 And it turns out that I think on average, human dendrites are actually longer than mouse dendrites. But they have the same number of ion channels. So these are the gates that allow the potassium and sodium to go back and forth between the cell and the external environment, and that's what generates the action potential. But having them spaced out more, you actually have the ability to hold more so-called electrical states.
Starting point is 00:35:00 Well, that's the theory that's been pushed by some of the researchers at Harvard, MIT. And so, yeah, it turns out they are different, and that probably contributes to the differential in the overall performance, and we've seen that as well. The human neurons are better at playing and processing information. But for now, for now, the mice neurons are sufficient. for the state of technology and the progress you guys want to make.
Starting point is 00:35:24 So we're not, there's going to eventually be a news story entitled Startup takes human neurons, puts them into computer, are we creating a new form of consciousness? Oh, oh, panic. But it sounds like... Well, we actually do use human neurons now. Alongside mouse neurons,
Starting point is 00:35:39 human neurons is the other really big area of research because that's where a lot of the biomedical stuff happens where we're looking for new drugs, you know, understanding disease models and so forth. We stand on the shoulders of giants. We use a lot of the techniques that have been developed by the, what you'd call the synthetic biology, the biological engineering field to grow stem cells obtained from adult cells. So if we went back about 10 years ago, in Japan, there was a researcher by the name of Yamanaka.
Starting point is 00:36:14 Professor Yamanaka won the Nobel Prize, I think, a few years back for the discovery of induced pluripotent stem cells. And what he discovered was that if you took anybody's cells, your cells, my cells, you know, skin or blood, anything with a nucleus,
Starting point is 00:36:30 and you expose them to four compounds, you can actually reverse the clock of these cells, where they go from a blood cell or a skin cell back into a naive stem cell that can then be turned into anything.
Starting point is 00:36:46 We utilize that functionality. Yeah, to essentially grow human neurons from stem cells. So this way we don't kill any animals. It's continuously renewable as long as you provide the right conditions. And yeah, that is how we do it now at Cortical Labs, because we don't really particularly like killing the mice as well.
Starting point is 00:37:08 And so we think that this is actually an easy approach. It's absolutely brutal. No, I'm totally in favor of that. I think there's going to come in time some questions that appear ethically interesting with what you guys were building. but I'm also of the opinion that they're not actually going to be ethically serious. I think they're going to be mostly good for not making fun of the media of my industry, but great for a headline, but less dicey in practice.
Starting point is 00:37:30 So I'm very bullish on the company. Just before I let you go, how has demand been for the CL1? I know you guys have a form-up. I know you guys are working on deliveries. Has demand exceeded expectations? Is it about what you thought? Just where's the business side of things going? Yeah, demand has actually far exceeded what we had expected.
Starting point is 00:37:57 Suffice to say, we are now struggling to try to figure out our logistics and supply chain to balance supply and demand. We've had a lot of sign-ups, particularly for the Cortical Cloud, I think over 3,000 sign-ups. We have no ability to service more than 20 or 30 people on the cloud system. So you have 100X more demand than you can serve. Yeah, and so they're on the waiting list and so forth. And then, you know, for the hardware, we have had a lot of labs and research groups reach out to us. We are, you know, very excited about this, but now we're looking at the challenge of, oh, God, now we actually have to ship this thing.
Starting point is 00:38:41 And we're going to have to ship lots of them. And, you know, if they break in the field, we're kind of screwed. So, you know, we better make sure they don't break out there. So there's a lot of like, we're doing our best at the moment to finalize all of the software, finalize the hardware, you know, getting our partners ready. It's also very, very expensive, capex-wise, right? Because we have to, we don't charge our customers until we ship them, but we still have to build the units first. And so, you know, trying to figure out where the next source of funding is going to come from is going to be a bit of a challenge. Hon, want to just charge them half up front and resolve some of your cash flow issues?
Starting point is 00:39:15 Well, we could, but you know, it's... Do us. Yeah. We could do that. But, you know, I think for us, we want to make sure that, you know, we... I don't want to do a Kickstarter kind of thing, right? And at the start of the product like this. I don't think that's Kickstarter.
Starting point is 00:39:31 But I respect where you're at. I guess then the correct question to close with is this. Sometimes people say that venture capital is a little bit conservative. I think the VCs that bet on Cortical were clearly being relatively adventurous, which is good. So as you guys do need more capital, is the venture capital ecosystem in Australia, Europe, and the West at large interested in putting more money into the business? Or are you guys still a little bit outside of what you might call the venture norms? Yeah, I think we're outside of venture norms.
Starting point is 00:40:00 I don't know what it is, but the venture industry likes to gravitate to thematics. So in this case, it's been, what is it? We went from transformer LLMs to agents and whatever the next year brings. It's very hard, I think, unless you fall into specific thematics, to attract that. So, you know, I think we have some of the best investors in the world backing us because they're just the ones who are willing to take bets, which is essentially what the industry initially started out with, right?
Starting point is 00:40:34 Blackbird Ventures and Horizons Ventures. Correct, yeah. And, you know, we've had, you know, a new round that was just put together to get us a little bit more fuel to deliver our products by kind of people like 3C, AGI, they've just come onto the game. And then, you know, we've had players like In-Q-Tel as well who have backed us. It's really about, I think, trying to figure out who are the mavericks in the space, who are the ones who have the imagination, right, and the sci-fi background.
Starting point is 00:41:01 Because at the end of the day, if you have 200, like maybe, not 200, maybe like, say, even 10 foundational model companies, but they're all pretty much, I couldn't really tell you the difference between Claude versus Gemini versus ChatGPT versus, I don't know, whatever else, like Qwen or DeepSeek. They are mostly the same. And so if they're all mostly the same and I can jump between one another, I don't really see any, you know, significant stickiness or moat to it. So anyway, that's a problem for the VCs to resolve.
Starting point is 00:41:32 But, you know, rather than all jumping in into one space, I think we should try to spread it out. So for us, you know, we're very focused on delivering the product and, you know, we welcome anyone who's interested who's listening to your podcast, who understands the fundamental nature of building hardware first before you can get the software, you know, to look into the space because the thing, as we really talked about, the AI space already had that groundwork put in by the gamers, right? We didn't need to invest in the hardware because the gamers just bought it already. So I think that's something that we have to bear in mind
Starting point is 00:42:08 with any new compute space. There is always going to be a hardware component, and we shouldn't shy away from having to invest and build into that space. Well, I do look forward to the eventual future when we have Canva in Sydney and we have Cortical in Melbourne, and we'll have a good old-fashioned competition about who's going to be the next future leader of Australian technology. Hon, thank you so much for coming on. Thank you for answering all of my questions. And when you do have so much extra production capacity, I'll give you my address to ship my CL1 and you just tell me where to send the check, okay? All right. Thanks, Alex.
Starting point is 00:42:41 Thanks, Hon. When Meta invested in Scale AI, absorbing its CEO and co-founder in the process, startups that competed with Scale saw an absolutely huge opportunity. Now, mostly people focused on the data labeling side of what Scale had done historically, as the place that startups might actually capture the most market share. But Scale also offered AI evaluation tools to folks who build LLMs. So startups that offer AI evaluation tools may also be in line to benefit from Scale aligning with Meta, a company that competes with other firms to build the next great AI model.
Starting point is 00:43:16 Why work with Scale if Meta owns about half of it? One startup CEO, Turing's Jonathan Siddharth, told Reuters after Scale's partial exit to the social giant, that leading AI labs now realize that neutrality is no longer optional amongst service providers. It's, quote, essential. So to help us understand the LLM evaluation market, just how big it is and how data comes into play in 2025 to build those next models, please welcome to the show Turing's CEO, Jonathan Siddharth. Jonathan, hey, welcome to the show.
Starting point is 00:43:44 Thank you, Alex, for having me. It's great to be here. I'm very impressed. Also, we're both in our home offices today, and I think this just goes to show that remote work, not entirely dead, even though everyone seems to claim that it is. I'm glad that you're here.
Starting point is 00:43:56 So, starting for folks who are less aware of what LLM evaluation is, Jonathan, can you just give us the working definition from your side of the fence? Yeah. So it's really important to benchmark LLMs, to figure out what they are good at, where the gaps in the models are, so that you can generate data that could help the LLMs get better at those specific tasks. One big challenge, Alex, that I see today in the overall evaluation and benchmarking market
Starting point is 00:44:26 is that a lot of the evaluations are somewhat academic and somewhat synthetic and don't connect to real-world applications or real-world use. Ideally, you want AGI to progress in a way where the models get better, at useful tasks. So when we evaluate models, the three dimensions to look at are complexity. Like, are you evaluating them on really hard tasks? Real world use. Are you evaluating them on something that a human would actually care about?
Starting point is 00:44:58 And third is diversity. You want like a wide breadth of test cases that you're evaluating the models on. It's such an exciting space. And I think of evaluation as step zero of data generation. That's why we do both evaluation and data generation. Okay, so let's talk about the benchmarks because there's been some commentary. I think Apple had a paper that came out and they said that, you know, one of the problems we have with a lot of the benchmarks that everyone likes to trot out their new LLM and put them up against
Starting point is 00:45:25 and have the charts is that there's data contamination issues, there's overfitting. So based on what you just said about trying to solve real-world problems and also the fact that we know that these benchmarks are getting a little bit, I don't know, dicey to use, is the standard way that companies announce how their LLMs perform an effective form of evaluation, or is that mostly window dressing to get more social media hits because you're one point higher on one test or another? So I'd say the labs care about the benchmarks because it's good for bragging rights. It's good for recruiting.
Starting point is 00:45:59 Like who has the best model for coding? Who has the best model for STEM, et cetera? So benchmarks serve some useful purpose. And you can argue that in today's $100 million sort of talent wars, or maybe it's closer to $300, 400 million talent wars, those bragging rights help because the best researchers ideally want to join, like, the winning team, like who's already kind of close to being number one. But the labs care about two things.
Starting point is 00:46:26 Public benchmarks and private evals. And I would argue that private evals are actually more important because there you're evaluating how well your model is doing relative to the competition on the prompt distribution that you care about, meaning if you're Google or if you're Meta or if you're OpenAI or Anthropic, the type of queries you might get inside in a coding context might look different from what you might get on the phone
Starting point is 00:46:56 in like a general chat assistant context or what a human might put when you're searching through like a desktop. So you want to optimize for your own query distribution that your users are testing your model on. So that's why these private evals are helpful. So you need public benchmarks and private evals. The challenge with public benchmarks, Alex, is if you look at coding, for example, there is this good benchmark called SWE-bench, which is created by this lab at Princeton. Last year, we went from 2% to 50% in SWE-bench. This year, the best models are already north of 60%.
Starting point is 00:47:31 We'll probably saturate SWE-bench this year, and then where do you go? Right? Like, we haven't, clearly we haven't automated all of software engineering yet. The benchmarks are getting saturated. Yeah, can you double-click on what you mean by saturated in that context? I think it's an important point. Yes. So if a model does 90% plus on a benchmark, you've kind of aced the test to some degree.
Starting point is 00:47:53 And researchers call that saturating a test, because now the test no longer tells you whether your models are improving or not, right? So the obvious answer is you have to create a harder benchmark where the models would have even more headroom to climb, right? Even more room to improve, which is why at Turing, we're actually creating a benchmark for coding that's even harder than SWE-bench. And it gives us deep satisfaction because we managed to have all the models start at zero. Oh, okay. So everyone's failing 100% of this new test. Ah, okay, let's talk about private evals because this is a core thing of what Turing offers to its customers. So I'm curious, let's say that I'm OpenAI. I have a new model. I want you guys to take a look at it. You find a couple of places where it's not quite up to snuff,
Starting point is 00:48:41 not quite where we expected it to be. So then do you turn around and say, hey, guys, here are the places where there are gaps and then help them solve those issues? Or do you guys just point out, here's the spot where you know, you might want to do some more work? So we do both. We evaluate the model and we generate data to help the models improve. But let's Let me take a step back, Alex, and let's look at this landscape, which is super interesting right now. So, as these models, what's happening is, as these models have gotten smarter and smarter, the data that's needed to advance the models has become increasingly harder to generate,
Starting point is 00:49:19 right? The models are advancing in depth. They're improving in coding, STEM, reasoning, et cetera. They're advancing in breadth in multimodality, multilinguality, multi-industry, et cetera. and the models are becoming agentic, meaning the models can now execute complex, multi-step workflows in a real-world business context. Now, with the models advancing like this,
Starting point is 00:49:40 what's needed in a platform is you don't just need Iron Man, you need the Avengers. Frontier models need frontier data. Frontier data needs a team of frontier humans, right? So we have like PhDs from physics, chemistry, math, biology, expert Olympiad level coders, people who are literally at the pinnacle of their fields, and sometimes you have to have them working together to break the model.
Starting point is 00:50:06 For example, in physics, if an expert is evaluating the model in physics to test some theory in physics, a PhD in physics might ask the question to test a theory. A software engineer might need to build a simulation to test that theory, and then a data scientist might have to analyze the results, of that simulation. So you need to daisy chain really smart humans together to generate data. Actually, Jonathan, can I talk to you about that?
Starting point is 00:50:34 Because what you're describing to me sounds like very interesting tests, having people with different domain expertise work together to find places where the model might have a weakness or might not be able to answer something. And then you say that they're generating data. Is the data they're generating simply where the LLM in question misses or doesn't meet expectations? Because I thought this was more like,
Starting point is 00:50:57 here's new information for the LLM to be trained on versus looking at a model that's already been trained. And then does that make sense? I feel like I may have had this backwards. So that's a great question, Alex. So the way these models are trained is there's a step called pre-training where you basically feed the model gobs and gobs of data.
Starting point is 00:51:16 The war in pre-training is mostly over. Like almost all the models are trained on the same subset of the internet. now the battlefield is post-training, and there's a new field called mid-training, which I can get into that later. But with post-training, what you do is you have these human experts create question-answer pairs where a software engineer might ask a question like, hey, how do you create an app that connects dog walkers to dogs?
Starting point is 00:51:42 And can you write it in both Swift and Kotlin for Android, right? And now the system has to generate the app. So that's an example of a supervised fine-tuning dataset where you gave it a prompt and a completion. And the model learns how to autocomplete, how to respond to that prompt, right? So you need experts in different fields. Now, the reason this has gotten harder is two years ago, a very low-skilled contractor was capable of creating tokens that could have advanced the model. Now, because the floor has gone up, you need PhDs from Stanford, Berkeley, MIT to figure out where the model was.
Starting point is 00:52:20 You first have to break the model. You have to ask a question that literally stumps the model. And then the humans create good pairs of questions and answers that you then feed into the model for fine-tuning and then the model learns. Ah, it's the question and answer pairs that generate the data for the model. Okay, now that makes good sense to me. So it sounds, though, like what you've done is, back to your Avengers Iron Man point, just got together like a super set of nerds. And basically you're applying the smartest humans to find the flaws in the most advanced LLMs. What happens, Jonathan, when we don't have PhDs that can ask questions that the model can't answer?
Starting point is 00:52:58 Because it seems to me that we've raised the bar, but we're raising it towards a ceiling. Yeah, yeah. I mean, I've oversimplified it a little bit. So I would say, like, so one example I gave is you give question answer pairs. There's another type of data that you generate for reinforcement learning where you, the humans are generating questions and verifiers. They're not generating the solution. They're generating a way to verify
Starting point is 00:53:26 whether the solution is correct or not, which you can do with certain fields like coding, math, hard sciences. You can do that, right? Write a program to sort numbers in Python. You can write test cases to verify whether the program that you wrote is correct or not. But you're not telling them every single step along the way.
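The sorting example mentioned above lends itself to a small sketch of what a verifier might look like. This is a minimal illustration under stated assumptions: `verify_sort` and the test inputs are hypothetical names made up here, not Turing's tooling. The point is that the verifier checks the result without ever supplying the solution.

```python
from collections import Counter

# Hypothetical test inputs a human expert might write for the task
# "write a program to sort numbers".
TEST_INPUTS = [
    [],
    [1],
    [3, 1, 2],
    [5, 5, 1, -2, 0],
    [2.5, -1.0, 2.5, 10],
]


def verify_sort(candidate_sort, test_inputs=TEST_INPUTS):
    """Return True if candidate_sort produces a correctly sorted result on every input.

    The verifier never reveals how to sort; it only checks two properties:
    the output is in non-decreasing order, and it contains exactly the
    same items as the input.
    """
    for numbers in test_inputs:
        output = candidate_sort(list(numbers))  # pass a copy so the input is not mutated
        in_order = all(a <= b for a, b in zip(output, output[1:]))
        same_items = Counter(output) == Counter(numbers)
        if not (in_order and same_items):
            return False
    return True


if __name__ == "__main__":
    print(verify_sort(sorted))                 # a correct candidate passes
    print(verify_sort(lambda xs: xs))          # returning the input unchanged fails
    print(verify_sort(lambda xs: sorted(set(xs))))  # dropping duplicates fails
```

In a reinforcement learning setup of the kind described, a pass or fail from a check like this could serve as the reward signal, which is why the approach suits fields where correctness can be tested mechanically.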
Starting point is 00:53:44 You're just saying, I can verify that what you've done, the work you've done does generate an acceptable answer. Correct, correct. And the version of this for enterprise is you create an RL gym, a reinforcement learning gym. I think this is one of the coolest things in computer science. Like, just like a gym where humans go to train, this is like a gym where an agent trains. And in these gyms, you create basically clones of different websites, like a DoorDash or an Uber
Starting point is 00:54:12 or a NetSuite or a Salesforce. you create this virtual environment with prompts or workflows and ways to verify whether the task is completed. Now, the task would be, hey, salesperson, why don't you research everything you need to about this prospect and then update Salesforce in this way? And as long as when you create this virtual environment, you create these prompts and you create these verifiers. Now an agent tries out different combinations.
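A toy version of the environment-plus-verifier idea described above might look like the following. Everything here is a hypothetical sketch: `ToyCrmGym`, the tool names, and the reward values are invented for illustration and do not represent any real gym library or Turing's platform. The verifier inspects only the end state, so the agent is rewarded for completing the task rather than for following a prescribed path.

```python
# Toy sketch of an "RL gym" for a business workflow: a fake CRM, a few tools,
# and a verifier that checks whether the task was completed.
# All names and behaviors are hypothetical.

TOOLS = ("lookup_profile", "update_crm", "send_email")


class ToyCrmGym:
    """Stand-in for a cloned business application."""

    def reset(self):
        self.state = {"profile_seen": False, "crm_updated": False}
        return dict(self.state)

    def step(self, tool):
        if tool == "lookup_profile":
            self.state["profile_seen"] = True
        elif tool == "update_crm" and self.state["profile_seen"]:
            # The update only counts if the research happened first.
            self.state["crm_updated"] = True
        # Any other tool call leaves the state unchanged.
        return dict(self.state)

    def verify(self):
        """Verifier: checks the end state, not the steps taken to reach it."""
        return self.state["crm_updated"]


def run_episode(gym, policy, max_steps=4):
    """Run one episode; policy() returns the next tool name. Reward is 1.0 on success."""
    gym.reset()
    for _ in range(max_steps):
        gym.step(policy())
        if gym.verify():
            return 1.0
    return 0.0


if __name__ == "__main__":
    gym = ToyCrmGym()
    good_plan = iter(["lookup_profile", "update_crm"])
    bad_plan = iter(["update_crm", "send_email", "send_email", "send_email"])
    print(run_episode(gym, lambda: next(good_plan)))  # research first, then update
    print(run_episode(gym, lambda: next(bad_plan)))   # never researches the prospect
```

A learning agent would replace the fixed plans with a policy that is adjusted toward whichever tool sequences earn the reward; that training loop is omitted here to keep the sketch short.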
Starting point is 00:54:42 The agent is basically trying to use the tools in an appropriate way to actually complete that task. And it's a cool way to learn whether the agent is learning, through trial and error, how to execute a complex, multi-step workflow. You're not teaching the agent: go to LinkedIn first, go to ZoomInfo first, and then check out Salesforce for prior conversations. You're not teaching it that, but it's learning. It's learning just based on feedback on, okay, if I accomplished this, I get a reward, and then it learns how to operate in a way that maximizes future rewards. Really cool. To your question of what happens if we run out of human intelligence, right? Fortunately, I think we are still a significant distance away from that. We've run out of
Starting point is 00:55:24 internet data, but there's lots of intelligence trapped in the minds of humans that is yet to be transferred from human minds to machine minds. I would say, Alex, even the way, like, I love your show, I love what you and Jason have done with this show. But when you're thinking about how to interview somebody who's on the show, what type of questions to ask, what type of prep to do, I guarantee that knowledge is not distilled into the models from OpenAI or Anthropic or Meta or Google. The only way that's going to happen is if we hire somebody like an Alex or a Jason to work on Turing to use our tools. And we didn't get into this, but we have tools that make it easy to keep the quality of the data high. And we have AIs that will assist you
Starting point is 00:56:12 in generating this data so that we can make this scale. Pun not intended. No, I was going to let that one slide. We don't allow more than two dad jokes per episode. So we have to save them if that makes sense. So here's my question. I understand that Scale being subsumed by Meta has been great for the Turing business. You told me before the show you guys are at nine figures of revenue and profitable and growing.
Starting point is 00:56:36 And honestly, hell yeah. I love that. But if I'm OpenAI, why wouldn't I try to replicate what Turing has built inside of my domain? Because if there's one thing that large AI model companies have today, it's a lot of access to capital. And so as we've seen from Meta trying to buy literally every human who's touched an AI before, there's a lot of money to play with here.
Starting point is 00:56:57 So are you guys just so good that OpenAI doesn't want to try to replicate what you've built? Or is there something else that I'm missing in that it's good to have a third party versus an internal group be kind of telling you where you might want to work on things? Yeah, I mean, that's a great question, Alex. Firstly, to advance towards ASI, you need three things to move in parallel. Actually, you need four things. You need research and algorithms.
Starting point is 00:57:27 You need compute. You need data. And you need the application layer. Those are the four things, right? Research, compute, data, and applications. Now, the labs are really good at research, right? And kudos to OpenAI, Anthropic, Meta, Google, Apple, all of these companies for advancing the frontier forward. So they're spending their
Starting point is 00:57:46 energy there. We could flip this question to also ask, why don't they build their own compute? And in some areas, the labs are investing in it. I mean, we've heard of Google having TPUs and other companies trying to do that. But Nvidia clearly has the edge today. And there's some good companies like Groq, Cerebras, etc. that are also building custom chips. So compute is also advancing. And the third pillar is the data pillar. Right. Now, generating this data is exceedingly complex. Again, three dimensions. It has to be complex.
Starting point is 00:58:18 It has to be realistic, meaning it has to mirror how a real human would interact with the model. And third, it has to be diverse. It's very hard for a lab to get all of this in-house. I'll give you an example. Like today at Turing, this month alone, we are hiring PhDs across the board in physics, chemistry, math, biology, at the level of granularity of somebody who's an expert in dark matter, somebody who's an expert in black holes, somebody who's an expert in molecular biology, right? Like these very niche fields. So it's, and you ideally need these humans part time because the way
Starting point is 00:58:58 these humans are good at the job of training the models is because they are also good at their day job, which keeps their skills sharp. So you kind of, and once you've ingested that knowledge, you might want to move to the next frontier and the next frontier and the next frontier. So you need a platform or a partner that can scale up very quickly to elite talent, manage the talent to make sure the talent is generating data part-time. The data is high quality. We have to build a ton of tools. Like we have this platform called Allen, where AI is used upstream of the human to minimize the work for the human.
Starting point is 00:59:33 AI works alongside the human, and AI does quality control. It may sound a little dystopian. It's like, you know, AI overseeing humans to improve AI? I'm not afraid of our AI future, personally. So that doesn't, my P-Doom, I guess, is very low. But I want to go back to your black holes, dark matter point because I'm a bit of a science fiction nerd,
Starting point is 00:59:53 and I'm also a bit of a space nerd. So those are topics that are near and dear to my heart. But if I spend a lot of time helping a particular LLM better understand why we think about dark matter, why we came up with the idea, you know, galaxies and spinning and not having enough matter, blah, blah, blah, blah, blah. Will that work to make the LLM smarter in that particular domain have spillover effects to
Starting point is 01:00:17 other areas? Does it make it more intelligent writ large or more intelligent only in that specific domain? I'm not sure the answer here, so I figured I'd ask. Yeah, phenomenal question. What we know with relatively high confidence is that when the models get better at coding and math, they seem to get better in a wide variety of other tasks that have nothing to do with coding and math. Something about coding seems to teach the models how to think in a more structured way, how to communicate with less ambiguity, how to think step by step. So coding and
Starting point is 01:00:57 math, there have been some experiments in how they have out of domain performance outside of just pure code generation. And coding and math is also interesting because sometimes when you ask complex questions, Alex, the sub-steps involve being able to compute stuff or calculate stuff and pass the results forward. For example, Alex, if you had a question like, hey, what are some interesting themes in investing in AI? The answer to that might involve a model that knows how to write code to query PitchBook, write some Python code to analyze the data and say, you know, AI in healthcare seems to be spiking. AI in retail seems to be going down. That requires the model knowing how to write some Python code to analyze the data. It knows Matplotlib to plot the results
Starting point is 01:01:41 and show you the results. So coding and math have broader applicability. But it's also true that this is one of the mysteries of these language models. The models seem to learn some representations about the world that seem to carry over to other areas. I can totally imagine, you know, understanding human psychology. If the models were really good in human psychology, it could help the model write better website code that designs the website to be more persuasive, maybe like more conversion optimized, maybe makes it better for generating copy. So it may be, who knows, maybe there is something to learn from studying the universe that helps people in their day to day. It could be like some indirect thing, but we don't know for sure.
Starting point is 01:02:26 So Jonathan, here's to a next great couple of years as the world gets bigger and faster and smarter. And just before I let you go, where can people find Turing on the internet? And is there a particular job that you're having a hard time hiring for that you wanted to shout out to the world? Great. Thank you, Alex. If you're working on AGI research or in human data, come talk to us. We are hiring across the board. And if you're building a frontier foundation model and you need data to make the model smarter at coding, reasoning, STEM, et cetera, come talk to us. We're at Turing.com and you can email me at Jonathan at Turing.com. Perfect.
Starting point is 01:03:03 All right, Jonathan, thank you so much. We'll have you back on in another six months when AI is twice as smart. Until then, this is TWiST. Bye. All right, so here on TWiST, we have talked ad nauseam about AI and the job market, mostly about how AI might impact the job market, change
Starting point is 01:03:18 who does what job, change what jobs are done. You get the idea. People have a lot of concerns that AI might take all the jobs. We'll have to see what happens. Today, though, I want to talk to a company that wants to use AI to help people not just get jobs, but get the right jobs. And also to have companies find the right people.
Starting point is 01:03:33 Because as it turns out, and you know this if you've done any hiring in your career, finding the right people pretty much is terrible, even with the modern tools we have today. So to help explain how this is going to work, please welcome to the show Mercor CEO and co-founder, Brendan Foody. Brendan, hey, how you doing? I'm doing great. It's awesome to be here and appreciate you having me, Alex. It's my pleasure, man. I love doing these. Talking to people who are building what's actually next for the economy literally never fails to give me a jolt like a good espresso. So thank you. Now, one reason why I wanted to talk to your company is because you have
Starting point is 01:04:04 this amazingly huge vision. You guys wrote that you founded the company because, and I quote, the labor market is the largest, most inefficient market in the world, which is a pretty big claim. So before we dive into how Mercor is going to approach this and how you're going to get into the market, tell me why you think that and what brought you to that as the problem you wanted to solve. Yeah, it really comes down to this matching problem where, what you were describing earlier, the reason that it's painful is that when a candidate is applying to a job, they can only apply to a couple dozen jobs. And when companies are considering candidates in the market, they can only consider a fraction
Starting point is 01:04:41 of a percent of the people available that are looking for work because they need to solve this matching problem manually. They need to manually review resumes, manually conduct interviews, and manually decide who to hire. But when you're able to solve this matching problem at the cost of software, it makes way for this global unified labor market that every candidate applies to and every company hires from. And so that's the end vision of the company and the North Star that we work backwards from. Okay.
Starting point is 01:05:11 So essentially, when I go out to look for jobs, I might look at my local area. I might look at jobs at companies that I've heard of. But no matter how far I cast my net from my perspective, I'm still missing out on most of the gigs. And therefore, most of the companies aren't hearing from as broad a talent pool as they might. Exactly. Yeah. Okay. So in the world in which we're dealing with kind of the demise of remote
Starting point is 01:05:32 work, isn't the labor market necessarily constrained by geography? Well, not precisely. I don't think we're dealing with the demise of remote work. I just think that in many ways, remote work might actually become even more prevalent. And part of this is that if you automate 90% of what it means to do remote knowledge work, that means that the bottleneck to productivity is the other 10%. And so for all the domains like software engineering, where demand is extremely elastic, we'll just produce, you know, 10 or 100 times more.
Starting point is 01:06:06 And so I think that humans and the role that we play in knowledge work both remotely as well as in person is going to become amplified with these huge increases in productivity that are starting to happen. So just to nail that down for folks, you think that in time remote work is not going to go away. And therefore, there will be kind of a global talent pool. And therefore, if you want to access the best people, you're not only going to have to have a much larger pool, but you're also going to look more broadly at the corporate level. This is certainly the case. And most importantly, I think it'll be centralized, right? Instead of all of these decentralized, fragmented job searches and people, companies that are
Starting point is 01:06:48 looking to hire, I think it'll be one place that everyone goes, that every company hires from, that can facilitate all of these job matches that people love and companies are finding a lot of value in. Okay. And I think you want that one place, that central hub, to be Mercor, your company. Yeah. And if you succeed, this means that recruiters, as we know them today, are going to have a hard time. Probably generic, non-AI-enabled job boards like Indeed and similar are all going to get hit. So that's the scale of what you guys are shooting for, essentially a revolution of how people find and secure jobs around the world. Exactly. That's fantastic. Well, so one way of thinking about it is there's sort of these two ends of the spectrum. On one end of the spectrum, there's companies like LinkedIn or Indeed, they're job boards. And they have this very broad distribution, but they only aggregate this very thin layer of the person's resume, right? And so they capture about a thousandth of the value chain associated with facilitating a hire. On the other end of the spectrum, there's the services companies, the recruiting agencies, the staffing agencies that do all the manual work and heavy lifting
Starting point is 01:07:59 to get their 30% of first year or whatever their fee structure is. But no one has been able to combine the distribution of LinkedIn and Indeed with the value capture and value add of a recruiting and staffing firm because previously it wasn't possible to automate services, right? But now that's all changing. Now it's becoming possible to automate everything that recruiters and staffing agencies would otherwise do in this unified way that has the same kind of distribution scale of a consumer platform with hundreds of millions of people.
Starting point is 01:08:36 All right. I wanted to start here because I want people to know where you're going to. But I'm now going to go the other direction and talk about what you're doing now because you guys have this really interesting, quote, public secret plan, if you will, of helping AI companies find the right talent that they need as essentially your wedge into the market. And to help explain this to folks, you guys want to learn how to place candidates very effectively in this one area where there's a short feedback loop, if you will, so you can learn pretty quickly and then apply that more broadly over time. But I think you've even been pigeonholed by some in the press as, like, helping people hire
Starting point is 01:09:12 for AI when that does seem to be kind of a small fraction of the company's vision. So what I want to know is how does it work today? I know you guys are working on one subset of the market, but walk me through from the candidate and the company side, how Mercor actually does do what you're describing. So, a candidate will come to us looking for opportunities for work. They'll see a lot of the jobs available, and they can apply to one of those or to our talent pool more broadly. They'll upload their resume and they'll take an interview with the AI on our platform to then get evaluated to see what they're a good fit for, what they're interested in, what kinds of matches we might be able to create for them, facilitating huge volumes, tens of thousands of these matches, with no human process and involvement. On the other side of the marketplace, there's companies that will give us these
Starting point is 01:10:04 requests of saying they need 100 people in a particular domain, maybe a subset of software engineering or investment bankers or whatever the professional domain is. And we will go out to the supply side of our marketplace, find all the people that are a good fit for them, and facilitate all of those contract opportunities. Tell me more about the AI interview. And I'll just say that I'm an AI bull. I think the stuff is really cool and is going to work well. On the other hand, I do have some friends that are in less prestigious jobs who have had to deal with some AI onboarding, some AI screenings. And they've been pretty negative about it.
Starting point is 01:10:43 So I'm curious what you guys have built and how well it works and kind of what candidates think of that part of the process. Yeah. So we built the first AI interviewer in March of 2023 when I was in my college dorm room. And initially I remember it would hallucinate like nothing else because it wasn't even GPT-4 at that point, right? Oh, God. It had a 10-second latency. And it's been extraordinary to just see this tailwind of model improvement, you know, lift all boats and make these applications possible. And so a good heuristic for it is emulating all the processes that a human would otherwise do.
Starting point is 01:11:18 Where similar to how a human would review resumes and conduct interviews, we automate all of the preparation for the interview and conducting the interview and evaluation of an interview with that similar format and heuristic. And while I think some candidates obviously prefer to talk to a human because they think about it more as a selling process from the company rather than strictly a buying process, the overwhelming sentiment is that there's millions of people that apply to jobs and just get completely ghosted and don't even get the opportunity to interview, right? It'll be like less than 1% of people even get to talk with someone or show more than their
Starting point is 01:11:59 resume for the most competitive jobs. And so globalizing that ability and not just like internationally, but also all across the U.S., for people to, you know, talk with the candidates and actually consider everyone for the opportunity has been really impactful. Does the AI interviewer you guys have built, in its current iteration, work with multiple languages? Because I presume that if you're only doing English, you're still constraining yourself to a small portion of the overall possible job takers. Yeah, we start out predominantly with English.
Starting point is 01:12:33 Now we're starting to do others, working with a lot of our customers to try to make sure the models improve their multilingual capabilities so that the interviewer is well set up with that. And I presume that India is a market of choice for you guys. But are there any other hotspots around the world where you're seeing a lot of talent saying, hey, we really want to be part of this new marketplace that you're building. Yeah, well, so actually the majority, the vast majority of our hires come from the U.S. now, over 60%.
Starting point is 01:13:01 India is the second largest geography that we hire from, but also seeing lots of Eastern Europe, lots of South America. One thing to note is that the average pay rate in our marketplace is over $90 an hour. And so it's very different from most labor marketplaces in that way. It's, you know, a totally different league from what you would find with the crowdsourcing platforms like Scale and Surge that sort of built these legacy labor marketplaces with more in the range of $10 to $30 an hour pay rates. You guys are currently targeting a much more educated and rarefied employee pool, again,
Starting point is 01:13:38 as your starting point, as your wedge. So I want to talk about how good your system is at finding and placing people because it sounds good in theory, but in practice, of course, that's what matters. So what's the right metric that you guys track in terms of finding the right person for the right job and then having both sides of that equation be happy? Is it repeat business? Is it successful completion of a contract? What's the KPI there? Yeah, I think about it as a two-sided retention problem. And there's sort of leading indicators on each of those retention problems. So the supply side retention or applicant side retention is, of course, if they come back looking for work. And so we have models that predict what is the probability that they're going to be interested in a particular job opportunity that we present them with based on all their prior experience so that we can ensure, you know, we're retaining candidates very well.
Starting point is 01:14:29 And then on the demand side, it's how well are those candidates performing, where we collect all of the performance reviews of who's doing well for what reasons and use that as our eval set as our benchmarks to go internally on how do we predict, based on the interviews, the kinds of people that are going to perform well that are going to translate to a lot of value for our customers. And that's why our demand-side retention is so large. So now we're working with six out of the Mag 7. We have 1,605% net revenue retention on an annual basis. And so it's sort of nuts. 1,605%. So it's 1,600%.
Starting point is 01:15:08 So 16X net revenue retention. For folks out there who don't know why that's funny, mature software businesses will often turn in a net revenue retention number of 115% or plus 15. So the number that he put out there is humorously large and implies that customers are buying not just a little bit more of the services, but 16 times as much over a one-year basis. Yeah, the two-sided retention problem is working well. Yeah. No, two-sided retention solution.
Starting point is 01:15:34 I don't think problem is the right word. Okay. So to me, AI companies that need to hire experts is a very, it's a market that has a lot of reason to do well, to invest in new ideas, to try to find the right people, because they're in a massive race to build the best stuff and therefore capture the market. How well do you think the market or model is going to translate when you go wider and you're dealing with people that might be less individually resume-wise, impressive, and are more average folks applying for more average jobs? Because to me, difficult job, difficult person might actually be
Starting point is 01:16:12 easier to match than finding the right person from a more general pool for a more general gig? Yeah. Well, we do hire a lot of people from general backgrounds as well. Like, it's really, really broad across literally every industry in the economy. But maybe stepping back a little bit, the background of the company is initially we were automating the processes of hiring people for our friends. And then Scale AI came to us and they used our platform to hire over a thousand people. And what we realized was that there was this huge transition in the market away from the crowdsourcing paradigm of low and medium-skilled talent towards this medium and high-skilled sourcing and vetting problem, not only with higher caliber people, but also people that work directly with the AI
Starting point is 01:16:57 researchers to help them interpret evals and push the frontier of model capabilities. And that is one of the core reasons that the performance data we collect and all the things we learn and how we facilitate better matches is actually much more similar to a general work environment than most people realize on face. When you say, Brendan, that you're collecting the evals, my impression of how this worked was I would come to you, resume, interview, job, and I would go work for someone else for some period of time. But if you're collecting the evals to ingest back into your process, does that mean that
Starting point is 01:17:30 when I land a contract or a gig from Mercor that I'm actually working for you guys? So technically, the work product that our contractors are producing ends up going to the customers. What we own is just the performance review on like, did this person do a good job on the project? So the end customer tells you how the employee did, and that's part of the overall arrangement. So the feedback does make it even if they're working for someone else. Yeah. Yeah. And one interesting thing is we facilitate the entire payment stack.
Starting point is 01:18:01 So we learn from who's getting bonuses, who's getting raises, who's getting dismissed for what reasons, all as part of the data flywheel. Okay, so people love to talk about the usefulness of data and how the more you have, the smarter you can be. So how much better have your systems become after ingesting increasing amounts of performance reviews, bonus information, contract retention and so forth, all this stuff we've talked about. How quickly does that filter back in and actually generate a material and measurable improvement in your ability to place candidates? Yeah, pretty much on a weekly cadence. I mean, where it's been most impactful, and where our focus has been, is predicting people at the high
Starting point is 01:18:42 end. And the reason is there's this dynamic on a project where if we're providing 100 people to work with a given customer, the top 10% of people are going to drive the majority of the value. It's similar to like if you have a team of 100 software engineers, probably the 10 core people are these like this notion of 10x software engineers that are driving. Are you telling me that power laws exist? Power laws exist, right? in many knowledge work verticals.
Starting point is 01:19:07 But the implications of that from our standpoint are profound, because if we're able to build interviews and assessments and all this technology that can predict these power law outcomes, the amount of value that we drive for our customers and performance of those people and the quality is extraordinary. Okay, so let's play devil's advocate here because I'm always on one hand a capitalist technologist, and on the other hand, I'm a person who has friends
Starting point is 01:19:32 who have to make rent. So if you can help people find the 10x engineers and the 10x engineers only, like let's say you can just say, look, here are the top five people and here's 45 others you can hire if you want. Aren't we going to end up with a labor market in which companies are able to hire the best and get more out of them and then they're going to need fewer total people? I'm just worried about folks who are like B-students. Yeah. Well, so I sympathize with the concern. But one thing I've come to appreciate is that people sometimes over-index on the dimension of how exceptional someone is and under-index on how relevant their
Starting point is 01:20:09 experience is. And that if you match people with the right thing that they're extraordinary at, you can unlock these phenomenal outcomes. All right. Now, I want to talk about results because you guys raised, I think it was a series B earlier this year, quite a large round, quite a large valuation. And I think TechCrunch said your revenue was somewhere in the realm of 75 million ARR. You guys have also talked about, you know, 50% a month at one point in time. So one, since your last funding round, has growth stayed as hot as it had before? And then when do you think will be the right time to use the wedge to expand your remit and try to take on a wider range of jobs and also longer tenure jobs? Yeah. So I'll answer the first part. The business has grown by an order of
Starting point is 01:20:51 magnitude since we received our term sheet for the series B. So the growth has been incredible. Happy investors, very happy investors. While profitable the entire time. And so we have more cash in the bank than we've ever raised, which is unique for an AI company. And then to your second point of timing the expansion to new markets, the context for why we focus on the AI labs is we realize that there's more of a comparative advantage when we focus on hiring someone for five weeks versus five years. Like when we're hiring someone for five years, you want to get dinner with them, build trust and a relationship. Five weeks, you want this fast, efficient AI interviews, automated process. And so we're leaning into that a lot now in the wake of the Scale news with them no longer in the market. There's so much market pull that we're focused on capturing that.
Starting point is 01:21:44 But what a lot of people don't realize is that throughout the duration of the business, we've still been doing lots of contract hiring outside of the market for AI labs and lots of full-time hiring for ourselves and our friends. We still have the customers from 2023 before we even started working with AI labs that are continuing to grow with us. And so we are starting to also ramp up investments in a lot of those other kinds of hiring work. So it sounds a little bit like I asked a question that was more relevant maybe a year ago. But instead, you guys are expanding your work with the AI labs and doing other things. So it sounds like the wedge is wedging and you guys are already growing your remit. Okay. Now, before you go, two things.
Starting point is 01:22:24 One, where can people find the company on the internet? And then two, is there a particular role that you're looking to hire for and want to shout it out into the broader world to see if the right candidate's waiting for you, which is ironic, I know, given the conversation. Absolutely. People can go find us at Mercor.com, M-E-R-C-O-R dot com, and we're hiring for a huge volume of roles, particularly lots of software engineers. So for any software engineers that are looking to join our team full-time, we're super eager to talk to you. For anyone looking to join our marketplace, whether it's part-time or full-time as a contractor, we pay exceptionally well. We have phenomenal satisfaction on the marketplace and retention as we talked about. And so
Starting point is 01:23:06 we would love the opportunity to work with you. Do you use your own software as the place you source some of your own internal candidates? Of course, absolutely. So you do like the taste of dog food. All right. Well, Brendan, thank you so much. We'll have you back on in, I don't know, six months, when you do eventually raise more money. Then I can ask you about that. But in the meantime, thank you. And we'll see you soon. Yeah, for sure. See ya.
